
VisDrone-VDT2018: The Vision Meets Drone Video Detection and Tracking Challenge Results

  • Pengfei Zhu (corresponding author)
  • Longyin Wen
  • Dawei Du
  • Xiao Bian
  • Haibin Ling
  • Qinghua Hu
  • Haotian Wu
  • Qinqin Nie
  • Hao Cheng
  • Chenfeng Liu
  • Xiaoyu Liu
  • Wenya Ma
  • Lianjie Wang
  • Arne Schumann
  • Dan Wang
  • Diego Ortego
  • Elena Luna
  • Emmanouil Michail
  • Erik Bochinski
  • Feng Ni
  • Filiz Bunyak
  • Gege Zhang
  • Guna Seetharaman
  • Guorong Li
  • Hongyang Yu
  • Ioannis Kompatsiaris
  • Jianfei Zhao
  • Jie Gao
  • José M. Martínez
  • Juan C. San Miguel
  • Kannappan Palaniappan
  • Konstantinos Avgerinakis
  • Lars Sommer
  • Martin Lauer
  • Mengkun Liu
  • Noor M. Al-Shakarji
  • Oliver Acatay
  • Panagiotis Giannakeris
  • Qijie Zhao
  • Qinghua Ma
  • Qingming Huang
  • Stefanos Vrochidis
  • Thomas Sikora
  • Tobias Senst
  • Wei Song
  • Wei Tian
  • Wenhua Zhang
  • Yanyun Zhao
  • Yidong Bai
  • Yinan Wu
  • Yongtao Wang
  • Yuxuan Li
  • Zhaoliang Pi
  • Zhiming Ma
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11133)

Abstract

Drones equipped with cameras have been rapidly deployed in a wide range of applications, such as agriculture, aerial photography, fast delivery, and surveillance. As core steps in these applications, video object detection and tracking have attracted much research effort in recent years. However, current video object detection and tracking algorithms are usually not optimal for video sequences captured by drones, due to challenges such as viewpoint changes and scale variations. To promote and track the development of detection and tracking algorithms for drones, we organized the Vision Meets Drone Video Detection and Tracking (VisDrone-VDT2018) challenge, a subtrack of the Vision Meets Drone 2018 challenge workshop held in conjunction with the 15th European Conference on Computer Vision (ECCV 2018). Specifically, this workshop challenge consists of two tasks: (1) video object detection and (2) multi-object tracking. We present a large-scale video object detection and tracking dataset, which consists of 79 video clips with about 1.5 million annotated bounding boxes in 33,366 frames. We also provide rich annotations, including object categories, occlusion, and truncation ratios, for better data usage. As the largest such dataset published to date, the challenge enables extensive evaluation and investigation of object detection and tracking algorithms on the drone platform, and allows their progress to be tracked. We present the evaluation protocol of the VisDrone-VDT2018 challenge and the results of the submitted algorithms on the benchmark dataset, which are publicly available on the challenge website: http://www.aiskyeye.com/. We hope the challenge will significantly boost research and development in related fields.
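For concreteness, the sketch below shows one way the per-frame annotations described above (bounding boxes with category, occlusion, and truncation fields) might be loaded. It assumes the comma-separated annotation layout published with the VisDrone benchmark (frame index, target id, box coordinates, score, category, truncation, occlusion), and the file name used is hypothetical; both should be verified against the challenge website.

    # Minimal sketch for loading VisDrone-style video annotations (Python).
    # Assumed comma-separated layout (verify against http://www.aiskyeye.com/):
    # <frame>,<target_id>,<left>,<top>,<width>,<height>,<score>,<category>,<truncation>,<occlusion>
    from collections import defaultdict
    from dataclasses import dataclass

    @dataclass
    class BoxAnnotation:
        target_id: int   # object identity used by the multi-object tracking task
        left: int        # bounding box in pixel coordinates
        top: int
        width: int
        height: int
        score: int       # 1 if the box is considered during evaluation, 0 if ignored
        category: int    # object category index (pedestrian, car, ...)
        truncation: int  # degree to which the object extends outside the frame
        occlusion: int   # degree to which the object is occluded by other objects

    def load_sequence_annotations(path):
        """Group one clip's annotations by frame index."""
        frames = defaultdict(list)
        with open(path) as f:
            for line in f:
                fields = [int(float(v)) for v in line.strip().split(",") if v]
                if not fields:
                    continue  # skip blank lines
                frame_idx, *rest = fields
                frames[frame_idx].append(BoxAnnotation(*rest))
        return frames

    if __name__ == "__main__":
        # Hypothetical annotation file name; actual names ship with the dataset.
        anns = load_sequence_annotations("annotations/sequence_0001.txt")
        first = min(anns)
        print(f"{len(anns)} annotated frames; frame {first} has {len(anns[first])} boxes")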

Keywords

Drone · Benchmark · Object detection in videos · Multi-object tracking

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant 61502332 and Grant 61732011, in part by the Natural Science Foundation of Tianjin under Grant 17JCZDJC30800, in part by the US National Science Foundation under Grant IIS-1407156 and Grant IIS-1350521, and in part by Beijing Seetatech Technology Co., Ltd. and GE Global Research.


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Pengfei Zhu (1, corresponding author)
  • Longyin Wen (2)
  • Dawei Du (3)
  • Xiao Bian (4)
  • Haibin Ling (5)
  • Qinghua Hu (1)
  • Haotian Wu (1)
  • Qinqin Nie (1)
  • Hao Cheng (1)
  • Chenfeng Liu (1)
  • Xiaoyu Liu (1)
  • Wenya Ma (1)
  • Lianjie Wang (1)
  • Arne Schumann (9)
  • Dan Wang (11)
  • Diego Ortego (16)
  • Elena Luna (16)
  • Emmanouil Michail (6)
  • Erik Bochinski (17)
  • Feng Ni (7)
  • Filiz Bunyak (14)
  • Gege Zhang (11)
  • Guna Seetharaman (15)
  • Guorong Li (13)
  • Hongyang Yu (12)
  • Ioannis Kompatsiaris (6)
  • Jianfei Zhao (8)
  • Jie Gao (11)
  • José M. Martínez (16)
  • Juan C. San Miguel (16)
  • Kannappan Palaniappan (14)
  • Konstantinos Avgerinakis (6)
  • Lars Sommer (9, 10)
  • Martin Lauer (10)
  • Mengkun Liu (11)
  • Noor M. Al-Shakarji (14)
  • Oliver Acatay (9)
  • Panagiotis Giannakeris (6)
  • Qijie Zhao (7)
  • Qinghua Ma (11)
  • Qingming Huang (13)
  • Stefanos Vrochidis (6)
  • Thomas Sikora (17)
  • Tobias Senst (17)
  • Wei Song (11)
  • Wei Tian (10)
  • Wenhua Zhang (11)
  • Yanyun Zhao (8)
  • Yidong Bai (11)
  • Yinan Wu (11)
  • Yongtao Wang (7)
  • Yuxuan Li (11)
  • Zhaoliang Pi (11)
  • Zhiming Ma (10)

  1. Tianjin University, Tianjin, China
  2. JD Finance, Mountain View, USA
  3. University at Albany, SUNY, Albany, USA
  4. GE Global Research, Niskayuna, USA
  5. Temple University, Philadelphia, USA
  6. Centre for Research and Technology Hellas, Thessaloniki, Greece
  7. Peking University, Beijing, China
  8. Beijing University of Posts and Telecommunications, Beijing, China
  9. Fraunhofer IOSB, Karlsruhe, Germany
  10. Karlsruhe Institute of Technology, Karlsruhe, Germany
  11. Xidian University, Xi’an, China
  12. Harbin Institute of Technology, Harbin, China
  13. University of Chinese Academy of Sciences, Beijing, China
  14. University of Missouri-Columbia, Columbia, USA
  15. U.S. Naval Research Laboratory, Washington, D.C., USA
  16. Universidad Autónoma de Madrid, Madrid, Spain
  17. Technische Universität Berlin, Berlin, Germany
