
Siamese object tracking for unmanned aerial vehicle: a review and comprehensive analysis

Published in: Artificial Intelligence Review

Abstract

Unmanned aerial vehicle (UAV)-based visual object tracking has enabled a wide range of applications and attracted increasing attention in the field of artificial intelligence (AI) because of its versatility and effectiveness. As an emerging force in the revolutionary trend of deep learning, Siamese networks shine in UAV-based object tracking with their promising balance of accuracy, robustness, and speed. Thanks to the development of embedded processors and the gradual optimization of deep neural networks, Siamese trackers have received extensive research attention and achieved preliminary deployment on UAVs. However, due to the UAV's limited onboard computational resources and complex real-world circumstances, aerial tracking with Siamese networks still faces severe obstacles in many aspects. To further explore the deployment of Siamese networks in UAV-based tracking, this work presents a comprehensive review of leading-edge Siamese trackers, along with an exhaustive UAV-specific analysis based on evaluations using a typical UAV onboard processor. Onboard tests are then conducted to validate the feasibility and efficacy of representative Siamese trackers in real-world UAV deployment. Furthermore, to better promote the development of the tracking community, this work analyzes the limitations of existing Siamese trackers and conducts additional experiments, represented by low-illumination evaluations. In the end, prospects for the development of Siamese tracking for UAV-based AI systems are discussed in depth. The unified code library of leading-edge Siamese trackers and the results of their experimental evaluations are available at https://github.com/vision4robotics/SiameseTracking4UAV.
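The onboard speed evaluation mentioned above boils down to timing a tracker's per-frame inference on the embedded processor and reporting frames per second. The following minimal Python sketch illustrates that kind of benchmark; `benchmark_tracker` and `dummy_track` are hypothetical names for illustration, not part of the paper's code library.

```python
import time

def benchmark_tracker(track_fn, frames, warmup=5):
    """Return the throughput (FPS) of a per-frame tracking function."""
    # Warm-up iterations exclude one-off initialization cost
    # (e.g., model loading, GPU kernel compilation on a Jetson-class board).
    for f in frames[:warmup]:
        track_fn(f)
    start = time.perf_counter()
    for f in frames:
        track_fn(f)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed

# Toy stand-in for a Siamese tracker's per-frame inference (hypothetical).
def dummy_track(frame):
    return sum(frame) % 255  # placeholder computation

frames = [[i, i + 1, i + 2] for i in range(100)]
print(f"{benchmark_tracker(dummy_track, frames):.1f} FPS")
```

In practice, a real-time threshold (commonly around 30 FPS for aerial tracking) would be applied to the measured throughput to judge onboard feasibility.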




Notes

  1. https://www.nvidia.com/en-us/autonomous-machines/embedded-systems/jetson-agx-xavier/ .

  2. https://www.nvidia.com/en-us/data-center/a100/ .

  3. The integrated code library is available at https://github.com/vision4robotics/SiameseTracking4UAV .

  4. The raw results of the experimental evaluation are available at https://github.com/vision4robotics/SiameseTracking4UAV .

  5. https://docs.px4.io/master/en/flight_controller/pixhawk.html .

  6. http://qgroundcontrol.com/ .

  7. https://developer.nvidia.com/tensorrt .

References

  • Abbass MY, Kwon KC, Kim N et al. (2021) A survey on online learning for visual tracking. Vis Comput 37(5):993–1014. https://doi.org/10.1007/s00371-020-01848-y

    Article  Google Scholar 

  • Akbari Y, Almaadeed N, Al-Maadeed S et al. (2021) Applications, databases and open computer vision research from drone videos and images: a survey. Artif Intell Rev 54(5):3887–3938. https://doi.org/10.1007/s10462-020-09943-1

    Article  Google Scholar 

  • Baykara HC, Bıyık E, Gül G et al. (2017) Real-time detection, tracking and classification of multiple moving objects in UAV videos. In: Proceedings of the international conference on tools with artificial intelligence (ICTAI), pp 945–950. https://doi.org/10.1109/ICTAI.2017.00145

  • Bertinetto L, Henriques JF, Valmadre J et al. (2016a) Learning feed-forward one-shot learners. In: Proceedings of the advances in neural information processing systems (NeurIPS), pp 1–9

  • Bertinetto L, Valmadre J, Henriques JF, et al. (2016b) Fully-convolutional Siamese networks for object tracking. In: Proceedings of the European conference on computer vision workshops (ECCVW), pp 850–865. https://doi.org/10.1007/978-3-319-48881-3_56

  • Bhat G, Danelljan M, Van Gool L et al. (2019) Learning discriminative model prediction for tracking. In: Proceedings of the IEEE/CVF international conference on computer vision (ICCV), pp 6181–6190. https://doi.org/10.1109/ICCV.2019.00628

  • Bromley J, Guyon I, LeCun Y et al. (1993) Signature verification using a "Siamese" time delay neural network. In: Proceedings of the advances in neural information processing systems (NeurIPS), pp 1–8

  • Cai Z, Vasconcelos N (2018) Cascade R-CNN: delving into high quality object detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 6154–6162. https://doi.org/10.1109/CVPR.2018.00644

  • Cao Y, Xu J, Lin S et al. (2019) GCNet: non-local networks meet squeeze-excitation networks and beyond. In: Proceedings of the IEEE/CVF international conference on computer vision workshops (ICCVW), pp 1971–1980. https://doi.org/10.1109/ICCVW.2019.00246

  • Cao Z, Fu C, Ye J et al. (2021a) HiFT: hierarchical feature Transformer for aerial tracking. In: Proceedings of the IEEE/CVF international conference on computer vision (ICCV), pp 15437–15446. https://doi.org/10.1109/ICCV48922.2021.01517

  • Cao Z, Fu C, Ye J et al. (2021b) SiamAPN++: Siamese attentional aggregation network for real-time UAV tracking. In: Proceedings of the IEEE/RSJ international conference on intelligent robots and systems (IROS), pp 3086–3092. https://doi.org/10.1109/IROS51168.2021.9636309

  • Cao Z, Huang Z, Pan L et al. (2022) TCTrack: temporal contexts for aerial tracking. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 14778–14788. https://doi.org/10.1109/CVPR52688.2022.01438

  • Carion N, Massa F, Synnaeve G et al. (2020) End-to-end object detection with Transformers. In: Proceedings of the European conference on computer vision (ECCV), pp 213–229. https://doi.org/10.1007/978-3-030-58452-8_13

  • Chen P, Zhou Y (2019) The review of target tracking for UAV. In: Proceedings of the IEEE conference on industrial electronics and applications (ICIEA), pp 1800–1805. https://doi.org/10.1109/ICIEA.2019.8833668

  • Chen LC, Papandreou G, Kokkinos I et al. (2018) Deeplab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRfs. IEEE Trans Pattern Anal Mach Intell 40(4):834–848. https://doi.org/10.1109/TPAMI.2017.2699184

    Article  Google Scholar 

  • Chen X, Yan X, Zheng F et al. (2020a) One-shot adversarial attacks on visual tracking with dual attention. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 10173–10182, https://doi.org/10.1109/CVPR42600.2020.01019

  • Chen Z, Zhong B, Li G et al. (2020b) Siamese box adaptive network for visual tracking. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 6667–6676. https://doi.org/10.1109/CVPR42600.2020.00670

  • Chen X, Yan B, Zhu J et al. (2021) Transformer tracking. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 8122–8131. https://doi.org/10.1109/CVPR46437.2021.00803

  • Chollet F (2017) Xception: deep learning with depthwise separable convolutions. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 1800–1807. https://doi.org/10.1109/CVPR.2017.195

  • Dai J, Qi H, Xiong Y et al. (2017) Deformable convolutional networks. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 764–773. https://doi.org/10.1109/ICCV.2017.89

  • Dai Z, Cai B, Lin Y et al. (2021) UP-DETR: unsupervised pre-training for object detection with Transformers. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 1601–1610. https://doi.org/10.1109/CVPR46437.2021.00165

  • Danelljan M, Bhat G, Khan FS et al. (2019) ATOM: accurate tracking by overlap maximization. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 4655–4664. https://doi.org/10.1109/CVPR.2019.00479

  • Danelljan M, Van Gool L, Timofte R (2020) Probabilistic regression for visual tracking. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 7181–7190. https://doi.org/10.1109/CVPR42600.2020.00721

  • De Boer PT, Kroese DP, Mannor S et al. (2005) A tutorial on the cross-entropy method. Ann Oper Res 134(1):19–67. https://doi.org/10.1007/s10479-005-5724-z

    Article  MathSciNet  MATH  Google Scholar 

  • Dong X, Shen J (2018) Triplet loss in Siamese network for object tracking. In: Proceedings of the European conference on computer vision (ECCV), pp 459–474. https://doi.org/10.1007/978-3-030-01261-8_28

  • Dosovitskiy A, Beyer L, Kolesnikov A et al. (2020) An image is worth 16X16 words: Transformers for image recognition at scale. In: Proceedings of the international conference on learning representations (ICLR), pp 1–22

  • Du D, Qi Y, Yu H et al. (2018) The unmanned aerial vehicle benchmark: object detection and tracking. In: Proceedings of the European conference on computer vision (ECCV), pp 370–386. https://doi.org/10.1007/978-3-030-01249-6_23

  • Elloumi M, Dhaou R, Escrig B et al. (2018) Monitoring road traffic with a UAV-based system. In: Proceedings of the IEEE wireless communications and networking conference (WCNC), pp 1–6. https://doi.org/10.1109/WCNC.2018.8377077

  • Fan H, Ling H (2019) Siamese cascaded region proposal networks for real-time visual tracking. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 7944–7953. https://doi.org/10.1109/CVPR.2019.00814

  • Fan H, Wen L, Du D et al. (2020) VisDrone-SOT2020: the vision meets drone single-object tracking challenge results. In: Proceedings of the European conference on computer vision (ECCV), pp 728–749. https://doi.org/10.1007/978-3-030-66823-5_44

  • Ferdaus MM, Anavatti SG, Pratama M et al. (2020) Towards the use of fuzzy logic systems in rotary wing unmanned aerial vehicle: a review. Artif Intell Rev 53(1):257–290. https://doi.org/10.1007/s10462-018-9653-z

    Article  Google Scholar 

  • Fiaz M, Mahmood A, Javed S et al. (2019) Handcrafted and deep trackers: recent visual object tracking approaches and trends. ACM Comput Surv 52(2):1–44. https://doi.org/10.1145/3309665

    Article  Google Scholar 

  • Fu J, Liu J, Tian H et al. (2019) Dual attention network for scene segmentation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 3141–3149. https://doi.org/10.1109/CVPR.2019.00326

  • Fu C, Xu J, Lin F et al. (2020) Object saliency-aware dual regularized correlation filter for real-time aerial tracking. IEEE Trans Geosci Remote Sens 58(12):8940–8951. https://doi.org/10.1109/TGRS.2020.2992301

    Article  Google Scholar 

  • Fu C, Cao Z, Li Y et al. (2021a) Onboard real-time aerial tracking with efficient Siamese anchor proposal network. IEEE Trans Geosci Remote Sens 60:1–13. https://doi.org/10.1109/TGRS.2021.3083880

    Article  Google Scholar 

  • Fu C, Cao Z, Li Y et al. (2021b) Siamese anchor proposal network for high-speed aerial tracking. In: Proceedings of the IEEE international conference on robotics and automation (ICRA), pp 510–516. https://doi.org/10.1109/ICRA48506.2021.9560756

  • Fu C, Ding F, Li Y et al. (2021c) Learning dynamic regression with automatic distractor repression for real-time UAV tracking. Eng Appl Artif Intell 98(104):116. https://doi.org/10.1016/j.engappai.2020.104116

    Article  Google Scholar 

  • Fu C, Ye J, Xu J et al. (2021d) Disruptor-aware interval-based response inconsistency for correlation filters in real-time aerial tracking. IEEE Trans Geosci Remote Sens 59(8):6301–6313. https://doi.org/10.1109/TGRS.2020.3030265

    Article  Google Scholar 

  • Fu C, Dong H, Ye J et al. (2022a) HighlightNet: highlighting low-light potential features for real-time UAV tracking. In: Proceedings of the IEEE/RSJ international conference on intelligent robots and systems (IROS), pp 12146–12153. https://doi.org/10.1109/IROS47612.2022.9981070

  • Fu C, Li B, Ding F et al. (2022b) Correlation filters for unmanned aerial vehicle-based aerial tracking: a review and experimental evaluation. IEEE Geosci Remote Sens Mag 10(1):125–160. https://doi.org/10.1109/MGRS.2021.3072992

    Article  Google Scholar 

  • Fu C, Li S, Yuan X et al. (2022c) Ad2Attack: adaptive adversarial attack on real-time UAV tracking. In: Proceedings of the IEEE international conference on robotics and automation (ICRA), pp 5893–5899. https://doi.org/10.1109/ICRA46639.2022.9812056

  • Fu C, Cai M, Li S et al. (2023) Continuity-aware latent interframe information mining for reliable UAV tracking, In: Proceedings of the IEEE international conference on robotics and automation (ICRA), pp 1327–1333. https://doi.org/10.1109/ICRA48891.2023.10160673

  • Gao J, Zhang T, Xu C (2019) Graph convolutional tracking. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 4644–4654. https://doi.org/10.1109/CVPR.2019.00478

  • Girshick R (2015) Fast R-CNN. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 1440–1448. https://doi.org/10.1109/ICCV.2015.169

  • Gonzalez LF, Montes GA, Puig E et al. (2016) Unmanned aerial vehicles (UAVs) and artificial intelligence revolutionizing wildlife monitoring and conservation. ACS Sens 16(1):97. https://doi.org/10.3390/s16010097

    Article  Google Scholar 

  • Guo Q, Feng W, Zhou C et al. (2017) Learning dynamic Siamese network for visual object tracking. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 1781–1789. https://doi.org/10.1109/ICCV.2017.196

  • Guo D, Shao Y, Cui Y et al. (2021) Graph attention tracking. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 9538–9547. https://doi.org/10.1109/CVPR46437.2021.00942

  • Guo D, Wang J, Cui Y et al. (2020) SiamCAR: Siamese fully convolutional classification and regression for visual tracking. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 6268–6276. https://doi.org/10.1109/CVPR42600.2020.00630

  • He K, Zhang X, Ren S et al. (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 770–778. https://doi.org/10.1109/CVPR.2016.90

  • Hao J, Zhou Y, Zhang G et al. (2018) A review of target tracking algorithm based on UAV. In: Proceedings of the IEEE international conference on cyborg and bionic systems (CBS), pp 328–333. https://doi.org/10.1109/CBS.2018.8612263

  • He A, Luo C, Tian X et al. (2018a) A twofold Siamese network for real-time object tracking. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 4834–4843. https://doi.org/10.1109/CVPR.2018.00508

  • He A, Luo C, Tian X et al. (2018b) Towards a better match in Siamese network based visual object tracker. In: Proceedings of the European conference on computer vision workshops (ECCVW), pp 132–147. https://doi.org/10.1007/978-3-030-11009-3_7

  • He K, Gkioxari G, Dollár P et al. (2020) Mask R-CNN. IEEE Trans Pattern Anal Mach Intell 42(2):386–397. https://doi.org/10.1109/TPAMI.2018.2844175

    Article  Google Scholar 

  • Held D, Thrun S, Savarese S (2016) Learning to track at 100 FPS with deep regression networks. In: Proceedings of the European conference on computer vision (ECCV), pp 749–765. https://doi.org/10.1007/978-3-319-46448-0_45

  • Howard AG, Zhu M, Chen B et al. (2017) MobileNets: efficient convolutional neural networks for mobile vision applications, pp 1–9. arXiv preprint arXiv:1704.04861

  • Howard A, Sandler M, Chen B et al. (2019) Searching for MobileNetV3. In: Proceedings of the IEEE/CVF international conference on computer vision (ICCV), pp 1314–1324. https://doi.org/10.1109/ICCV.2019.00140

  • Hu J, Shen L, Albanie S et al. (2020) Squeeze-and-excitation networks. IEEE Trans Pattern Anal Mach Intell 42(8):2011–2023. https://doi.org/10.1109/TPAMI.2019.2913372

    Article  Google Scholar 

  • Huang C, Lucey S, Ramanan D (2017) Learning policies for adaptive tracking with deep feature cascades. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 105–114. https://doi.org/10.1109/ICCV.2017.21

  • Huang Z, Fu C, Li Y et al. (2019) Learning aberrance repressed correlation filters for real-time UAV tracking. In: Proceedings of the IEEE/CVF international conference on computer vision (ICCV), pp 2891–2900. https://doi.org/10.1109/ICCV.2019.00298

  • Huang L, Zhao X, Huang K (2020) GlobalTrack: a simple and strong baseline for long-term tracking. In: Proceedings of the AAAI conference on artificial intelligence (AAAI), pp 11037–11044. https://doi.org/10.1609/aaai.v34i07.6758

  • Javed S, Danelljan M, Khan FS et al. (2022) Visual object tracking with discriminative filters and Siamese networks: a survey and outlook. IEEE Trans Pattern Anal Mach Intell 45(5):6552-6574. https://doi.org/10.1109/TPAMI.2022.3212594

    Article  Google Scholar 

  • Jiang B, Luo R, Mao J et al. (2018) Acquisition of localization confidence for accurate object detection. In: Proceedings of the European conference on computer vision (ECCV), pp 784–799. https://doi.org/10.1007/978-3-030-01264-9_48

  • Karaduman M, Cınar A, Eren H (2019) UAV traffic patrolling via road detection and tracking in anonymous aerial video frames. J Intell Robot Syst 95:675–690. https://doi.org/10.1007/s10846-018-0954-x

    Article  Google Scholar 

  • Kingma DP, Welling M (2014) Auto-encoding variational Bayes. In: Proceedings of the international conference on learning representations (ICLR), pp 1–14

  • Kipf TN, Welling M (2017) Semi-supervised classification with graph convolutional networks. In: Proceedings of the international conference on learning representations (ICLR), pp 1–14

  • Krebs S, Duraisamy B, Flohr F (2017) A survey on leveraging deep neural networks for object tracking. In: Proceedings of the international conference on intelligent transportation systems (ITSC), pp 411–418. https://doi.org/10.1109/ITSC.2017.8317904

  • Kristan M, Leonardis A, Matas J et al. (2016) The visual object tracking VOT2016 challenge results. In: Proceedings of the European conference on computer vision workshops (ECCVW), pp 777–823. https://doi.org/10.1007/978-3-319-48881-3_54

  • Krizhevsky A, Sutskever I, Hinton GE (2017) ImageNet classification with deep convolutional neural networks. Commun ACM 60(6):84–90. https://doi.org/10.1145/3065386

    Article  Google Scholar 

  • Law H, Teng Y, Russakovsky O et al. (2020) CornerNet-Lite: efficient keypoint based object detection. In: Proceedings of the British machine vision conference (BMVC), pp 1–15

  • Leal-Taixé L, Canton-Ferrer C, Schindler K (2016) Learning by tracking: Siamese CNN for robust target association. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops (CVPRW), pp 418–425. https://doi.org/10.1109/CVPRW.2016.59

  • Li S, Yeung DY (2017) Visual object tracking for unmanned aerial vehicles: a benchmark and new motion models. In: Proceedings of the AAAI conference on artificial intelligence (AAAI), pp 1–7. https://doi.org/10.1609/aaai.v31i1.11205

  • Li X, Hu W, Shen C et al. (2013) A survey of appearance models in visual object tracking. ACM Trans Intell Syst Technol 4(4):1–48. https://doi.org/10.1145/2508037.2508039

    Article  Google Scholar 

  • Li Y, Song Y, Luo J (2017) Improving pairwise ranking for multi-label image classification. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 1837–1845. https://doi.org/10.1109/CVPR.2017.199

  • Li B, Yan J, Wu W et al. (2018a) High performance visual tracking with Siamese region proposal network. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 8971–8980. https://doi.org/10.1109/CVPR.2018.00935

  • Li P, Wang D, Wang L et al. (2018b) Deep visual tracking: review and experimental comparison. Pattern Recogn 76:323–338. https://doi.org/10.1016/j.patcog.2017.11.007

    Article  Google Scholar 

  • Li B, Wu W, Wang Q, et al. (2019a) SiamRPN++: evolution of Siamese visual tracking with very deep networks. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 4277–4286. https://doi.org/10.1109/CVPR.2019.00441

  • Li X, Ma C, Wu B et al. (2019b) Target-aware deep tracking. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 1369–1378. https://doi.org/10.1109/CVPR.2019.00146

  • Li M, Wang YX, Ramanan D (2020a) Towards streaming perception. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 473–488. https://doi.org/10.1007/978-3-030-58536-5_28

  • Li Y, Fu C, Ding F et al. (2020b) AutoTrack: towards high-performance visual tracking for UAV with automatic spatio-temporal regularization. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 11920–11929. https://doi.org/10.1109/CVPR42600.2020.01194

  • Li B, Fu C, Ding F et al. (2021a) ADTrack: target-aware dual filter learning for real-time anti-dark UAV tracking. In: Proceedings of the IEEE international conference on robotics and automation (ICRA), pp 496–502. https://doi.org/10.1109/ICRA48506.2021.9561564

  • Li B, Li Y, Ye J, et al. (2021b) Predictive Visual Tracking: A New Benchmark and Baseline Approach, pp 1–8. arXiv preprint arXiv:2103.04508

  • Li B, Fu C, Ding F et al. (2022) All-day object tracking for unmanned aerial vehicle. IEEE Trans Mob Comput. https://doi.org/10.1109/TMC.2022.3162892

    Article  Google Scholar 

  • Li S, Fu C, Lu K et al. (2023) Boosting UAV tracking with voxel-based trajectory-aware pre-training. IEEE Robot Autom Lett 8(2):1133–1140. https://doi.org/10.1109/LRA.2023.3236583

    Article  Google Scholar 

  • Lin TY, Maire M, Belongie S et al. (2014) Microsoft COCO: common objects in context. In: Proceedings of the European conference on computer vision (ECCV), pp 740–755. https://doi.org/10.1007/978-3-319-10602-1_48

  • Lin TY, Goyal P, Girshick R et al. (2020) Focal loss for dense object detection. IEEE Trans Pattern Anal Mach Intell 42(2):318–327. https://doi.org/10.1109/TPAMI.2018.2858826

    Article  Google Scholar 

  • Lin F, Fu C, He Y et al. (2021) ReCF: exploiting response reasoning for correlation filters in real-time UAV tracking. IEEE Trans Intell Transp Syst 23(8):10469-10480. https://doi.org/10.1109/TITS.2021.3094654

    Article  Google Scholar 

  • Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 3431–3440. https://doi.org/10.1109/CVPR.2015.7298965

  • Lu K, Fu C, Wang Y et al. (2023) Cascaded denoising Transformer for UAV nighttime tracking. IEEE Robot Autom Lett 8(6):3142–3149. https://doi.org/10.1109/LRA.2023.3264711

    Article  Google Scholar 

  • Luiten J, Voigtlaender P, Leibe B (2018) PReMVOS: proposal-generation, refinement and merging for video object segmentation. In: Proceedings of the Asian conference on computer vision (ACCV), pp 565–580. https://doi.org/10.1007/978-3-030-20870-7_35

  • Luo Y, Yu X, Yang D et al. (2022) A survey of intelligent transmission line inspection based on unmanned aerial vehicle. Artif Intell Rev 56:173-201. https://doi.org/10.1007/s00371-020-01848-y

    Article  Google Scholar 

  • Ma N, Zhang X, Zheng HT et al. (2018) ShuffleNet V2: practical guidelines for efficient CNN architecture design. In: Proceedings of the European conference on computer vision (ECCV), pp 116–131. https://doi.org/10.1007/978-3-030-01264-9_8

  • Marvasti-Zadeh SM, Cheng L, Ghanei-Yakhdan H et al. (2022) Deep learning for visual tracking: a comprehensive survey. IEEE Trans Intell Transp Syst 23(5):3943–3968. https://doi.org/10.1109/TITS.2020.3046478

    Article  Google Scholar 

  • Mittal S (2019) A survey on optimized implementation of deep learning models on the NVIDIA Jetson platform. J Syst Archit 97:428–442. https://doi.org/10.1016/j.sysarc.2019.01.011

    Article  Google Scholar 

  • Müeller M, Smith N, Ghanem B (2016) A benchmark and simulator for Uav tracking. In: Proceedings of the European conference on computer vision (ECCV), pp 445–461. https://doi.org/10.1007/978-3-319-46448-0_27

  • Müller M, Bibi A, Giancola S et al. (2018) TrackingNet: a large-scale dataset and benchmark for object tracking in the wild. In: Proceedings of the European conference on computer vision (ECCV), pp 300–317. https://doi.org/10.1007/978-3-030-01246-5_19

  • Ollero A, Tognon M, Suarez A et al. (2021) Past, present, and future of aerial robotic manipulators. IEEE Trans Robot 38(1):626–645. https://doi.org/10.1109/TRO.2021.3084395

    Article  Google Scholar 

  • Ondrašovič M, Tarábek P (2021) Siamese visual object tracking: a survey. IEEE Access 9:110149–110172. https://doi.org/10.1109/ACCESS.2021.3101988

    Article  Google Scholar 

  • Peng J, Jiang Z, Gu Y et al. (2021) SiamRCR: reciprocal classification and regression for visual object tracking. In: Proceedings of the international joint conference on artificial intelligence (IJCAI), pp 1–10. https://doi.org/10.24963/ijcai.2021/132

  • Pflugfelder R (2017) An in-depth analysis of visual tracking with Siamese neural networks, pp 1–19. arXiv preprint arXiv:1707.00569

  • Real E, Shlens J, Mazzocchi S et al. (2017) YouTube-BoundingBoxes: a large high-precision human-annotated data set for object detection in video. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 7464–7473. https://doi.org/10.1109/CVPR.2017.789

  • Ren S, He K, Girshick R et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans Pattern Anal Mach Intell 39(6):1137–1149. https://doi.org/10.1109/TPAMI.2016.2577031

    Article  Google Scholar 

  • Rezatofighi H, Tsoi N, Gwak J et al. (2019) Generalized intersection over union: a metric and a loss for bounding box regression. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 658–666. https://doi.org/10.1109/CVPR.2019.00075

  • Russakovsky O, Deng J, Su H et al. (2015) ImageNet large scale visual recognition challenge. Int J Comput Vis 115(3):211–252. https://doi.org/10.1007/s11263-015-0816-y

    Article  MathSciNet  Google Scholar 

  • Sandler M, Howard A, Zhu M et al. (2018) MobileNetV2: inverted residuals and linear bottlenecks. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 4510–4520. https://doi.org/10.1109/CVPR.2018.00474

  • Scholkopf B, Smola AJ (2001) Learning with kernels: support vector machines, regularization, optimization, and beyond. MIT Press, Cambridge

    Google Scholar 

  • Shelhamer E, Long J, Darrell T (2017) Fully convolutional networks for semantic segmentation. IEEE Trans Pattern Anal Mach Intell 39(4):640–651. https://doi.org/10.1109/TPAMI.2016.2572683

    Article  Google Scholar 

  • Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. In: Proceedings of the international conference on learning representations (ICLR), pp 1–14

  • Smeulders AW, Chu DM, Cucchiara R et al. (2014) Visual tracking: an experimental survey. IEEE Trans Pattern Anal Mach Intell 36(7):1442–1468. https://doi.org/10.1109/TPAMI.2013.230

    Article  Google Scholar 

  • Sosnovik I, Moskalev A, Smeulders A (2021) Scale equivariance improves Siamese tracking. In: Proceedings of the IEEE winter conference on applications of computer vision (WACV), pp 2764–2773. https://doi.org/10.1109/WACV48630.2021.00281

  • Sosnovik I, Szmaja M, Smeulders A (2020) Scale-equivariant steerable networks. In: Proceedings of the international conference on learning representations (ICLR), pp 1–14

  • Szegedy C, Liu W, Jia Y et al. (2015) Going deeper with convolutions. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 1–9. https://doi.org/10.1109/CVPR.2015.7298594

  • Szegedy C, Vanhoucke V, Ioffe S et al. (2016) Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 2818–2826. https://doi.org/10.1109/CVPR.2016.308

  • Tang J, Duan H, Lao S (2022) Swarm intelligence algorithms for multiple unmanned aerial vehicles collaboration: a comprehensive review. Artif Intell Rev 56:4295-4327. https://doi.org/10.1007/s10462-022-10281-7

    Article  Google Scholar 

  • Tao R, Gavves E, Smeulders AW (2016) Siamese instance search for tracking. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 1420–1429. https://doi.org/10.1109/CVPR.2016.158

  • Tian Z, Shen C, Chen H et al. (2019) FCOS: fully convolutional one-stage object detection. In: Proceedings of the IEEE/CVF international conference on computer vision (ICCV), pp 9626–9635. https://doi.org/10.1109/ICCV.2019.00972

  • Tony LA, Jana S, Varun V, et al. (2022) UAV collaboration for autonomous target capture. In: Proceedings of the congress on intelligent systems (CIS), pp 847–862. https://doi.org/10.1007/978-981-16-9416-5_62

  • Uijlings JR, Van De Sande KE, Gevers T et al. (2013) Selective search for object recognition. Int J Comput Vis 104(2):154–171. https://doi.org/10.1007/s11263-013-0620-5

    Article  Google Scholar 

  • Valmadre J, Bertinetto L, Henriques J et al. (2017) End-to-end representation learning for correlation filter based tracking. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 5000–5008. https://doi.org/10.1109/CVPR.2017.531

  • Vaswani A, Shazeer N, Parmar N et al. (2017) Attention is all you need. In: Proceedings of the advances in neural information processing systems (NeurIPS), pp 1–11

  • Vedaldi A, Lenc K (2015) MatConvNet: convolutional neural networks for MATLAB. In: Proceedings of the ACM multimedia conference (MM), pp 689–692. https://doi.org/10.1145/2733373.2807412

  • Veličković P, Cucurull G, Casanova A et al. (2018) Graph attention networks. In: Proceedings of the international conference on learning representations (ICLR), pp 1–12

  • Voigtlaender P, Luiten J, Torr PH et al. (2020) Siam R-CNN: visual tracking by re-detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 6577–6587. https://doi.org/10.1109/CVPR42600.2020.00661

  • Wang Q, Teng Z, Xing J et al. (2018a) Learning attentions: residual attentional Siamese network for high performance online visual tracking. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 4854–4863. https://doi.org/10.1109/CVPR.2018.00510

  • Wang X, Li C, Luo B et al. (2018b) SINT++: robust visual tracking via adversarial positive instance generation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 4864–4873. https://doi.org/10.1109/CVPR.2018.00511

  • Wang Q, Zhang L, Bertinetto L et al. (2019) Fast online object tracking and segmentation: a unifying approach. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 1328–1338. https://doi.org/10.1109/CVPR.2019.00142

  • Wang H, Zhu Y, Adam H et al. (2021a) Max-Deeplab: end-to-end panoptic segmentation with mask Transformers. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 5459–5470. https://doi.org/10.1109/CVPR46437.2021.00542

  • Wang Y, Xu Z, Wang X et al. (2021b) End-to-end video instance segmentation with Transformers. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 8737–8746. https://doi.org/10.1109/CVPR46437.2021.00863

  • Wu Y, Lim J, Yang MH (2015) Object tracking benchmark. IEEE Trans Pattern Anal Mach Intell 37(9):1834–1848. https://doi.org/10.1109/TPAMI.2014.2388226

  • Wu X, Li W, Hong D et al. (2022) Deep learning for unmanned aerial vehicle-based object detection and tracking: a survey. IEEE Geosci Remote Sens Mag 10(1):91–124. https://doi.org/10.1109/MGRS.2021.3115137

  • Xie S, Girshick R, Dollár P et al. (2017) Aggregated residual transformations for deep neural networks. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 5987–5995. https://doi.org/10.1109/CVPR.2017.634

  • Xu Y, Wang Z, Li Z et al. (2020) SiamFC++: towards robust and accurate visual tracking with target estimation guidelines. In: Proceedings of the AAAI conference on artificial intelligence (AAAI), pp 12549–12556. https://doi.org/10.1609/aaai.v34i07.6944

  • Yan B, Wang D, Lu H et al. (2020) Cooling-shrinking attack: blinding the tracker with imperceptible noises. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 987–996. https://doi.org/10.1109/CVPR42600.2020.00107

  • Yan B, Peng H, Fu J et al. (2021a) Learning spatio-temporal Transformer for visual tracking. In: Proceedings of the IEEE/CVF international conference on computer vision (ICCV), pp 10428–10437. https://doi.org/10.1109/ICCV48922.2021.01028

  • Yan B, Peng H, Wu K et al. (2021b) LightTrack: finding lightweight neural networks for object tracking via one-shot architecture search. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 15175–15184. https://doi.org/10.1109/CVPR46437.2021.01493

  • Yang H, Shao L, Zheng F et al. (2011) Recent advances and trends in visual tracking: a review. Neurocomputing 74(18):3823–3831. https://doi.org/10.1016/j.neucom.2011.07.024

  • Yang K, He Z, Pei W et al. (2022) SiamCorners: Siamese corner networks for visual tracking. IEEE Trans Multimed 24:1956–1967. https://doi.org/10.1109/TMM.2021.3074239

  • Yao L, Fu C, Li S et al. (2023) SGDViT: saliency-guided dynamic vision Transformer for UAV tracking. In: Proceedings of the IEEE international conference on robotics and automation (ICRA), pp 3353–3359. https://doi.org/10.1109/ICRA48891.2023.10161487

  • Ye J, Fu C, Zheng G et al. (2021) DarkLighter: light up the darkness for UAV tracking. In: Proceedings of the IEEE/RSJ international conference on intelligent robots and systems (IROS), pp 3079–3085. https://doi.org/10.1109/IROS51168.2021.9636680

  • Ye J, Fu C, Cao Z et al. (2022a) Tracker meets night: a Transformer enhancer for UAV tracking. IEEE Robot Autom Lett 7(2):3866–3873. https://doi.org/10.1109/LRA.2022.3146911

  • Ye J, Fu C, Lin F et al. (2022b) Multi-regularized correlation filter for UAV tracking and self-localization. IEEE Trans Ind Electron 69(6):6004–6014. https://doi.org/10.1109/TIE.2021.3088366

  • Ye J, Fu C, Zheng G et al. (2022c) Unsupervised domain adaptation for nighttime aerial tracking. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 8886–8895. https://doi.org/10.1109/CVPR52688.2022.00869

  • Yilmaz A, Javed O, Shah M (2006) Object tracking: a survey. ACM Comput Surv 38(4):13–45. https://doi.org/10.1145/1177352.1177355

  • You S, Zhu H, Li M et al. (2019) A review of visual trackers and analysis of its application to mobile robot, pp 1–25. arXiv preprint arXiv:1910.09761

  • Yu J, Jiang Y, Wang Z et al. (2016) UnitBox: an advanced object detection network. In: Proceedings of the ACM multimedia conference (MM), pp 516–520. https://doi.org/10.1145/2964284.2967274

  • Yu Y, Xiong Y, Huang W et al. (2020) Deformable Siamese attention networks for visual object tracking. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 6727–6736. https://doi.org/10.1109/CVPR42600.2020.00676

  • Zagoruyko S, Komodakis N (2017) Deep compare: a study on using convolutional neural networks to compare image patches. Comput Vis Image Underst 164:38–55. https://doi.org/10.1016/j.cviu.2017.10.007

  • Zhang H, Dana K, Shi J et al. (2018a) Context encoding for semantic segmentation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 7151–7160. https://doi.org/10.1109/CVPR.2018.00747

  • Zhang X, Zhou X, Lin M et al. (2018b) ShuffleNet: an extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 6848–6856. https://doi.org/10.1109/CVPR.2018.00716

  • Zhang Y, Wang L, Qi J et al. (2018c) Structured Siamese network for real-time visual tracking. In: Proceedings of the European conference on computer vision (ECCV), pp 351–366. https://doi.org/10.1007/978-3-030-01240-3_22

  • Zhang Z, Peng H (2019) Deeper and wider Siamese networks for real-time visual tracking. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 4586–4595. https://doi.org/10.1109/CVPR.2019.00472

  • Zhang L, Gonzalez-Garcia A, Weijer JVD et al. (2019) Learning the model update for Siamese trackers. In: Proceedings of the IEEE/CVF international conference on computer vision (ICCV), pp 4009–4018. https://doi.org/10.1109/ICCV.2019.00411

  • Zhang Z, Peng H, Fu J et al. (2020) Ocean: object-aware anchor-free tracking. In: Proceedings of the European conference on computer vision (ECCV), pp 771–787. https://doi.org/10.1007/978-3-030-58589-1_46

  • Zheng G, Fu C, Ye J et al. (2022a) Scale-aware Siamese object tracking for vision-based UAM approaching. IEEE Trans Ind Inform, pp 1–12. https://doi.org/10.1109/TII.2022.3228197

  • Zheng G, Fu C, Ye J et al. (2022b) Siamese object tracking for vision-based UAM approaching with pairwise scale-channel attention. In: Proceedings of the IEEE/RSJ international conference on intelligent robots and systems (IROS), pp 10486–10492. https://doi.org/10.1109/IROS47612.2022.9982189

  • Zhou W, Wen L, Zhang L et al. (2021) SiamCAN: real-time visual tracking based on Siamese center-aware network. IEEE Trans Image Process 30:3597–3609. https://doi.org/10.1109/TIP.2021.3060905

  • Zhu Z, Wang Q, Li B et al. (2018) Distractor-aware Siamese networks for visual object tracking. In: Proceedings of the European conference on computer vision (ECCV), pp 101–117. https://doi.org/10.1007/978-3-030-01240-3_7

  • Zuo H, Fu C, Li S et al. (2023) Adversarial blur-deblur network for robust UAV tracking. IEEE Robot Autom Lett 8(2):1101–1108. https://doi.org/10.1109/LRA.2023.3236584

Acknowledgements

This work is supported by the National Natural Science Foundation of China (No. 62173249) and the Natural Science Foundation of Shanghai (No. 20ZR1460100).

Author information

Authors and Affiliations

Authors

Contributions

CF and KL wrote the main manuscript text. GZ and ZC participated in the completion of Sect. 3 (Siamese Trackers). JY and BL participated in the completion of Sect. 4 (Experimental Evaluation). GL participated in the completion of onboard tests. All authors reviewed the manuscript.

Corresponding author

Correspondence to Changhong Fu.

Ethics declarations

Conflict of interest

All authors declare that they have no competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Fu, C., Lu, K., Zheng, G. et al. Siamese object tracking for unmanned aerial vehicle: a review and comprehensive analysis. Artif Intell Rev 56 (Suppl 1), 1417–1477 (2023). https://doi.org/10.1007/s10462-023-10558-5
