Pas: a scale-invariant approach to maritime search and rescue object detection using preprocessing and attention scaling

  • Original Research Paper
Intelligent Service Robotics

Abstract

Object detection is a primary tool for unmanned aerial vehicle (UAV) maritime search and rescue. Scale variation caused by changes in UAV flight height, changes in shooting angle, and giant waves seriously degrades detection performance, yet most existing work does not explicitly account for these factors. In this work, we propose an algorithm called Preprocessing and Attention Scaling (PAS), which for the first time explicitly addresses the scale variation caused by height changes, angle changes, and giant waves, and resolves it through two modules: Preprocessing Scaling and Attention Scaling. The Preprocessing Scaling module rescales each image and applies a perspective transformation according to the flight altitude and shooting angle recorded for that photograph, then crops the result to an appropriate size, which significantly improves detection accuracy and shortens inference time. The Preprocessing Scaling module cannot, however, correct the scale variation that arises when large swells lift and lower an object. We therefore designed the Attention Scaling module, which quickly locates the area that needs further scale change by fusing horizontal and vertical attention and then transforms that area to an appropriate scale with an affine transformation, further improving detection accuracy. We extensively tested PAS on the well-known SeaDronesSee-DET and SeaDronesSee-DET v2 (S-ODv2) datasets, where it significantly improved detection accuracy. In addition, we successfully tested our method on a height-angle transfer task, in which we trained on some height-angle intervals and tested on different ones, achieving good results.
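The two scaling ideas in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a simple pinhole geometry in which apparent object size is inversely proportional to the slant distance along the optical axis, a pitch angle measured downward from the horizon, and a hypothetical 30 m reference distance. The function names (`slant_distance`, `scale_factor`, `fused_attention_box`, `zoom_affine`) and the outer-product fusion with a fixed threshold are likewise illustrative assumptions.

```python
import math


def slant_distance(altitude_m, pitch_deg):
    """Camera-to-sea distance along the optical axis (assumed flat sea).

    pitch_deg is measured downward from the horizon; 90 means straight down.
    """
    if altitude_m <= 0:
        raise ValueError("altitude must be positive")
    if not 0 < pitch_deg <= 90:
        raise ValueError("pitch must be in (0, 90] degrees")
    return altitude_m / math.sin(math.radians(pitch_deg))


def scale_factor(altitude_m, pitch_deg, ref_distance_m=30.0,
                 min_s=0.25, max_s=4.0):
    """Resize factor that brings objects to their size at ref_distance_m.

    Assumption: apparent size is proportional to 1 / distance, so an image
    taken from twice the reference distance is enlarged by a factor of two.
    The result is clamped to [min_s, max_s] to avoid extreme resizes.
    """
    s = slant_distance(altitude_m, pitch_deg) / ref_distance_m
    return max(min_s, min(max_s, s))


def fused_attention_box(row_attn, col_attn, threshold=0.5):
    """Fuse vertical (per-row) and horizontal (per-column) attention.

    Fusion here is an outer product (an illustrative choice). Returns the
    bounding box (top, left, bottom, right) of all cells whose fused score
    reaches the threshold, with exclusive bottom/right, or None if empty.
    """
    hits = [(i, j)
            for i, r in enumerate(row_attn)
            for j, c in enumerate(col_attn)
            if r * c >= threshold]
    if not hits:
        return None
    ys = [i for i, _ in hits]
    xs = [j for _, j in hits]
    return min(ys), min(xs), max(ys) + 1, max(xs) + 1


def zoom_affine(box, out_h, out_w):
    """2x3 affine matrix mapping the box region onto an out_h x out_w image."""
    top, left, bottom, right = box
    sx = out_w / (right - left)
    sy = out_h / (bottom - top)
    # x' = sx * (x - left), y' = sy * (y - top)
    return [[sx, 0.0, -sx * left],
            [0.0, sy, -sy * top]]


if __name__ == "__main__":
    print(scale_factor(60.0, 90.0))
    box = fused_attention_box([0.1, 0.9, 0.8, 0.1], [0.2, 0.9, 0.1])
    print(box, zoom_affine(box, out_h=4, out_w=3))
```

In practice the resize and warp would be applied to image arrays with an image library; the sketch only computes the geometric parameters so that the relationship between altitude, angle, and scale is explicit.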

Funding

Funding was provided by the National Natural Science Foundation of China (Grant No. 61972417) and the Natural Science Foundation of Shandong Province (Grant No. ZR2020MF005).

Author information

Corresponding author

Correspondence to Shibao Li.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Li, S., Li, C., Wang, Z. et al. Pas: a scale-invariant approach to maritime search and rescue object detection using preprocessing and attention scaling. Intel Serv Robotics (2024). https://doi.org/10.1007/s11370-024-00526-5

  • DOI: https://doi.org/10.1007/s11370-024-00526-5