Abstract
Responding to a water rescue situation is challenging. First responders need access to data as quickly as possible to increase the likelihood of a successful rescue. Aerial imagery systems are especially useful in a search and rescue scenario because they provide an elevated, wide-area view of the search environment, and unmanned aerial vehicles make acquiring aerial image data straightforward. During water-based search and rescue scenarios, first responders sometimes deploy an inflatable marker called a rescue danbuoy. The danbuoy is fitted with a small conical sack known as a drogue, which ensures that the marker is not blown off course by the wind and instead follows the flow of the body of water. Tracking the danbuoy as it moves is of utmost importance in a water rescue. We present a new data-set, “VisBuoy”, with imagery containing instances of danbuoy markers and boats in real-world water-based settings. We also show how deep learning-based computer vision techniques can be used to autonomously detect danbuoy instances in aerial imagery. We compare the performance of four state-of-the-art object detectors, Faster R-CNN, RetinaNet, EfficientDet and YOLOv5, on the “VisBuoy” data-set to find the best detector for this task. We then propose a best-performing model, with a precision score of 74%, which can be used in search and rescue operations to detect inflatable danbuoy markers in water-based settings.
1 Introduction
Accurate and timely access to location-based insights is key to successful search and rescue (SAR) operations. The most efficient situational awareness is achieved through aerial assessment [7]. Unmanned aerial vehicles (UAVs) are agile, fast and can be programmed to operate autonomously [25]. While aerial data acquisition alone helps obtain a bird’s-eye view during a rescue scenario, it presents a major challenge: processing a large amount of data to identify objects of interest in real-time [17]. Dealing with this data in real-time as a human is non-trivial; however, computer vision-based object detection models provide a way to automatically search this data for objects of interest. This could be helpful for SAR first responders, who can be guided by a sufficiently accurate algorithm to objects of interest visible in the UAV data.
Object detection is a computer vision task concerned with detecting instances of semantic objects of various classes in digital imagery. Computational detection of objects of interest in a SAR mission is useful: it removes the need to manually review large amounts of data and allows for autonomous operations if required. In the recent past, deep learning-based object detection models have risen to prominence due to their higher performance compared to classical computer vision methods. Convolutional neural networks (CNNs) are state-of-the-art for object detection tasks and are used to great effect in many domains, such as medicine, automotive and space.
In this paper, we compare several state-of-the-art object detection models for performance on our novel data-set “VisBuoy”. We use the standardized detection performance metrics mean average precision (MAP) and mean average recall (MAR). We find the most accurate object detector from this set and produce a model which can be used to detect danbuoy inflatable markers in a SAR scenario.
The paper is structured as follows: Sect. 2 details some related work. We outline our research methodology in Sect. 3. We share the results of our experiments in Sect. 4 and we conclude with a summary of our results.
2 Related Work
Research into the use of UAVs for SAR has been popular in recent years. A number of studies have been conducted in disaster management [6], where UAV technology has been explored across all three disaster stages: pre-disaster preparedness [24], disaster assessment [8] and post-disaster response and recovery [10].
A subset of this research area comprises work on aerial image capture for UAV-assisted SAR missions [13]. Specifically, the task of automated object detection has been explored extensively. Approaches range from classical object detection methods, such as edge detection and classification [4], to modern deep learning-based approaches, with the latter achieving more accurate detections [2]. Existing research mainly focuses on the detection of people [5] on land rather than in water-based settings [11]. Our research takes a novel approach, instead detecting danbuoy inflatable markers via aerial imagery in water-based settings during SAR missions.
Many approaches take the route of examining the accuracy of a single architecture on a public data-set. There are several drone-specific data-sets, such as VisDrone [27], which are commonly used. We create a custom data-set as we are unaware of any publicly available danbuoy data-set at this time. There has been some research into the comparison of multiple state-of-the-art aerial image-based object detectors for vehicle [1] and person [20] detection. Our work takes a similar approach, i.e. comparing multiple detectors in search of the best one, but on the novel task of danbuoy inflatable marker detection in a water-based environment.
3 Methodology
3.1 Data-Set Generation
We gathered a custom data-set (Table 1) of danbuoy inflatable markers using a DJI Mavic Enterprise drone. We deployed a “Force 4 SOS Inflatable Danbuoy” (Fig. 1) into a river setting (Fig. 2) via a small boat. We captured video through several UAV fly-overs at various altitudes, angles of approach and speeds, resulting in a data-set with a range of instance sizes (Fig. 3). Finally, we split the video into 1,279 frames using video-to-image conversion software [23] and labelled the images with the Label Studio annotation tool [22].
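For illustration, frame extraction of this kind can be scripted around FFmpeg [23]; the input file name and sampling rate below are hypothetical, as our exact conversion settings are not reproduced here.

```python
import subprocess

# Illustrative frame extraction with FFmpeg [23]; the input file name and
# sampling rate (fps) are assumptions, not the actual settings used.
subprocess.run([
    "ffmpeg",
    "-i", "danbuoy_flight.mp4",   # UAV fly-over footage (name assumed)
    "-vf", "fps=2",               # sample two frames per second (illustrative)
    "frames/frame_%04d.png",      # numbered output frames
], check=True)
```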
3.2 Model Development
To computationally detect instances of inflatable markers, four CNN-based models were trained with an 80/20 train-validation split on an NVIDIA GeForce RTX 2080 SUPER. Code was written to ensure all approaches could be validated against each other on the mean average precision and mean average recall metrics.
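A minimal sketch of such a split follows, using a dummy dataset as a stand-in for the VisBuoy images (our data-loading code is not reproduced here, so the seed and dataset wrapper are illustrative):

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Stand-in dataset: one entry per frame of the 1,279-image data-set.
dataset = TensorDataset(torch.arange(1279))
n_train = int(0.8 * len(dataset))  # 80/20 train-validation split
train_set, val_set = random_split(
    dataset,
    [n_train, len(dataset) - n_train],
    generator=torch.Generator().manual_seed(42),  # reproducible split
)
print(len(train_set), len(val_set))  # 1023 256
```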
Intersection over union (IoU) (Fig. 4) is an important concept when evaluating average precision. An IoU of 1 means that the ground truth and predicted bounding boxes are perfectly overlaid, while an IoU of 0 means the prediction has no overlap with the ground truth. We calculate the average precision (AP) (Eq. 1) by finding the area under the interpolated precision-recall curve. Next, we calculate the average recall (AR) (Eq. 2) by finding the area under the recall curve across IoU thresholds. To get the means (MAP and MAR), we average the AP/AR over all classes. For AP50 and AP75 we set and hold the IoU threshold at 50% and 75% respectively.
\(AP = \sum_{n} (R_{n+1} - R_{n})\,P_{interp}(R_{n+1})\)   (1)

where \(R_n\) is a unique recall value and \(P_{interp}(R_{n+1}) = \max_{\tilde{R} \ge R_{n+1}} P(\tilde{R})\) is the interpolated precision value.

\(AR = 2\int_{0.5}^{1} \mathrm{recall}(o)\,do\)   (2)

where \(o\) is the IoU threshold in \([0.5, 1]\) and \(\mathrm{recall}(o)\) is the corresponding recall.
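The sketch below illustrates these quantities in plain NumPy; it is not our evaluation code, merely a minimal implementation of IoU and the interpolated AP of Eq. 1:

```python
import numpy as np

def iou(box_a, box_b):
    """IoU of two boxes given as [x1, y1, x2, y2]."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def average_precision(recalls, precisions):
    """Area under the interpolated precision-recall curve (Eq. 1).

    `recalls` must be sorted ascending, with `precisions` aligned to it.
    """
    # Interpolated precision: the max precision at any recall >= R_{n+1},
    # computed as a running maximum taken from the right.
    p_interp = np.maximum.accumulate(np.asarray(precisions, float)[::-1])[::-1]
    r = np.concatenate(([0.0], np.asarray(recalls, float)))
    return float(np.sum((r[1:] - r[:-1]) * p_interp))

print(iou([50, 50, 120, 120], [55, 48, 118, 125]))  # ~0.83
```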
Four state-of-the-art models were trained with PyTorch [16] as follows: Faster R-CNN, RetinaNet, EfficientDet and YOLOv5. The models were configured as outlined in Table 2. The learning rate, optimizer, image size, number of epochs and the batch size were kept constant to ensure a fair comparison. Commonly used backbone architectures were used for each model respectively. A short description of each model follows.
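As an illustration, this shared configuration could be expressed as follows. The numeric values below are placeholders rather than the actual entries of Table 2 (which is not reproduced here), except the epoch count, which Sect. 4 states as 50.

```python
# Hypothetical sketch of the shared configuration; values are placeholders,
# not the actual Table 2 entries (except max_epochs, stated in Sect. 4).
shared_config = {
    "learning_rate": 1e-4,  # placeholder
    "optimizer": "Adam",    # placeholder
    "image_size": 512,      # placeholder
    "max_epochs": 50,       # stated in Sect. 4
    "batch_size": 4,        # placeholder
}

# Each model keeps the shared values and varies only architecture/backbone
# (the backbone names here are illustrative).
model_configs = [
    {"model": "faster_rcnn",  "backbone": "resnet50_fpn", **shared_config},
    {"model": "retinanet",    "backbone": "resnet50_fpn", **shared_config},
    {"model": "efficientdet", "backbone": "d0",           **shared_config},
    {"model": "yolov5",       "backbone": "small",        **shared_config},
]
```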
Faster R-CNN is a two-stage detector [19] consisting of a deep fully convolutional network that proposes regions and a detector that uses these proposals to generate predictions. It can be extended to return segmentation masks by adding another branch [12]. Faster R-CNN has slower inference speeds than other detectors due to its large number of network parameters.
RetinaNet is a single-stage object detector which is widely used on satellite and aerial imagery. It was created as a competitor to two-stage detectors such as Faster R-CNN, which generally achieve higher accuracy at the cost of slower inference speeds. It utilizes a focal loss function [14] designed to focus on hard examples rather than allowing easy examples to skew the detector. The result is a detector which is faster and more accurate than many two-stage detectors.
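For reference, the focal loss of [14] is \(FL(p_t) = -\alpha_t(1 - p_t)^{\gamma}\log(p_t)\), where \(p_t\) is the model’s estimated probability for the ground-truth class; the modulating factor \((1 - p_t)^{\gamma}\) shrinks the loss contribution of easy, well-classified examples so that training concentrates on hard ones.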
YOLOv5 is another single-stage detector, designed for speed, which can be optimized end-to-end due to its single-network [18] detection pipeline. It is more prone to localization errors than two-stage detectors but is better at avoiding false detections and, importantly, learns very general representations of objects.
EfficientDet is a detector designed for efficiency. It includes a novel bi-directional feature pyramid network (FPN) [21] allowing for feature fusion. It also scales resolution, depth and width for each of its networks (backbone, features, prediction) concurrently. Importantly, it achieves a higher AP on COCO [15] than many other state-of-the-art models despite having (in our experiments) over 90% fewer parameters.
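To illustrate the scale difference, a parameter count can be computed directly from the model objects. The sketch below uses torchvision’s built-in detectors (EfficientDet is not included in torchvision, so only two of the four models are shown); it is not the measurement code used in our experiments.

```python
import torchvision

def count_params(model):
    """Total number of learnable parameters in a model."""
    return sum(p.numel() for p in model.parameters())

# weights=None builds the architectures without downloading pretrained weights.
frcnn = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights=None)
retina = torchvision.models.detection.retinanet_resnet50_fpn(weights=None)

print(f"Faster R-CNN parameters: {count_params(frcnn):,}")
print(f"RetinaNet parameters:    {count_params(retina):,}")
```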
Using PyTorch Lightning Flash [9], we configured the training pipeline (Fig. 5) to ingest a Hydra [26] based configuration object so that we could easily run different models using the same underlying code. We wrote a custom validation loop so that all models could be easily compared under the MAP metric. We also implemented cloud-based logging with Weights & Biases [3] to ensure data provenance and reproducibility.
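The exact loop is not reproduced here, but a minimal sketch of a shared validation step using torchmetrics’ detection MAP implementation (with illustrative, hand-made predictions and targets) looks like this:

```python
import torch
from torchmetrics.detection import MeanAveragePrecision

# A shared metric object, so every model is scored identically.
metric = MeanAveragePrecision()

# Illustrative prediction/target pair in the torchmetrics detection format;
# boxes are [x1, y1, x2, y2], and label 1 stands in for the danbuoy class.
preds = [{
    "boxes": torch.tensor([[50.0, 50.0, 120.0, 120.0]]),
    "scores": torch.tensor([0.87]),
    "labels": torch.tensor([1]),
}]
targets = [{
    "boxes": torch.tensor([[55.0, 48.0, 118.0, 125.0]]),
    "labels": torch.tensor([1]),
}]

metric.update(preds, targets)
results = metric.compute()
print(results["map"], results["map_50"], results["map_75"], results["mar_100"])
```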
4 Evaluation
We trained the models for 50 epochs each, logging their validation metrics on each epoch (Fig. 6). We evaluated the models under four standard metrics for state-of-the-art object detection: MAP, MAP50, MAP75 and MAR. By holding the configuration values shown earlier constant, we ensured a fair comparison between the models.
The maximum score for each of the metrics was calculated and the models were ranked based on their performance (Table 3). We found that each model had merits under the various metrics, with three out of four models having a best-in-metric result.
YOLOv5 scored best in MAR, though all models were similar under the MAR metric, with a standard deviation of 0.007. In SAR scenarios, object detection models should prioritize precision over recall, since false positive detections can impede the SAR team’s efforts.
Under the MAP50 metric, the models once again performed similarly. Holding the IoU threshold at 50% makes it easier for detections to be deemed correct, so separating the models in terms of performance under this metric was difficult. RetinaNet scored best, outperforming EfficientDet by 0.89%.
The metrics which proved most useful in separating the models were MAP (averaged over all IoU thresholds) and MAP75 (IoU held at 75%). These metrics had the largest spread of values between the models, and precision was the metric we prioritized most due to its importance in SAR, as mentioned earlier. EfficientDet was the best model under the MAP metric, outperforming RetinaNet by 9.58%. EfficientDet was also best under the MAP75 metric, with a score 14% higher than the second-best model, RetinaNet.
As mentioned previously, high precision is important in SAR operations to best assist the first-response team; as such, based on our evaluations, we recommend EfficientDet for its high precision on the “VisBuoy” data-set. Other factors in favour of EfficientDet include its lower power usage during training (Fig. 7) and the second-highest inference speed (Fig. 8) of the models compared.
5 Conclusion
In this paper we compared the performance of four state-of-the-art object detection models on a data-set of danbuoy inflatable markers for water-based search and rescue scenarios. The data-set consisted of 1,279 images with 532 instances of danbuoys and 387 instances of boats.
Our analysis involved keeping some core hyper-parameters constant (learning rate, optimizer, image size, epochs and batch size) to allow for a fair comparison across all detectors. We ranked the detectors based on their mean average precision and mean average recall in accordance with the standard object detection evaluation process. In descending order of performance on our data-set, the models rank as EfficientDet, RetinaNet, YOLOv5 and Faster R-CNN.
As such, we recommend EfficientDet, with a MAP75 score of 74%, as the best model for detecting danbuoy inflatable markers from aerial imagery during SAR operations. EfficientDet has the added benefits of consuming less power while training and having the second-fastest inference speed of all the models. We believe further improvements to the EfficientDet model are possible in future work by exploring different combinations of the core hyper-parameter constants and varying the backbone.
UAV technology is already helpful in SAR efforts, providing a bird’s-eye view during operations. Extending this technology with automated processing of the large amounts of data generated, providing precise location-based information to identify objects of interest in real-time, is beneficial. Our research suggests EfficientDet as the best-in-class detection model for danbuoy inflatable marker detection in water-based SAR.
References
Acatay, O., Sommer, L., Schumann, A., Beyerer, J.: Comprehensive evaluation of deep learning based detection methods for vehicle detection in aerial imagery. In: 2018 15th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS) (2018). https://doi.org/10.1109/avss.2018.8639127
Akshatha, K.R., Karunakar, A.K., Shenoy, S.B., Pai, A.K., Nagaraj, N.H., Rohatgi, S.S.: Human detection in aerial thermal images using faster R-CNN and SSD algorithms. Electronics 11, 1151 (2022). https://doi.org/10.3390/electronics11071151
Biewald, L.: Experiment tracking with Weights and Biases (2020). https://www.wandb.com/
Doherty, P., Rudol, P.: A UAV search and rescue scenario with human body detection and geolocalization. In: Orgun, M.A., Thornton, J. (eds.) AI 2007. LNCS (LNAI), vol. 4830, pp. 1–13. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-76928-6_1
Dousai, N.M.K., Lončarić, S.: Detection of humans in drone images for search and rescue operations. APIT (2021). https://doi.org/10.1145/3449365.3449377
Erdelj, M., Natalizio, E.: UAV-assisted disaster management: applications and open issues. Institute of Electrical and Electronics Engineers Inc. (2016). https://doi.org/10.1109/ICCNC.2016.7440563
Erdelj, M., Natalizio, E., Chowdhury, K.R., Akyildiz, I.F.: Help from the sky: leveraging UAVs for disaster management. IEEE Pervasive Comput. (2017). https://doi.org/10.1109/mprv.2017.11
Ezequiel, C.A.F., et al.: UAV aerial imaging applications for post-disaster assessment, environmental management and infrastructure development, pp. 274–283. IEEE Computer Society (2014). https://doi.org/10.1109/ICUAS.2014.6842266
Falcon, W.: PyTorch Lightning (2022). https://github.com/PytorchLightning/pytorch-lightning
Felice, M.D., Trotta, A., Bedogni, L., Chowdhury, K.R., Bononi, L.: Self-organizing aerial mesh networks for emergency communication, vol. 2014-June, pp. 1631–1636. Institute of Electrical and Electronics Engineers Inc. (2014). https://doi.org/10.1109/PIMRC.2014.7136429
Goodrich, M.A., et al.: Supporting wilderness search and rescue using a camera-equipped mini UAV. J. Field Robot. 25, 89–110 (2008). https://doi.org/10.1002/rob.20226
He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: 2017 IEEE International Conference on Computer Vision (ICCV), pp. 2980–2988 (2017). https://doi.org/10.1109/ICCV.2017.322
Kruijff, G.J.M., et al.: Rescue robots at earthquake-hit Mirandola, Italy: a field report (2012). https://doi.org/10.1109/SSRR.2012.6523866
Lin, T.Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: 2017 IEEE International Conference on Computer Vision (ICCV), pp. 2999–3007 (2017). https://doi.org/10.1109/ICCV.2017.324
Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_48
Paszke, A., et al.: PyTorch: an imperative style, high-performance deep learning library (2019). http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf
Pi, Y., Nath, N.D., Behzadan, A.H.: Convolutional neural networks for object detection in aerial imagery for disaster response and recovery. Adv. Eng. Inform. 43, 101009 (2020). https://doi.org/10.1016/j.aei.2019.101009
Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection, vol. 2016-December, pp. 779–788. IEEE Computer Society (2016). https://doi.org/10.1109/CVPR.2016.91
Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks, vol. 28. Curran Associates, Inc. (2015). https://proceedings.neurips.cc/paper/2015/file/14bfa6bb14875e45bba028a21ed38046-Paper.pdf
Sambolek, S., Ivašić-Kos, M.: Automatic person detection in search and rescue operations using deep CNN detectors. IEEE Access (2021). https://doi.org/10.1109/access.2021.3063681
Tan, M., Pang, R., Le, Q.V.: EfficientDet: scalable and efficient object detection, pp. 10778–10787. IEEE Computer Society (2020). https://doi.org/10.1109/CVPR42600.2020.01079
Tkachenko, M., Malyuk, M., Holmanyuk, A., Liubimov, N.: Label Studio: data labeling software (2020–2022). https://github.com/heartexlabs/label-studio
Tomar, S.: Converting video formats with FFmpeg. Linux J. 2006, 10 (2006)
Ueyama, J., et al.: Exploiting the use of unmanned aerial vehicles to provide resilience in wireless sensor networks. IEEE Commun. Mag. 52, 81–87 (2014). https://doi.org/10.1109/MCOM.2014.6979956
Waharte, S., Trigoni, N.: Supporting search and rescue operations with UAVs, pp. 142–147 (2010). https://doi.org/10.1109/EST.2010.31
Yadan, O.: Hydra - a framework for elegantly configuring complex applications (2019). https://github.com/facebookresearch/hydra
Zhu, P., et al.: Detection and tracking meet drones challenge. IEEE Trans. Pattern Anal. Mach. Intell. 44, 7380–7399 (2021). https://doi.org/10.1109/TPAMI.2021.3119563