Oil Spill Identification based on Dual Attention UNet Model Using Synthetic Aperture Radar Images

Oil spills cause tremendous damage to marine and coastal environments and ecosystems. Previous deep learning-based studies have addressed oil spill detection as a semantic segmentation problem. However, further improvement is still required to address the noisy nature of Synthetic Aperture Radar (SAR) imagery, which limits segmentation performance. In this study, a new deep learning model based on the Dual Attention Model (DAM) is developed to automatically detect oil spills in a water body. We enhanced a conventional UNet segmentation network by integrating the DAM to selectively highlight the relevant and discriminative global and local characteristics of oil spills in SAR imagery. The DAM is composed of a Channel Attention Map and a Position Attention Map, which are stacked in the decoder network of UNet. The proposed DAM-UNet is compared with four baselines, namely the fully convolutional network (FCN), PSPNet, LinkNet, and the traditional UNet, and it empirically outperforms all four. Moreover, the EG-OilSpill dataset is introduced, which includes a large set of SAR images with 3000 image pairs. The overall accuracy of the proposed method reaches 94.2%, an increase of 3.2% over the traditional UNet. The study opens new development ideas for integrating attention modules into other deep learning tasks, including machine translation, image-based analysis, action recognition, and speech recognition.


Introduction
The frequent oil discharge from ships or oil platforms has become a major threat to our coastal ecosystem and may generate large economic losses for maritime activities interrupted by this pollution (Migliaccio et al., 2007). To manage and minimize oil spills, they must first be identified. Satellite imaging can be an efficient tool for this purpose. Remote sensing imagery, either optical or radar, has been extensively utilized in oil spill detection.
Various studies have documented the importance of SAR in oil spill detection tasks (Cantorna et al., 2019).
Polarimetric SAR offers a wide range of features that effectively enhance oil spill extraction and detection. Radar systems mounted on aircraft and satellites provide images of sea and land surfaces (Bovenga, 2020). The SAR sensor sends out radio waves, which are reflected off the surfaces and used to create a visual interpretation of the target surface. Numerous approaches have been introduced in the literature to automatically identify oil spills; however, feature extraction from SAR images through traditional approaches has become a major drawback that limits performance (Al-Ruzouq et al., 2020; Kolokoussis & Karathanassi, 2018). Feature extraction becomes increasingly cumbersome as the number of classes increases. Expert judgment, along with several trial-and-error processes, decides which features best describe the different object classes. Moreover, each feature definition involves a plethora of parameters, which must be fine-tuned.
Deep Convolutional Neural Networks (DCNNs) have impressively boosted the performance in many fields, such as change detection, super resolution (Moustafa & Sayed, 2021), hazard assessment, and object detection (Mahmoud et al., 2020). Because recent architectures can explore significant multilevel deep features, Convolutional Neural Networks (CNNs) have been incorporated to efficiently solve complex functions (Chen et al., 2017). Attention mechanisms have recently become one of the most important concepts in the deep learning field. They are inspired by human biological systems, which tend to focus on the distinctive parts when processing large amounts of information. In diverse CNN architectures, several attention mechanisms have been widely used to improve the representative ability of these architectures by strengthening feature extraction. Examples of widely used attention mechanisms include channel-wise attention (Zhu et al., 2019), squeeze and excitation attention (Hu et al., 2018), and the pyramid attention model (Mei et al., 2020), among many others.
This study proposes a novel approach for identifying and segmenting oil spills based on the residual UNet model and attention learning for SAR satellite imagery. It incorporates the DAM (Fu et al., 2019), which is composed of channel-wise attention (Zhu et al., 2019) and position attention, to amplify the useful features and improve the oil detection accuracy. Both attention models learn the channel interdependencies and spatial interrelations of features, which allows the proposed model to selectively emphasize the informative and discriminative features of an oil spill. The proposed approach accurately detects oil spill pixels with 94.2% accuracy, 89% precision, 88.4% recall, and 86.3% F1 score. It provides an improved accuracy compared with other architectures. The major contributions of this study are as follows:
• The DAM block is plugged into the UNet architecture to identify oil spills using SAR images.
• A weighted cross-entropy loss function is used to handle the imbalanced distribution of the oil spill dataset.
• A newly collected SAR oil spill dataset, called EG-OilSpill, is used to test our technique against other baseline methods, including the traditional UNet without the DAM.
The remainder of this paper is structured as follows: Sect. 2 reviews the related published literature on deep learning for oil spills; Sect. 3 describes the suggested technique for identifying oil spills using satellite images; Sect. 4 discusses the experiment design and findings; and Sect. 5 draws the conclusions.

Related Work
Several attempts have been made to boost the CNN performance in the oil spill identification problem (Krestenitis et al., 2019a, 2019b; Li et al., 2022; Rousso et al., 2022). However, labeled SAR data are scarce: a limited amount, or a complete lack, of labeled training data limits the performance and the generalization of a sophisticated deep learning framework. Attention mechanisms have recently become a popular strategy to boost performance. This section reviews recent studies on CNNs for oil spills, formulates knowledge distillation, and presents recent work on attention modules.

Neural Networks for Oil Spill Identification
Several works have adopted semantic segmentation using deep CNNs to detect oil slicks on the sea surface. Object detection, which is a subset of computer vision, is an automated method for locating essential objects in an image with respect to the background (Fang et al., 2020). Object detection has been adopted in many real-life applications, such as human-computer interaction (Singh & Singh, 2018), robotics (Hwang et al., 2019), consumer electronics (e.g., smartphones), image retrieval (Moustafa et al., 2020), and transportation (e.g., autonomous and assisted driving) (Feng et al., 2020). Deep learning is a state-of-the-art method used in recent object detection studies (Sharma & Mir, 2020).
In recent years, various deep learning architectures have been introduced for oil spill detection (Table 1). Krestenitis et al. (2019b) evaluated various DCNN architectures, namely UNet, LinkNet, Pyramid Scene Parsing Network (PSPNet), DeepLabv2, and DeepLabv3+, for oil spill detection and concluded that DeepLabv3+ yields the best results in terms of test set accuracy and associated inference time. Song et al. (2020) designed a deep CNN for oil spill detection that automatically extracts deep features from PolSAR data. Meanwhile, Guo et al. (2018) compared classical and deep learning methods and concluded that deep learning methods (e.g., SegNet and fully convolutional network [FCN]), which use semantic segmentation for oil spill detection, outperform classical methods such as support vector machine and random forest. Yan et al. (2019) proposed a multifunction fusion neural network to detect oil spills as ocean phenomena. This proposed method achieved the highest detection accuracy. A DCNN was built for SAR dark patch classification in oil spill detection and reported a higher accuracy in detecting oil spills and lookalikes (Zeng & Wang, 2020). By establishing the best initial values of the wavelet neural network (WNN) for oil spill classification, Song et al. (2017) showed experimentally that the optimized WNN largely improves ocean oil spill classification. In Cantorna et al. (2019), a CNN was used to perform automated oil spill detection alongside classical segmentation methods. Various recurrent neural networks (RNNs) were implemented and tested on Side-Looking Airborne Radar (SLAR) images for candidate oil spill detection, and these RNNs achieved higher accuracy (Alacid et al., 2017).
Deep learning algorithms, such as sparse autoencoders and deep belief networks, were used for oil spill segmentation, outperforming other traditional methods (Chen et al., 2017). DCNN models can automate contaminated area detection and achieve a higher performance (Krestenitis et al., 2019a). Xiong and Zhou (2019) proposed CNN-based SAR image recognition and reported a higher accuracy for oil spill tracking.

Attention Modules
Attention is a complex and essential cognitive function for humans (Corbetta & Shulman, 2002). One important feature of human perception is that it cannot process all information at once. For example, people do not usually take in an entire scene from start to finish when they perceive things visually; instead, they observe and pay attention to certain parts as required. This strategy helps in selecting high-value information given the limited processing resources available. An attention mechanism significantly increases the efficiency and precision of information processing. It can be regarded as a resource allocation scheme, which is the principal means of resolving information overload: with limited computational resources, the most relevant information can be processed more efficiently. Therefore, attention has drawn the interest of researchers in computer vision. An attention mechanism can also be used to explain otherwise incomprehensible neural architecture behavior and performance improvements. Although neural networks have boosted performance in many areas (e.g., finance, materials (Ieracitano et al., 2020), meteorology (Yu et al., 2015), medicine (Liu et al., 2018), and autonomous driving (Ming et al., 2021)), interpretability has remained a major problem. Whether or not the attention mechanism can be effectively used to explain a deep network remains a topic of dispute (Li et al., 2022; Fu et al., 2019).

The Proposed Method
This section presents a detailed description of the proposed hybrid attention model for oil spills in SAR images. Figure 1 shows a graphical representation of the overall structure of the proposed framework. The proposed approach alters the traditional UNet architecture by incorporating dual attention to effectively suppress irrelevant information flow and minimize false identifications (Fig. 1a). The advantages of the Position Attention Map (PAM) and the Channel Attention Map (CAM) are combined to gather contextual information better than the original UNet (Fig. 1b). We also included a tailored loss function.

Traditional UNet
The traditional UNet (Ronneberger et al., 2015) is an extension of the FCN architecture initially proposed in 2015 for biomedical image semantic segmentation. It is now widely used in various applications. UNet has two blocks: an encoder and a decoder. The name "U" reflects the symmetry between the encoder and decoder blocks. The encoder block aims to capture an image's meaningful feature map, whereas the decoder network upsamples the extracted feature map while decreasing its filters. The original UNet architecture includes four stages in both the encoder and decoder blocks. In each encoder stage, two 3 × 3 convolutions are applied, each followed by a rectified linear unit (ReLU), and a 2 × 2 max pooling operation with stride 2 is then applied for downsampling. Every step in the decoder path consists of an upsampling of the feature map followed by a 2 × 2 convolution that halves the number of feature channels, a concatenation with the correspondingly cropped feature map from the encoding path, and two 3 × 3 convolutions, each followed by a ReLU (Ronneberger et al., 2015; Lou et al., 2021). To improve the traditional UNet model, we incorporate the DAM to suppress the feature activations from irrelevant backgrounds. This will be discussed in detail in the next section.
Fig. 1 Graphical representation of the proposed dual attention model for oil spills in the SAR image approach
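The four-stage encoder and decoder described above can be traced with a short shape calculation. This is only a sketch under assumptions that the text does not state: "same" padding (so that convolutions keep the spatial size), a base width of 64 channels that doubles at each encoder stage as in the original UNet, and a 256 × 256 input. The function name `unet_stage_shapes` is hypothetical.

```python
def unet_stage_shapes(size=256, base_channels=64, stages=4):
    """Trace (stage name, channels, spatial size) through a UNet.

    Assumes 'same' padding, so only pooling and upsampling change the size.
    """
    shapes = []
    channels, s = base_channels, size
    for i in range(stages):
        shapes.append((f"enc{i + 1}", channels, s))
        s //= 2          # 2 x 2 max pooling with stride 2 halves the spatial size
        channels *= 2    # assumed: the next stage doubles the feature channels
    shapes.append(("bottleneck", channels, s))
    for i in range(stages, 0, -1):
        s *= 2           # upsampling doubles the spatial size
        channels //= 2   # the 2 x 2 convolution halves the feature channels
        shapes.append((f"dec{i}", channels, s))
    return shapes


for name, channels, s in unet_stage_shapes():
    print(f"{name:>10}: {channels:4d} channels, {s} x {s}")
```

Under these assumptions, each decoder stage ends with the same channel count and spatial size as its mirrored encoder stage, which is what makes the skip concatenation possible.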

Dual Attention Model
Both the PAM and the CAM are integrated to capture long-range contextual information in the spatial and channel dimensions (Fu et al., 2019). The DAM, which incorporates global spatial and channel dimension interdependence, could boost the oil identification accuracy compared with the original UNet. Figure 2 presents the details of the DAM. Figure 2a illustrates the PAM architecture in detail. The local input feature $A \in \mathbb{R}^{C \times H \times W}$ was fed into three convolution layers to generate three new feature maps (i.e., B, C, and D), where $\{B, C, D\} \in \mathbb{R}^{C \times H \times W}$. Next, B, C, and D were reshaped to $\mathbb{R}^{C \times N}$, where $N = H \times W$. A matrix multiplication was implemented between the transpose of C and B. A SoftMax layer was then applied to calculate the spatial attention map $S \in \mathbb{R}^{N \times N}$, as shown in Eq. (1):

$$s_{ji} = \frac{\exp(B_i \cdot C_j)}{\sum_{i=1}^{N} \exp(B_i \cdot C_j)} \tag{1}$$

Position Attention Map
where $s_{ji}$ measures how the ith position affects the jth position (Fu et al., 2019). Matrix multiplication was performed between D and the transpose of S. The result was then reshaped to $\mathbb{R}^{C \times H \times W}$ and multiplied by a scale parameter $\alpha$. Finally, the element-wise sum operation with the feature A was applied to obtain the final output $E \in \mathbb{R}^{C \times H \times W}$, as shown in Eq. (2):

$$E_j = \alpha \sum_{i=1}^{N} \left( s_{ji} D_i \right) + A_j \tag{2}$$

where $\alpha$ is initialized as 0 and is gradually allocated more weight. Figure 2b illustrates the CAM architecture in detail. Each channel map of the high-level features was regarded as a class-specific response, and the different semantic responses were associated with each other. By utilizing the interdependencies between the channel maps, interdependent maps may be emphasized, and specific semantic features may be improved. Therefore, we constructed herein a module for channel attention to explicitly model the interdependencies between channels. We directly calculated the channel attention map $X \in \mathbb{R}^{C \times C}$ from the original feature map $A \in \mathbb{R}^{C \times N}$: a matrix multiplication was performed between A and its transpose, and a SoftMax layer was then applied to obtain X, as shown in Eq. (3):

$$x_{ji} = \frac{\exp(A_i \cdot A_j)}{\sum_{i=1}^{C} \exp(A_i \cdot A_j)} \tag{3}$$

Channel Attention Map
where $x_{ji}$ measures the ith channel's impact on the jth channel. The scale parameter $\beta$ was initialized as 0 and gradually learned to assign more weight. The final output feature map E was calculated as follows:

$$E_j = \beta \sum_{i=1}^{C} \left( x_{ji} A_i \right) + A_j \tag{4}$$

Finally, the outputs of both attention maps were combined using one of four approaches: addition, maxout, multiplication, or concatenation. In this study, we adopted the addition operation.
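The position and channel attention computations, in the form given by Fu et al. (2019), can be sketched in NumPy as follows. This is an illustrative sketch rather than the implementation used in this study: the three convolution layers are replaced by plain C × C weight matrices (`Wb`, `Wc`, `Wd`, hypothetical stand-ins for 1 × 1 convolutions), the scale parameters `alpha` and `beta` are passed in as fixed numbers instead of being learned, and the two outputs are fused by addition.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def position_attention(A, Wb, Wc, Wd, alpha):
    """PAM: A has shape (C, H, W); Wb, Wc, Wd are assumed (C, C) stand-ins for convolutions."""
    C, H, W = A.shape
    A_flat = A.reshape(C, H * W)            # (C, N) with N = H * W
    B, Cm, D = Wb @ A_flat, Wc @ A_flat, Wd @ A_flat
    energy = Cm.T @ B                       # (N, N): energy[j, i] = C_j . B_i
    S = softmax(energy, axis=1)             # s_ji, normalized over i
    E = alpha * (D @ S.T) + A_flat          # E_j = alpha * sum_i s_ji D_i + A_j
    return E.reshape(C, H, W), S


def channel_attention(A, beta):
    """CAM: the attention map is computed directly from A, without convolutions."""
    C, H, W = A.shape
    A_flat = A.reshape(C, H * W)            # (C, N)
    energy = A_flat @ A_flat.T              # (C, C): energy[j, i] = A_j . A_i
    X = softmax(energy, axis=1)             # x_ji, normalized over i
    E = beta * (X @ A_flat) + A_flat        # E_j = beta * sum_i x_ji A_i + A_j
    return E.reshape(C, H, W), X


def dual_attention(A, Wb, Wc, Wd, alpha, beta):
    """Fuse the PAM and CAM outputs by element-wise addition."""
    E_pos, _ = position_attention(A, Wb, Wc, Wd, alpha)
    E_chn, _ = channel_attention(A, beta)
    return E_pos + E_chn
```

With `alpha = 0` and `beta = 0`, each branch returns its input unchanged, which mirrors the statement that both scale parameters start at 0 and only gradually assign weight to the attention terms.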

Loss Function
In the training phase, the loss function was used to guide the network to learn meaningful predictions close to the ground truth in terms of segmentation metrics (Ma et al., 2021). By measuring the dissimilarity between the ground truth and the predicted segmentation, loss functions play a critical role in CNN-based segmentation methods. A weighted binary cross-entropy loss function was adopted during the training of the proposed approach to handle the imbalanced dataset, and it influenced the performance. Generally, an imbalance can occur in two ways: foreground-to-background imbalance and foreground-to-object imbalance. In our case, the water class was over 70% of the training dataset, whereas oil represented approximately 30%. Cross-entropy loss (CE) (Yi-de et al., 2004) is defined as a measure of the difference between two probability distributions for a given random variable or sequence of occurrences. It is often used for classification and segmentation. Binary cross-entropy loss (BCE) is defined as in Eq. (5):

$$L_{BCE}(y, \hat{y}) = -\left( y \log(\hat{y}) + (1 - y) \log(1 - \hat{y}) \right) \tag{5}$$

where $y$ is the ground truth and $\hat{y}$ is the predicted value. The weighted binary cross-entropy loss (WCE) (Pihur et al., 2007) is a version of the binary cross-entropy commonly adopted in skewed data situations. The WCE is defined as in Eq. (6):

$$L_{WCE}(y, \hat{y}) = -\left( \beta \, y \log(\hat{y}) + (1 - y) \log(1 - \hat{y}) \right) \tag{6}$$

where the $\beta$ value is used to tune false negatives and false positives: setting $\beta > 1$ penalizes missed positive pixels more heavily and thus reduces false negatives, whereas $\beta < 1$ reduces false positives.
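A weighted binary cross-entropy of this kind can be written in a few lines of NumPy. The sketch below assumes that oil is the positive class (label 1), that the loss is averaged over all pixels, and that predictions are clipped by a small `eps` to avoid log(0); these choices and the function name `weighted_bce` are illustrative assumptions, not details given in the text.

```python
import numpy as np


def weighted_bce(y_true, y_pred, beta=1.0, eps=1e-7):
    """Mean weighted binary cross-entropy; beta scales the positive (oil) term."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip predictions so that log() never receives exactly 0 or 1.
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1.0 - eps)
    loss = -(beta * y_true * np.log(y_pred)
             + (1.0 - y_true) * np.log(1.0 - y_pred))
    return float(loss.mean())


# One oil pixel and one water pixel, both predicted with probability 0.5.
print(weighted_bce([1, 0], [0.5, 0.5], beta=1.0))  # plain BCE
print(weighted_bce([1, 0], [0.5, 0.5], beta=2.0))  # oil errors weighted twice
```

With `beta = 1.0` the function reduces to the plain BCE, so the same code covers both losses compared in the experiments.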

Experimental Results
Section 4.1 presents a brief description of the datasets. Section 4.2 describes the environmental setup. Section 4.3 illustrates the evaluation metrics. Section 4.4 discusses the experimental results and findings.

Datasets
The oil spill incident considered in this study occurred off the coast of Saudi Arabia, approximately 96 km from the Saudi coast at Jeddah, as shown in Fig. 3. Sentinel-1 images with a 20-m resolution were collected between October 13 and 25, 2019. The images, provided by the European Space Agency (ESA), have a 5-m azimuth resolution, VV polarization, and a Universal Transverse Mercator (UTM) zone 37 North projection. The imagery dates were selected on the basis of SAR data availability and by considering the limited time lapse after the incident. The SAR images from the Sentinel-1 satellites of the ESA were used to evaluate the algorithms, and the specifications of the GRD Sentinel-1A data are presented in Table 2. The case considered herein dealt with C-band radar images with a VV polarization, which is considered appropriate for oil spill detection (Cantorna et al., 2019). The preprocessing of the Sentinel-1A data, illustrated in Fig. 4, follows Filipponi (2019) and involves four phases: (1) extraction of the amplitude VV polarization, (2) application of radiometric correction on the amplitude VV polarization, (3) application of Lee speckle filtering, and (4) as the final step, geometric terrain correction to compensate for geometric distortions. This step ensures that the geometry of the layer will match or closely resemble the geometry of the physical world (Christiansen et al., 2018; Schubert et al., 2015). The DEM parameter of the Range Doppler terrain correction was set to "SRTM 3Sec," and the DEM resampling technique and image resampling method were set to bilinear interpolation (Arif & Akbar, 2005; Madaan & Kaur, 2019). A pixel spacing of 20 m was selected. The SAR data used were compiled into a dataset, called the EG-OilSpill dataset. This dataset starts with 440 images of size 256 × 256 and is annotated at the pixel level with two classes (i.e., water and oil).
The dataset contained a total of 3000 pairs of images. Statistically, almost 70% of the dataset represents water bodies, whereas the remaining 30% represents oil spills at the pixel level. Table 3 presents the hardware and software equipment used to train the proposed oil spill identification approach, which uses the Adam optimizer (Li et al., 2022). The dataset is split into training, validation, and test sets in an 80/10/10 ratio (Shaban et al., 2021). Early stopping is adopted as a regularization technique to mitigate overfitting.
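As a minimal sketch of the 80/10/10 split applied to 3000 image pairs, the following uses only the Python standard library. The shuffling, the fixed seed, and the function name `split_indices` are assumptions made for illustration; the text does not describe how the samples were assigned to each subset.

```python
import random


def split_indices(n_samples, fractions=(0.8, 0.1, 0.1), seed=0):
    """Shuffle sample indices and cut them into train/validation/test lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)     # assumed: random assignment
    n_train = round(fractions[0] * n_samples)
    n_val = round(fractions[1] * n_samples)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]         # the remainder goes to the test set
    return train, val, test


train, val, test = split_indices(3000)
print(len(train), len(val), len(test))
```

For 3000 pairs this yields 2400 training, 300 validation, and 300 test samples, with no index shared between subsets.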

Evaluation Metrics
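For the architecture evaluation, we used various evaluation metrics commonly used in object detection problems, as shown in Table 4. The metrics of recall, precision, F1 score (Ozigis et al., 2019), and overall accuracy (OA) are presented in Eqs. (7)-(10), respectively:

$$\text{Recall} = \frac{TP}{TP + FN} \tag{7}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{8}$$

$$\text{F1} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{9}$$

$$\text{OA} = \frac{TP + TN}{TP + TN + FP + FN} \tag{10}$$

where TP is true positive, FN is false negative, FP is false positive, and TN is true negative.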
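The four metrics follow directly from the confusion counts, using their standard definitions. In the sketch below, the function name and the example counts are hypothetical and are not taken from the experiments.

```python
def segmentation_metrics(tp, fp, fn, tn):
    """Recall, precision, F1 score, and overall accuracy from confusion counts."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    oa = (tp + tn) / (tp + tn + fp + fn)
    return {"recall": recall, "precision": precision, "f1": f1, "oa": oa}


# Hypothetical counts for 1000 pixels, with oil as the positive class.
print(segmentation_metrics(tp=80, fp=20, fn=20, tn=880))
```

Because water dominates the pixel count, OA can remain high even when many oil pixels are missed, which is why recall, precision, and F1 are reported alongside it.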

Results and Findings
We conducted several experiments to evaluate the proposed DAM-UNet for the oil spill detection task. We also compared the performance of the proposed approach with those of the four baseline methods, namely, FCN (Long et al., 2015), traditional UNet (Ronneberger et al., 2015), LinkNet (Chaurasia & Culurciello, 2017), and the multiscale pyramid network-based model (PSPNet) (Zhao et al., 2017). Table 5 compares and summarizes the architectures, backbones, and loss functions of each method. Table 6 illustrates the obtained quantitative comparison results between the four baseline methods in terms of the four metrics: OA, recall, precision, and F1 score using cross-entropy. The proposed DAM-UNet outperformed the other methods in terms of the OA (93.7%), precision (90.8%) (Shaban et al., 2021; Yekeen & Balogun, 2020), and F1 score (85.9%). PSPNet (92.8%) and FCN (92.3%) showed slightly lower OA scores than the DAM-UNet, while UNet reached 91%. The F1 score, precision, and recall illustrated the same pattern. The PSPNet performance in terms of recall achieved the highest value of 83.2%. Figure 5 presents a comparison of the visual results of the aforementioned models (Alpers et al., 2017). The proposed DAM-UNet model achieved the best identification accuracy for both water bodies and oil spill regions with 93.7% OA. By contrast, the LinkNet model yielded the lowest accuracy with misclassified oil spill regions. The UNet model identified the sharp edges of the oil spill regions, but several pixels were misclassified as water bodies. PSPNet and FCN showed approximately the same performance (92.8% and 92.3% OA, respectively). Figure 6 depicts the selected samples of the EG-OilSpill dataset using the proposed DAM-UNet.
We conducted various experiments to evaluate the impact of different loss functions, i.e., WCE, focal loss [52], and BCE [53], on the performance of the proposed DAM-UNet. Table 7 presents the obtained results in terms of the mIoU and OA for the different loss functions. The weighted binary cross-entropy achieved the best results in terms of the OA, depicting an increase of approximately 0.5%. The focal loss showed the worst performance with 92% OA. For the intersection over union (IoU), the proposed DAM-UNet adopting the weighted binary loss (WCE) achieved higher accuracy in the water bodies and oil spill classes than the variants adopting focal loss and BCE.
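Per-class IoU and mIoU can be computed from a pair of label masks as in the NumPy sketch below. It uses the standard definition of IoU as intersection divided by union; the label coding (0 for water, 1 for oil), the function names, and the toy masks are assumptions for illustration only.

```python
import numpy as np


def class_iou(y_true, y_pred, cls):
    """IoU of one class: |intersection| / |union| of the two class masks."""
    t = np.asarray(y_true) == cls
    p = np.asarray(y_pred) == cls
    union = np.logical_or(t, p).sum()
    if union == 0:
        return float("nan")          # class absent from both masks
    return float(np.logical_and(t, p).sum() / union)


def mean_iou(y_true, y_pred, classes=(0, 1)):
    """mIoU: unweighted mean of the per-class IoU values."""
    return float(np.mean([class_iou(y_true, y_pred, c) for c in classes]))


# Toy 2 x 4 masks with one oil pixel predicted as water.
y_true = [[0, 0, 1, 1], [0, 0, 1, 1]]
y_pred = [[0, 0, 0, 1], [0, 0, 1, 1]]
print(class_iou(y_true, y_pred, 1), class_iou(y_true, y_pred, 0), mean_iou(y_true, y_pred))
```

In this toy case a single misclassified pixel lowers the IoU of the smaller class more than that of the larger one, which illustrates why the oil spill class is the harder of the two.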
The above-mentioned results imply that the DAM-UNet-WCE model attained the highest overall mIoU of 83.85%. For the "oil spill" class, the class of highest interest, the achieved accuracy of 75.70% was higher than the maximum value reported by the DAM-UNet variants that adopted focal loss and BCE. The lowest performance in terms of the mIoU was equal to 81.55% and was reported using the DAM-UNet-focal loss method. In summary, utilizing the WCE loss in the proposed DAM-UNet enhances the overall accuracy compared with the traditional binary cross-entropy. (In the tables, the highest performance of each method in each metric is highlighted in bold.)
The Benchmark Oil Spill Dataset, comprising a collection of satellite SAR images of oil-polluted areas obtained via the ESA database, contains a set of 1112 images (Krestenitis et al., 2019a, 2019b). The oil pollution records covered the period from September 28, 2015, to October 31, 2017. The SAR images were acquired from the Sentinel-1 European Satellite missions. Figure 7 presents data samples. To ensure data validity and inclusion of oil spills in the images, the European Maritime Safety Agency confirmed the oil spill events through the CleanSeaNet service along with their geographic coordinates. Table 8 shows the quantitative comparison results obtained between the four baseline methods in terms of the four metrics. The DAM-UNet surpassed all the other methods and achieved higher scores in the OA (93.5%), precision (88.0%), and F1 score (83.8%) compared with the UNet method, which attained 92.8% accuracy. The FCN, PSPNet, and LinkNet have lower OA scores of 92%, 92.2%, and 92.4%, respectively. For the other evaluation metrics, PSPNet achieved the highest recall score of 82%. Various experiments were implemented to evaluate the impact of different loss functions (i.e., WCE, focal loss, and BCE) on the performance of the proposed DAM-UNet. Table 9 lists the obtained results in terms of the OA and the intersection over union (%).
Overall, all loss functions showed acceptable OA scores. The WCE achieved the best results for the OA metric, showing an increase of approximately 0.3%. BCE and focal loss shared a similar pattern for all metrics. The BCE loss exhibited the worst performance with 92.7% OA. Using the WCE in the proposed DAM-UNet improved the overall accuracy compared with the conventional binary cross-entropy.
The IoU was computed (Table 9) for the water bodies and oil spill classes separately for the proposed DAM-UNet using different loss functions. Compared with the DAM-UNet variants that adopted focal loss and BCE, the proposed method that adopted the weighted binary loss (WCE) achieved a higher accuracy in both classes, with an OA score of 93.80% and an mIoU of 82.85%. For the oil spill class, considered the class of highest interest, the achieved accuracy of 73.70% was higher than the values reported by the DAM-UNet variants that adopted focal loss and BCE. The lowest performance in terms of the mIoU was equal to 81% and was obtained by the DAM-UNet adopting the focal loss function. In summary, utilizing the WCE in the proposed DAM-UNet enhances the overall accuracy compared with the traditional binary cross-entropy.

Conclusion
Oil spills are one of the most severe threats to our marine and coastal environment. Therefore, effective monitoring and early warning are essential in facing the danger and reducing environmental damage. SAR sensors can offer high-resolution images of areas where possible oil spills may be detected. Remote sensing via SAR sensors plays an important role in achieving this goal. To automatically interpret SAR images and distinguish oil spill objects, we introduce herein a semantic segmentation method based on UNet and DAM, called DAM-UNet. The proposed approach takes advantage of two attention models (i.e., the channel attention map and position attention map) to learn the spatial interrelations of features and the selective global and local information to emphasize the informative and discriminative features of an oil spill. We adopted the weighted cross-entropy as the loss function to address the imbalanced dataset problem. We also introduced a large-scale oil spill dataset, called the EG-OilSpill dataset. The obtained results highlighted the effectiveness of the proposed method in qualitatively and quantitatively detecting oil spills using the EG-OilSpill and Benchmark Oil Spill datasets. In the future, accurate models trained on the generated dataset can be embedded in a broader framework for the identification of oil spills and the decision making required to deal with them. In conclusion, segmentation techniques may be applied to other research areas that utilize remote sensing, such as fire detection or floods.

Declarations
Conflict of interest The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.