1 Introduction

The coastal zone is the transitional area between land and sea. Although relatively small (less than 15% of the Earth’s surface), it is of great economic significance because it is inhabited by 60% of the global population [1]. Coastal areas are the site of intensive human activity (e.g. construction of ports, settlements and roads, industrial development, dredging, and the extraction of sea sand for construction purposes) and of natural phenomena (e.g. coastal erosion or deposition, flooding). Both natural factors and human activity cause constant changes to the shoreline. This is why accurate and timely information on the current position of coastlines, and the study of their spatio-temporal changes, are of great importance for environmental protection and for proper coastal zone management.

There are many shoreline definitions in the literature [2,3,4,5,6,7]. A very basic definition says that a shoreline is the physical boundary between the land and the water [2, 3]. In this literature review, the terms “shoreline” and “coastline” are considered synonymous and no difference is made between them.

Fig. 1

Detection of coastlines in the coastal zone of the Aliağa Bay in Turkey (the Mediterranean Sea, near the port city of Aliağa in the Aegean Region of Turkey) in an example SAR image, provided courtesy of Capella Space Corporation, All Rights Reserved. a Google Earth view. b Source SAR image. c SAR image segmentation result as a binary image. d Coastlines obtained from the segmentation performed

Fig. 2

Detection of coastlines in the coastal zone of the city of Venice, Italy (located in north-eastern Italy on the Adriatic Sea, the capital of the Veneto region) in an example SAR image, provided courtesy of Capella Space Corporation, All Rights Reserved. a Google Earth view. b Source SAR image. The red circle marks the bridge (Ponte dell’Accademia) over the Grand Canal up to which the shoreline was determined. c SAR image segmentation result as a binary image. d Coastlines obtained from the segmentation performed

Fig. 3

Detection of the coastline in the coastal zone of the city of Hakodate, Japan, near the airport (the city lies in the Oshima Subprefecture in the south of the island of Hokkaido), in an example SAR image provided courtesy of Capella Space Corporation, All Rights Reserved. a Google Earth view. b Source SAR image. c SAR image segmentation result as a binary image. d Coastlines obtained from the segmentation performed

Fig. 4

Diagram of the PRISMA search process, including the criteria for the inclusion and exclusion of research papers

In remote sensing, water and land areas have traditionally been analysed using images acquired by optical and infrared instruments. Optical images are easy to access and interpret. Moreover, land and water can be easily mapped in the infrared because water absorbs infrared radiation, while land reflects it strongly. However, tracing coastlines in images from optical and infrared instruments is significantly hindered by clouds, variable solar illumination and other unfavourable weather conditions. Example optical images are shown in Figs. 1a, 2a and 3a.

Synthetic aperture radar (SAR) sensors enable the acquisition of high-resolution images during the day and night, in almost any weather conditions. This is why SAR images are of increasing interest for Earth observation, an interest further boosted by their growing availability. SAR images are obtained by emitting microwave signals from the sensor and receiving the portion backscattered from the Earth’s surface [8, 9]. SAR images are rendered as grey-level (monochromatic) intensity images. SAR imaging can also exploit polarization, using antennas that transmit and receive signals with different polarizations [10,11,12]. Combinations of multiple polarizations and wavelengths can provide different and complementary information about the surface. As a result, polarimetric SAR images can carry much more information than single-polarization SAR images [13,14,15].

Figures 1b, 2b and 3b show three sample SAR images of different geographical areas. Figures 1c, 2c and 3c show segmentation results as binary images in which the coastline separates white pixels (land and man-made structures) from black pixels (water). Figures 1d, 2d and 3d show the shorelines obtained from these segmentations. The binary images were produced with ready-made tools for manual segmentation available in the Image Processing Toolbox of Matlab R2022a (MathWorks, Inc., Natick, MA, USA) for Windows 11. Edges were extracted from the binary images using the method presented in [16]. The SAR images used in this work were provided courtesy of Capella Space Corporation [17].
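The edge-extraction method of [16] is not reproduced here. As a minimal sketch of the general idea only, assuming the binary mask is a NumPy boolean array with True for land and man-made structures, a coastline pixel can be defined as a land pixel with at least one water pixel among its four neighbours (equivalent to subtracting a morphological erosion from the mask). The function name `mask_boundary` is hypothetical.

```python
import numpy as np


def mask_boundary(land: np.ndarray) -> np.ndarray:
    """Return the land pixels that touch water (4-connectivity).

    land: 2-D array, True/1 = land or man-made structure, False/0 = water.
    """
    land = np.asarray(land, dtype=bool)
    # Pad with land so the image frame itself is not reported as coastline
    # (an assumption; pad with False to treat the area outside the image as water).
    p = np.pad(land, 1, mode="constant", constant_values=True)
    # A pixel is "interior" when its up, down, left and right neighbours are all land.
    interior = p[:-2, 1:-1] & p[2:, 1:-1] & p[1:-1, :-2] & p[1:-1, 2:]
    return land & ~interior


# Toy mask: land in the two left-most columns, water elsewhere.
mask = np.zeros((5, 5), dtype=bool)
mask[:, :2] = True
coast = mask_boundary(mask)
print(coast.astype(int))
```

For the toy mask, only the second column (the land pixels adjacent to water) is marked as coastline.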

The examples in Fig. 1a and b, showing the coastal zone of Aliağa Bay in Turkey, contain clearly visible man-made infrastructure (including piers, jetties, marinas, fishing ports and concrete quays), which is also included in the identified shoreline, as shown in Fig. 1c and d. This infrastructure is particularly visible in the left parts of Fig. 1a and b. Fig. 1a also shows clouds over the bay, as well as an artefact caused either by an acquisition error of the optical equipment or by an error of the software that stitched image fragments together to produce the Google Earth view: a diagonal line is visible that separates the uneven background (clouds and water) and disappears at the shoreline. No such problem occurs in the SAR image shown in Fig. 1b.

In the left part of Fig. 2a and b, which show the coastal zone of Venice, Italy (between the islands of Giudecca and Murano), one can see, among other features, port docks and quays, which also form part of the shoreline identified and extracted in Fig. 2d. Preparing accurate binary masks is very important, especially when man-made infrastructure has to be considered and when there are numerous small details, such as islands and canals. Venice is built on 118 small islands separated by numerous canals and connected by more than 400 bridges. Preparing appropriate binary masks for the coastal zone of such a detailed urbanized area can therefore be a major challenge. The questions to be answered are how accurate the binary mask should be, and which man-made infrastructure should be included and how. A separate issue is whether the available SAR and optical satellite images have sufficient resolution to capture numerous small details, such as those in the coastal zone of Venice. The binary mask shown in Fig. 2c was prepared under the assumption that only the outer shoreline is determined, so the course of the canals and individual bridges was not included. Only for the largest canal, the Grand Canal, was the shoreline traced, up to the first bridge in its southern part. In Fig. 2b, a circle marks this bridge (Ponte dell’Accademia).

The images in Fig. 3a and b show the coastal area of the city of Hakodate, Japan. Compared to the previous examples, man-made development along the shoreline is the least visible here.

Manual segmentation of shorelines in remote sensing images can be a very tedious and time-consuming task, especially given the large amount of data acquired. Moreover, the results of manual segmentation can vary greatly depending on the experience and skills of the experts. This is why computer methods implemented in specialized software, which enable automatic or semi-automatic image processing, can greatly support the work of experts, and the number of research papers on this subject is increasing. However, to develop new and better segmentation methods for coastline detection, existing approaches must first be evaluated and their advantages and disadvantages determined. Consequently, new review papers are needed to analyse the current state of knowledge and research results in detail.

In the past, coastlines were mainly detected in optical images [18]. However, as SAR technology and segmentation methods continue to develop, an increasing number of publications are devoted to detecting and extracting coastlines in SAR images. This literature review focuses on research papers from the last twenty years. The contributions of this work are summarized as follows:

  • Summarizing previous research, published in the last twenty years, on segmentation methods used to detect coastlines in SAR images.

  • Presenting a classification and a description of the segmentation methods used, as well as their practical properties.

  • Presenting the advantages and disadvantages of the segmentation methods used and the research methodology proposed in individual works.

  • Describing the metrics used when evaluating segmentation results.

  • Summarising the results obtained for individual groups of segmentation methods, including information about sensors/satellites, as well as the datasets employed and the number of images used during tests.

  • Summarising the feasibility of using existing and new approaches.

  • Indicating promising research directions.

  • Demonstrating the prospects and open research challenges concerning the effective detection and extraction of coastlines in SAR images.

The rest of the paper is organized as follows. Section 2 describes the literature review methodology, and Sect. 3 presents existing literature reviews related to the subject of this work. Section 4 contains an introduction to SAR technology. Section 5 illustrates the division of segmentation methods into specific groups. Section 6 presents the metrics used in research projects to assess segmentation results. The following four sections present state-of-the-art solutions based on thresholding methods (Sect. 7), active contour methods (Sect. 8), machine learning approaches (Sect. 9) and other segmentation methods (Sect. 10). The last section contains a discussion that summarizes the results and presents the conclusions.

2 Review Methodology

This systematic literature review was carried out following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [19] for articles published in 2002-2022. To find articles in academic databases, titles and abstracts were searched using different combinations of the following keywords: “coastline”, “shoreline”, “segmentation”, “extraction”, “detection”, “SAR”, “synthetic aperture radar”. Four electronic databases were searched: Google Scholar, Science Direct, IEEE Xplore and Scopus. Altogether, 6093 results were obtained: 5620 in Google Scholar, 270 in Science Direct, 33 in IEEE Xplore and 170 in Scopus. Articles were excluded for the following reasons: (a) published before 2002, (b) duplicate records, (c) the presented methods of detecting and extracting coastlines do not use SAR images, (d) no segmentation methods, (e) no comparison with other segmentation methods from the literature, (f) no experiments or tests of the proposed solutions, (g) not written in English. Articles that passed the initial inclusion/exclusion criteria were then screened in two stages: (a) title and abstract screening and (b) full-text assessment. As a result, 32 papers were selected for this literature review. Details of the inclusion/exclusion of research papers are presented in Fig. 4.

3 Related Literature Reviews

The literature includes papers that summarize research on the detection and extraction of coastlines on the basis of satellite remote sensing data and geographic information systems (GIS). Review papers on similar topics are discussed in this section.

The review in [20] discusses in detail changes to coastlines, taking into account two delineation techniques, an automatic and a manual one. The authors also analysed the coastal vulnerability index (CVI), and their research covered various geographic locations. Yasir et al. [21] reviewed papers presenting coastline extraction and land cover change analysis using GIS technology and remote sensing. The study in [22] contains an overall review and a meta-analysis of the GIS methods, remote sensing data, software, materials and indexes used to monitor coastlines over twenty years, and presents papers in which coastline changes over a specific time interval were studied. Bijeesh and Narasimhamurthy [23] reviewed the algorithms, methods and sensors/satellites used to detect and delineate surface waters in remote sensing images. The review in [24] presents the results of research projects in which remote sensing data were used to detect, extract and monitor coastal shorelines; however, only papers that used data from US coasts were selected.

4 Introduction to Synthetic Aperture Radar

SAR is an active technology: the sensor transmits microwave signals and measures the portion of the signal backscattered by the surface of the Earth. The signals are sensitive to surface characteristics such as moisture and texture [25]. A key measure of the quality of the acquired radar data is their spatial resolution, which is directly related to the ratio of the sensor wavelength to the length of the sensor antenna [25]. Consequently, for a given wavelength, the longer the antenna, the higher the spatial resolution. However, it is impractical to install a very long antenna on board an aircraft or a satellite. Synthetic aperture systems overcome this obstacle: they use a short physical antenna, but apply modified data recording and processing techniques to synthesize the effect of a long antenna, thus producing higher-resolution data.
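To make the antenna-length argument concrete, the sketch below compares the azimuth resolution of a real-aperture radar (range multiplied by a beamwidth of roughly wavelength over antenna length) with the well-known stripmap SAR result of about half the physical antenna length. Both are idealized textbook approximations, and the numerical values are illustrative assumptions, not parameters of any particular mission.

```python
def real_aperture_azimuth_resolution(slant_range_m: float, wavelength_m: float,
                                     antenna_length_m: float) -> float:
    """Footprint width of a real antenna: range times beamwidth (~ wavelength / length)."""
    return slant_range_m * wavelength_m / antenna_length_m


def stripmap_sar_azimuth_resolution(antenna_length_m: float) -> float:
    """Textbook stripmap SAR approximation: half the physical antenna length."""
    return antenna_length_m / 2.0


# Illustrative values (assumed): 5.6 cm C-band wavelength, 10 m antenna, 800 km slant range.
ra = real_aperture_azimuth_resolution(800e3, 0.056, 10.0)
sa = stripmap_sar_azimuth_resolution(10.0)
print(f"real aperture: {ra / 1000:.2f} km, synthetic aperture: {sa:.1f} m")
# real aperture: 4.48 km, synthetic aperture: 5.0 m
```

The synthetic-aperture approximation does not depend on range or wavelength, which is why a short physical antenna is sufficient for fine azimuth resolution.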

Popular optical sensors, such as Sentinel-2’s Multispectral Instrument (MSI) or Landsat’s Operational Land Imager (OLI), acquire images in the visible, near-infrared and short-wave infrared portions of the electromagnetic spectrum. Radar sensors, in turn, use longer wavelengths, on the scale of a centimetre to a metre, which penetrate clouds, so cloud cover does not affect the quality of the acquired images. The different wavelengths used in SAR imaging are typically referred to as bands and designated with letters such as C, L, P and X. Table 1 presents the bands with their respective frequencies, wavelengths and possible applications, and indicates that the L-, C- and X-bands are the most frequently used in SAR instruments.
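The frequency-wavelength relation behind a band table such as Table 1 is simply wavelength = c / f. The sketch below applies it to the three most frequently used bands, assuming the nominal IEEE letter-band limits (L: 1-2 GHz, C: 4-8 GHz, X: 8-12 GHz); it does not reproduce the contents of Table 1, and the helper names are hypothetical.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s

# Nominal IEEE radar letter-band limits in GHz, for the three bands named in the text.
BANDS_GHZ = {"L": (1.0, 2.0), "C": (4.0, 8.0), "X": (8.0, 12.0)}


def wavelength_cm(freq_ghz: float) -> float:
    """Free-space wavelength in centimetres: lambda = c / f."""
    return SPEED_OF_LIGHT / (freq_ghz * 1e9) * 100.0


def band_of(freq_ghz: float):
    """Return the letter band containing freq_ghz, or None if it is not L, C or X."""
    for name, (lo, hi) in BANDS_GHZ.items():
        if lo <= freq_ghz < hi:
            return name
    return None


for name, (lo, hi) in BANDS_GHZ.items():
    # Higher frequency means shorter wavelength, so the upper limit gives the short end.
    print(f"{name}-band: {wavelength_cm(hi):.1f} to {wavelength_cm(lo):.1f} cm")

print(band_of(5.405), round(wavelength_cm(5.405), 2))  # a typical C-band centre frequency
```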

Radar signals that are transmitted and then received propagate along a specific plane of polarization. Most radars are designed to transmit and receive signals with linear, horizontal (H) or vertical (V) polarization. An important feature of radar sensors is that the polarization of signals can be precisely controlled both during transmission and reception. Signals transmitted in the horizontal polarization (H) and received in the same plane of polarization are marked with the abbreviation HH. If signals are transmitted horizontally (H) and received vertically (V), they are designated as HV, etc. Measuring the signal strength using different polarizations allows us to obtain information about the structure of the imaged surface, based on the following types of scattering: double bounce, rough surface, volume [25]. In particular:

  • Double bounce scattering is the most sensitive to an HH polarized signal and is caused, for example, by tree trunks, flooded vegetation, and buildings.

  • Rough surface scattering is the most sensitive to a VV polarized signal and is caused by, e.g., water or bare soil.

  • Volume scattering is the most sensitive to cross-polarized data such as HV or VH and is caused by, e.g., leaves and branches of trees in a forest.

More information on the different scattering mechanisms can be found in [25]. It should be noted that the individual contribution of each scattering mechanism is presented in grayscale images. The intensity of each pixel in a SAR image represents the proportion of microwaves backscattered by the ground area surveyed, and this proportion depends on the following factors:

  • Types, shapes, sizes and orientations of scatterers in the target area.

  • The amount of moisture in the surveyed area.

  • The frequency and polarization of radar pulses as well as the incident angles of the radar beam.

Pixel intensity values can be converted to a physical quantity called the normalized radar cross-section or the backscattering coefficient measured in decibels (dB), with values ranging from -40 dB for very dark surfaces to +5 dB for very bright objects [25, 26].

Table 1 Designation of microwave bands

Linear amplitude backscatter data can be stored as 16-bit unsigned integers, referred to as digital numbers (DN). The DN values can then be converted to sigma-naught values (\(\sigma ^{0}\), in decibels, dB) using the following equation:

$$\begin{aligned} \sigma ^{0}= 10 \cdot \log _{10}(DN^2)+CF \end{aligned}$$
(1)

where DN is the digital number and CF is the calibration factor (in dB) provided for the given product [26].
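Equation (1) translates directly into array code. The sketch below is a minimal illustration assuming the DN values arrive as a NumPy `uint16` array; the calibration factor used in the example is a hypothetical placeholder, since real values come from the metadata of each product. DN values of zero are treated as no-data and mapped to NaN rather than to minus infinity.

```python
import numpy as np


def dn_to_sigma0_db(dn, cf_db: float) -> np.ndarray:
    """Eq. (1): sigma0 [dB] = 10 * log10(DN**2) + CF.

    dn:    linear amplitude digital numbers (e.g. a uint16 array).
    cf_db: calibration factor in dB, taken from the product metadata.
    Pixels with DN == 0 (no data) are returned as NaN instead of -inf.
    """
    dn = np.asarray(dn, dtype=np.float64)   # cast first: squaring uint16 would overflow
    sigma0 = np.full(dn.shape, np.nan)
    valid = dn > 0
    sigma0[valid] = 10.0 * np.log10(dn[valid] ** 2) + cf_db
    return sigma0


# CF = -83.0 dB is a hypothetical placeholder, not a value for any real sensor.
result = dn_to_sigma0_db(np.array([0, 1000, 10000], dtype=np.uint16), cf_db=-83.0)
print(result)
```

With this placeholder CF, DN = 1000 maps to -23 dB and DN = 10000 to -3 dB, both within the -40 dB to +5 dB range quoted above.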

Contemporary satellites provide various levels of SAR data, namely:

  • Level 0 SAR data consist of the raw data collected by the satellite.

  • Level 1 SAR data are processed from raw Level 0 data using various algorithms to produce Single Look Complex (SLC) images, which keep the phase information from the raw acquisitions and retain the original pixel spacing. Level 0 data can also be processed to produce Ground Range Detected (GRD) images, which retain the amplitude, but not the phase, information from the raw acquisitions.

  • Level 2 SAR data are products derived from Level 1 data and are usually geolocated.

Several data formats are used for Level 2 SAR products because no common data format has yet been established. The formats currently in use include GeoTIFF, HDF5 and KMZ [25].

Radar images are affected by speckle noise, which arises from the coherent summation of the signals returned by the many scatterers within each resolution cell. As a result, a radar image appears noisier than an optical image. This is why speckle noise is very often reduced with appropriate filtering methods before further analysis. Detailed reviews of SAR image pre-processing methods, including speckle noise reduction, can be found in [9, 27].
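To illustrate why a SAR image looks noisier than an optical one, the sketch below simulates fully developed speckle over a homogeneous area under the standard multiplicative model, assuming single-look intensity is exponentially distributed around the true backscatter. Averaging L independent looks (multilooking) lowers the coefficient of variation from about 1 to about 1/sqrt(L), at the cost of spatial resolution. The parameter values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
true_backscatter = 0.05   # arbitrary linear sigma0 of a homogeneous area
n_pixels = 200_000


def coefficient_of_variation(x: np.ndarray) -> float:
    """Standard deviation divided by mean."""
    return float(x.std() / x.mean())


# Multiplicative model: observed intensity = true backscatter * speckle,
# where single-look speckle intensity is exponential with unit mean.
single_look = true_backscatter * rng.exponential(1.0, n_pixels)

# Averaging L independent looks of the same area (multilooking).
looks = 4
multi_look = true_backscatter * rng.exponential(1.0, (looks, n_pixels)).mean(axis=0)

print(coefficient_of_variation(single_look))  # close to 1
print(coefficient_of_variation(multi_look))   # close to 1 / sqrt(4) = 0.5
```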

Fig. 5

A diagram showing segmentation methods which allow detecting and extracting coast shorelines from SAR images, in accordance with the literature review

5 Segmentation Methods Used to Detect Coastlines in SAR Images

During pre-processing, input SAR images are filtered to reduce noise, enhance contrast, and amplify and/or extract certain features (Fig. 5). These steps are intended to improve the subsequent segmentation. Detailed reviews of the pre-processing methods used for SAR images are presented in [9, 27, 28]. Segmentation methods used to detect coastlines in SAR images can be divided into the following groups:

  • Thresholding methods

  • Active contour models

  • Machine learning approaches

  • Other segmentation methods

According to the diagram in Fig. 5, thresholding, the watershed transformation and the region-based active contour model can also serve as auxiliary methods that produce preliminary, approximate segmentation results, which are then used by subsequent approaches. The group ‘Other segmentation methods’ contains approaches (namely watershed, graph-based and edge tracing methods) each of which is represented by only one or two research papers.

The last stage is post-processing, which extracts and/or connects coastlines when they are incomplete and removes minor artefacts and distortions from the resulting image.

6 Evaluation Metrics

The methods used to segment coastlines were verified using various metrics. The performance of a segmentation method (SEG) is generally assessed by comparison with a manual segmentation (REF) performed by an experienced expert. To neutralize inter-observer variability in manual segmentation and produce a reliable REF, a combination of segmentations produced by several experts should be used. Unfortunately, this is rarely done.

When assessing the performance of segmentation algorithms against REF, each pixel is classified as belonging either to the shoreline (foreground) or to the background. Pixels are then counted as follows: true positives (TP) are pixels correctly detected as foreground (positive), false positives (FP) are pixels incorrectly detected as foreground, true negatives (TN) are pixels correctly detected as background, and false negatives (FN) are pixels incorrectly detected as background. These four counts form the confusion matrix (or contingency table), a term often used in the literature. REF can also be obtained from a manually recorded Global Positioning System (GPS) survey, as presented in [29].

Table 2 presents all the metrics used in the research papers discussed. Sensitivity (Sens), also called recall, measures the proportion of positives that are correctly identified, \(TP/(TP+FN)\). Accuracy (Acc) is the number of correctly classified samples (\(TP+TN\)) divided by the total number of samples and is the metric most commonly used to assess coastline segmentations. Specificity (Spec) expresses the proportion of negatives that are correctly detected, \(TN/(TN+FP)\), and Precision (Prec) specifies the proportion of positive results that are true positives, \(TP/(TP+FP)\). Although a high Sens is a desirable property of a coastline detection method, a high Sens combined with a low Spec indicates that the segmentation includes many pixels that do not belong to the shoreline, i.e. there is a high number of FP. Consequently, a segmentation method with a high Sens but a low Spec may still be acceptable if a post-processing step can remove the FP.

Other metrics used include the Negative Predictive Value (NPV), the False Positive Rate (FPR) and the False Negative Rate (FNR). NPV expresses the proportion of negative results that are true negatives, \(TN/(TN+FN)\). FPR is the ratio of the number of negative samples wrongly identified as positive (FP) to the total number of actual negative samples, \(FP/(FP+TN)\). FNR, in turn, is the ratio of the number of positive samples wrongly identified as negative (FN) to the total number of actual positive samples, \(FN/(FN+TP)\).
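The confusion-matrix metrics described above can be computed directly from a SEG mask and a REF mask. The following is a minimal sketch assuming both masks are NumPy arrays of the same shape with True (or 1) for shoreline pixels; the function names are hypothetical, and a zero denominator is left to raise an error rather than being silently replaced.

```python
import numpy as np


def confusion_counts(seg: np.ndarray, ref: np.ndarray):
    """TP, FP, TN, FN for binary masks (True = shoreline / foreground)."""
    seg = np.asarray(seg, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    tp = int(np.sum(seg & ref))
    fp = int(np.sum(seg & ~ref))
    tn = int(np.sum(~seg & ~ref))
    fn = int(np.sum(~seg & ref))
    return tp, fp, tn, fn


def pixel_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Confusion-matrix metrics; a zero denominator raises ZeroDivisionError."""
    return {
        "Sens": tp / (tp + fn),                   # recall
        "Spec": tn / (tn + fp),
        "Acc": (tp + tn) / (tp + fp + tn + fn),
        "Prec": tp / (tp + fp),
        "NPV": tn / (tn + fn),
        "FPR": fp / (fp + tn),                    # = 1 - Spec
        "FNR": fn / (fn + tp),                    # = 1 - Sens
    }


seg = np.array([1, 1, 1, 0, 0, 0, 0, 0])
ref = np.array([1, 1, 0, 1, 0, 0, 0, 0])
print(pixel_metrics(*confusion_counts(seg, ref)))
```

Because shoreline pixels usually make up a small fraction of an image, TN tends to dominate the counts, so Acc and Spec can remain high even when the shoreline itself is poorly detected.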

Other noteworthy metrics include the Dice Similarity Coefficient (DSC) [30]:

$$\begin{aligned} DSC=\frac{2 \cdot TP}{FP+FN+2 \cdot TP} \end{aligned}$$
(2)

This metric allows one to determine the measure of similarity between two sets of pixels representing REF and SEG segmentations, using defined components of the confusion matrix.

Error measurement based on the Dice coefficient (eD) [31]:

$$\begin{aligned} eD = 1-DSC = \frac{FP+FN}{FP+FN+ 2 \cdot TP} \end{aligned}$$
(3)

Another metric with properties similar to the Dice coefficient is the Jaccard coefficient (JSC) [30]:

$$\begin{aligned} JSC=\frac{TP}{FP+FN+TP} \end{aligned}$$
(4)

However, the Jaccard coefficient never exceeds the Dice coefficient for the same segmentation, since \(JSC = DSC/(2-DSC)\); the two are equal only when both are 0 or 1.

The \(F_1\) score [32] combines the precision and recall into a single metric by taking their harmonic mean:

$$\begin{aligned} F_1 =2\frac{Prec \cdot Recall}{Prec+Recall}=\frac{2 \cdot TP}{FP +FN+ 2 \cdot TP} \end{aligned}$$
(5)

All the metrics presented above take values in the interval [0, 1] and can also be expressed as percentages. Note that, by Eqs. (2) and (5), the \(F_1\) score is identical to the DSC.
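The overlap metrics in Eqs. (2)-(5) need only the TP, FP and FN counts. The sketch below implements them as written, using arbitrary example counts, and checks numerically that the \(F_1\) score coincides with the DSC and that \(JSC = DSC/(2-DSC)\). The function names are hypothetical.

```python
def dice(tp: int, fp: int, fn: int) -> float:
    return 2 * tp / (fp + fn + 2 * tp)            # Eq. (2)


def dice_error(tp: int, fp: int, fn: int) -> float:
    return (fp + fn) / (fp + fn + 2 * tp)         # Eq. (3), equals 1 - DSC


def jaccard(tp: int, fp: int, fn: int) -> float:
    return tp / (fp + fn + tp)                    # Eq. (4)


def f1(tp: int, fp: int, fn: int) -> float:
    prec, rec = tp / (tp + fp), tp / (tp + fn)
    return 2 * prec * rec / (prec + rec)          # Eq. (5), harmonic mean


tp, fp, fn = 80, 10, 30                            # arbitrary example counts
d = dice(tp, fp, fn)
print(d, jaccard(tp, fp, fn), d / (2 - d), f1(tp, fp, fn), dice_error(tp, fp, fn))
```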

All the metrics presented above are based on a pixel-to-pixel comparison between SEG and REF and do not take into account that individual pixels are elements of the contour representing the coastline. There are also metrics that do take this into account.

Let us adopt the following notation. Seg and Ref denote, respectively, the coastline contours produced by the segmentation method (SEG) and drawn manually by an expert (REF), while seg and ref denote points belonging to the respective contours. The distance from a point seg to the nearest point of the Ref contour is denoted \(d_{seg}\), and the distance from a point ref to the nearest point of the Seg contour is denoted \(d_{ref}\).

Distance error (\(e_D\)) [33]:

$$\begin{aligned} e_D =\frac{1}{|Seg |}\left( \sum _{seg=1}^{|Seg |} |d_{seg} |\right) \end{aligned}$$
(6)

Calculated as the average distance from all points seg on Seg to the closest point on Ref.

Root mean square error (RMSE) [34]:

$$\begin{aligned} RMSE=\sqrt{\frac{1}{|Seg |}\left( \sum _{seg=1}^{|Seg |} d_{seg}^2\right) } \end{aligned}$$
(7)

Calculated as the square root of the mean squared distance from the points on Seg to the closest points on Ref.

Average Euclidean distance (AED) [35]:

$$\begin{aligned} AED=\frac{1}{ |Seg |} \cdot \sum _{x,y \in Seg} \sqrt{d^2_{ref}(x) + d^2_{ref}(y)} \end{aligned}$$
(8)

The AED calculates the segmentation performance between the Seg and Ref contours.

Pratt’s figure of merit (FOM) [36]:

$$\begin{aligned} FOM=\frac{1}{max \{ |Seg |, |Ref |\}} \cdot \sum _{seg=1}^{|Seg |}\frac{1}{1+d^2_{seg}\cdot \alpha } \end{aligned}$$
(9)

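A measure of similarity between two contours: FOM equals 1 for a perfect match and decreases towards 0 as the contours diverge. The parameter \(\alpha\) is a positive scaling factor that penalizes displacement; Pratt’s original value is 1/9.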

Let us adopt additional notation. Let N be the number of corresponding points determined on both contours Ref and Seg. The coordinates of the point of contour Ref with index \(n \in \{1, \dots , N\}\) are denoted \((ref_{nx}, ref_{ny})\), while \((seg_{nx}, seg_{ny})\) denotes the coordinates of the point of contour Seg determined for the point \((ref_{nx}, ref_{ny})\).

Modified average Euclidean distance (MAED) [37]:

$$\begin{aligned} MAED=\frac{1}{N} \cdot \sum _{n =1}^{N} \sqrt{(seg_{nx}-ref_{nx})^2+(seg_{ny}-ref_{ny})^2} \end{aligned}$$
(10)

MAED calculates the performance of segmentation between the contours Seg and Ref for a set number of points N.

Additional assumptions were made when calculating the MAED metric [37]. If no seg point is found for a given point \((ref_{nx}, ref_{ny})\), the maximum distance among the correctly made measurements is used as the error. If several seg points are detected for a given point \((ref_{nx}, ref_{ny})\), the seg point farthest from it is used in the calculation.

The metrics presented in equations (6), (7), (8) and (10) are calculated from the pixels (i.e. points) of the contours Ref and Seg and are expressed in pixels; these values can also be converted to meters using the pixel spacing of the image. For these metrics, the lower the value, the greater the accuracy of the segmentation method. Pratt’s FOM (9), by contrast, is dimensionless, takes values in (0, 1], and higher values indicate better agreement.
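A minimal sketch of three contour-based metrics, the distance error of Eq. (6), the RMSE (taken here as the square root of the mean squared nearest-point distance) and Pratt’s FOM of Eq. (9), assuming both contours are given as arrays of (x, y) pixel coordinates. \(d_{seg}\) is found by brute-force nearest-neighbour search, which is adequate for short contours; a k-d tree would scale better. The default \(\alpha = 1/9\) is Pratt’s commonly used value and an assumption, not a setting documented in the reviewed papers.

```python
import numpy as np


def nearest_distances(seg_pts, ref_pts) -> np.ndarray:
    """d_seg: distance from each Seg point to the closest Ref point (brute force)."""
    seg = np.asarray(seg_pts, dtype=float)[:, None, :]   # shape (S, 1, 2)
    ref = np.asarray(ref_pts, dtype=float)[None, :, :]   # shape (1, R, 2)
    return np.sqrt(((seg - ref) ** 2).sum(axis=2)).min(axis=1)


def distance_error(seg_pts, ref_pts) -> float:           # Eq. (6)
    return float(nearest_distances(seg_pts, ref_pts).mean())


def rmse(seg_pts, ref_pts) -> float:                     # root mean square distance
    d = nearest_distances(seg_pts, ref_pts)
    return float(np.sqrt(np.mean(d ** 2)))


def pratt_fom(seg_pts, ref_pts, alpha: float = 1.0 / 9.0) -> float:  # Eq. (9)
    d = nearest_distances(seg_pts, ref_pts)
    return float(np.sum(1.0 / (1.0 + alpha * d ** 2)) / max(len(seg_pts), len(ref_pts)))


ref = np.array([[0, 0], [0, 1], [0, 2]])      # toy reference contour
seg = ref + np.array([1, 0])                  # same contour shifted by one pixel
print(distance_error(seg, ref), rmse(seg, ref), pratt_fom(seg, ref))
```

For the one-pixel shift, every \(d_{seg}\) equals 1, so the distance error and RMSE are both 1 pixel and the FOM is 0.9.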

This literature review does not re-assess or directly compare the segmentation results reported in individual papers on the basis of these metrics.

Table 2 Metrics for assessing segmentation methods used to detect coastlines

7 Thresholding

Thresholding produces a binary image from a grey-level or a colour image. The simplest binarization method works in such a way that for a given threshold intensity T:

  • pixels with intensity values smaller than T are replaced with black pixels

  • pixels with intensity values greater than or equal to T are replaced with white pixels

The threshold T can be set using global or local methods. Global thresholding methods determine a single intensity threshold for the whole image. The Otsu method [38], which is very often used in segmentation, is an example of global thresholding. It maximizes the between-class variance, a well-known measure in statistical discriminant analysis. The main assumption is that properly thresholded classes should differ in the intensity values of their pixels, so the threshold that best separates the classes in terms of their intensity values is the optimal threshold. The between-class variance is expressed as follows:

$$\begin{aligned} \sigma ^2_{b}=w_{1}(t)w_{2}(t)[\mu _{1}(t)-\mu _{2}(t)]^2 \end{aligned}$$
(11)

where \(w_{1}(t), w_{2}(t)\) denote the probabilities of the two classes separated by a threshold t, which for an 8-bit image takes values from the interval [0, 255], and \(\mu _{i}(t)\) is the mean intensity of class \(i \in \{1,2\}\).
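Equation (11) can be maximized by exhaustive search over all 256 candidate thresholds. The sketch below is a plain NumPy illustration for 8-bit images, following the convention above that pixels below t form class 1 (black) and pixels at or above t form class 2 (white); the function name is hypothetical. Libraries such as scikit-image provide a ready-made `skimage.filters.threshold_otsu`.

```python
import numpy as np


def otsu_threshold(img) -> int:
    """Otsu threshold for an 8-bit image, maximizing the between-class variance, Eq. (11)."""
    hist = np.bincount(np.asarray(img, dtype=np.uint8).ravel(), minlength=256).astype(float)
    p = hist / hist.sum()                  # normalized histogram (grey-level probabilities)
    levels = np.arange(256)
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        # Class 1: levels < t (black); class 2: levels >= t (white).
        w1, w2 = p[:t].sum(), p[t:].sum()
        if w1 == 0 or w2 == 0:
            continue                       # one class empty: variance undefined
        mu1 = (levels[:t] * p[:t]).sum() / w1
        mu2 = (levels[t:] * p[t:]).sum() / w2
        var_b = w1 * w2 * (mu1 - mu2) ** 2
        if var_b > best_var:
            best_t, best_var = t, var_b
    return best_t


# Toy bimodal image: dark "water" (50) and bright "land" (200).
img = np.array([50] * 100 + [200] * 100, dtype=np.uint8).reshape(10, 20)
t = otsu_threshold(img)
binary = img >= t                          # white pixels = land
print(t, binary.sum())
```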

Global thresholding methods are fast. However, they may produce poor results for some classes of images, e.g. when objects are non-uniformly illuminated. In such cases, it may be better to use different threshold values determined from local pixel intensities for individual image fragments. One popular method is variable thresholding based on local image properties [39]. Let \(m_{xy}\) and \(\sigma _{xy}\) denote the mean and the standard deviation of the pixel values in the neighbourhood \(S_{xy}\) centred on the coordinates (x, y) of the analysed image. Variable threshold values based on local image properties are as follows:

$$\begin{aligned} T_{xy}=a\sigma _{xy}+bm_{xy} \end{aligned}$$
(12)

where a and b are certain positive constants set by the user, and

$$\begin{aligned} T_{xy}=a\sigma _{xy}+bm_{G} \end{aligned}$$
(13)

where \(m_{G}\) is the global mean of the image. The image is segmented as follows:

$$\begin{aligned} g(x,y) = {\left\{ \begin{array}{ll} 1 &{} \text {if } f(x,y)> T_{xy} \\ 0 &{} \text {if } f(x,y) \le T_{xy} \end{array}\right. } \end{aligned}$$
(14)

where f(x, y) is the input image. This equation is evaluated for all pixel locations in the image, and for each location (x, y) a different threshold is computed using the pixels of the neighbourhood \(S_{xy}\).
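Equations (12)-(14) can be sketched with NumPy sliding windows. This is a minimal illustration assuming a square, odd-sized neighbourhood \(S_{xy}\) and reflective padding at the image borders (both are assumptions; other window shapes and border rules are possible). Setting `use_global_mean=True` switches from Eq. (12) to Eq. (13). The function name is hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def local_threshold(f: np.ndarray, size: int = 15, a: float = 1.0, b: float = 1.0,
                    use_global_mean: bool = False) -> np.ndarray:
    """Variable thresholding, Eq. (12) or (13), followed by Eq. (14).

    size is the (odd) side length of the square neighbourhood S_xy.
    """
    f = np.asarray(f, dtype=np.float64)
    r = size // 2
    windows = sliding_window_view(np.pad(f, r, mode="reflect"), (size, size))
    sigma_xy = windows.std(axis=(-2, -1))                  # local standard deviation
    m = f.mean() if use_global_mean else windows.mean(axis=(-2, -1))  # m_G or m_xy
    T = a * sigma_xy + b * m                               # per-pixel threshold T_xy
    return (f > T).astype(np.uint8)                        # 1 if f(x, y) > T_xy, else 0


flat = np.full((20, 20), 7.0)                              # uniform image
spot = np.zeros((20, 20))
spot[10, 10] = 100.0                                       # single bright pixel
print(local_threshold(flat, size=3).sum(), local_threshold(spot, size=3).sum())
```

Because the threshold follows the local statistics, a uniform region produces no foreground, while an isolated bright pixel exceeds its local threshold.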

A detailed survey and evaluation of various thresholding methods can be found in [39,40,41]. Figure 6 shows a general diagram of thresholding methods.

Fig. 6

A diagram of the operation of thresholding. For global thresholding, one intensity threshold T is set for the whole image. For local thresholding, so-called local intensity thresholds are determined for defined areas of the image. Prior to thresholding, various methods of source image pre-processing are used to help determine the intensity threshold or thresholds

Liu and Jezek [42] presented a method consisting of several successive steps. First, the images are denoised using the Lee filter [43], which effectively reduces speckle noise without weakening the sharpness of edges. Speckle noise is a granular pattern, a special kind of noise that is always present in SAR images and reduces their quality. Next, anisotropic diffusion filtering [44] is applied to enhance strong edges along the coast and suppress weak ones. Segmentation is then performed using a locally adaptive thresholding method, in which the binarization threshold is determined separately for each square area of the image on the basis of a histogram analysis, by fitting a bimodal Gaussian curve. The Canny edge detector [39] is used to eliminate false, irrelevant edges. Any remaining background elements in the binary image are removed using the morphological operators of dilation and erosion [45]. The authors presented results for example SAR images of the coast of Antarctica and also demonstrated that the proposed methods can be used to segment optical satellite images.
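The filter parameters used in [42] are not reproduced here. As a minimal sketch, the Lee filter can be written in its common coefficient-of-variation form, assuming an intensity image, a square odd-sized window and speckle whose squared coefficient of variation is 1/L for L looks: each pixel becomes m + w(x - m), where m is the local mean and w = 1 - Cu²/Ci² is clipped to [0, 1]. Homogeneous areas (local variation close to the speckle level) are therefore smoothed towards m, while strongly varying areas keep more of the original value.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def lee_filter(intensity: np.ndarray, size: int = 7, looks: int = 1) -> np.ndarray:
    """Lee speckle filter (coefficient-of-variation form) for SAR intensity images."""
    img = np.asarray(intensity, dtype=np.float64)
    r = size // 2                                    # size is assumed to be odd
    windows = sliding_window_view(np.pad(img, r, mode="reflect"), (size, size))
    mean = windows.mean(axis=(-2, -1))               # local mean
    var = windows.var(axis=(-2, -1))                 # local variance
    cu2 = 1.0 / looks                                # squared speckle CV for intensity data
    with np.errstate(divide="ignore", invalid="ignore"):
        ci2 = var / mean ** 2                        # squared local CV
        w = np.clip(1.0 - cu2 / ci2, 0.0, 1.0)       # 0 = full smoothing, 1 = keep pixel
    w = np.nan_to_num(w, nan=0.0)                    # all-zero windows: use the local mean
    return mean + w * (img - mean)


rng = np.random.default_rng(1)
speckled = rng.exponential(1.0, (64, 64))           # simulated homogeneous single-look area
filtered = lee_filter(speckled, size=7, looks=1)
print(speckled.std(), filtered.std())
```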

The authors of [46] first compute the coherence of an interferometric pair of SAR images. Coherence serves as a measure for distinguishing land from sea. In addition, noise in the image is reduced with a moving average filter. A fuzzy connectivity map based on normalized intensity values is also created; this is an iterative seed-growing process that accounts for the intensity-connectedness of individual pixels, starting from a seed point (i.e. a pixel) on the coast selected by the user. The binarization threshold used to separate water and land pixels is then determined experimentally. The proposed solution is somewhat limited by the need to set the binarization threshold manually and then adjust it depending on the results obtained.
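The coherence used in [46] is commonly estimated over a small window as the normalised magnitude of the complex cross-correlation of the two co-registered images. The sketch below assumes a square window and complex single-look inputs `s1` and `s2`; the window size and the function names are illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def window_sum(a, half):
    """Sum over a (2*half+1) x (2*half+1) window with edge replication."""
    k = 2 * half + 1
    p = np.pad(a, half, mode="edge")
    return sliding_window_view(p, (k, k)).sum(axis=(-2, -1))


def coherence(s1, s2, half=2):
    """|sum(s1 * conj(s2))| / sqrt(sum|s1|^2 * sum|s2|^2), values in [0, 1]."""
    num = np.abs(window_sum(s1 * np.conj(s2), half))
    den = np.sqrt(window_sum(np.abs(s1) ** 2, half) * window_sum(np.abs(s2) ** 2, half))
    return num / np.maximum(den, 1e-12)
```

A line such as `land = coherence(s1, s2) > 0.5` would then label stable, high-coherence pixels as land, since the sea surface decorrelates between acquisitions; the value 0.5 is hypothetical, as [46] sets the threshold experimentally.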

A similar approach is presented in [47], where a coherence map is computed from an interferometric pair of SAR images; this map has a lower resolution than the original images. The number of low-coherence pixels (i.e. those representing water) and their percentage in the image are then calculated, which allows the threshold value to be obtained automatically. In parallel, the image at the original resolution is processed using a Gaussian Markov random field (GMRF) [48], which models the spatial correlation of neighbouring pixels and thereby helps to extract the coastline. The GMRF-processed image is then binarized with the automatically estimated threshold, which yields the coastlines.

Buono et al. [49] proposed an approach in which a multipolarization analysis of sea-surface backscattering is first carried out to facilitate the subsequent segmentation. They then developed a global-threshold Constant False Alarm Rate (CFAR) algorithm, which produces a binary image of water and land pixels. In this study, SAR images were processed taking into account both different polarizations and the wind conditions at the time of data acquisition. This is of particular interest because this aspect had not been analysed in other papers on coastline detection, yet in SAR images acquired under high wind conditions, wave-like patterns are clearly visible and make segmentation difficult. Finally, the coastline is extracted from the binary image using the Sobel edge detector. This research was continued in [50] using multipolarimetric SAR imagery. First, the authors applied a non-local speckle filter [51], which reduces speckle noise while preserving fine edges. They then performed a metric evaluation of single- and dual-polarization features and selected the most suitable parameter for extracting the coastline. Next, they applied the CFAR thresholding algorithm. The resulting binary image was filtered using morphological operators to fill gaps and remove minor artefacts, and the Canny edge detector was then used to extract continuous coastlines.
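The details of the CFAR detector in [49] depend on the polarimetric features used. As a simplified illustration only, the sketch below assumes that single-channel sea-clutter intensity is exponentially distributed (as for single-look intensity), for which the threshold giving a false-alarm probability `pfa` has the closed form \(T = -\mu \ln(\mathrm{pfa})\), with \(\mu\) estimated from a sample of sea pixels. The distribution model and the function names are assumptions, not the model of [49].

```python
import numpy as np


def cfar_threshold(sea_sample, pfa=1e-3):
    """Global CFAR threshold assuming exponentially distributed sea intensity.

    For I ~ Exp(mean mu), P(I > T) = exp(-T / mu), hence T = -mu * ln(pfa).
    """
    mu = float(np.mean(sea_sample))
    return -mu * np.log(pfa)


def cfar_segment(intensity, sea_sample, pfa=1e-3):
    """1 = pixel brighter than the sea-clutter threshold (land), 0 = sea."""
    return (np.asarray(intensity) > cfar_threshold(sea_sample, pfa)).astype(np.uint8)
```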

Pelich et al. [52] used a method consisting of three consecutive steps. First, they applied a pixelwise averaging operation [53] to reduce speckle noise; the advantage of this filtering method is that it retains the original spatial resolution of the SAR images. They then used the multitemporal intensity average to isolate the coastline, assuming that sea pixels have low intensity values and land pixels have high intensity values. Next, the image was divided into areas with bimodal histograms of pixel brightness, and these areas were binarized using adaptive thresholding. Finally, region growing was applied to the binary image to achieve a clear separation of water from land.
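The final region-growing step in [52] can be illustrated with a simple flood fill: starting from a seed pixel known to be sea, it collects all 4-connected pixels with the same binary label, so land pixels and water patches not connected to the open sea are excluded. The sketch assumes 4-connectivity and a user-supplied seed, and also shows the multitemporal intensity average as a pixelwise mean over a stack of co-registered images; both function names are hypothetical.

```python
import numpy as np
from collections import deque


def multitemporal_mean(stack):
    """Pixelwise mean of a (T, H, W) stack of co-registered intensity images."""
    return np.asarray(stack, dtype=np.float64).mean(axis=0)


def region_grow(binary, seed):
    """Boolean mask of the 4-connected region of `binary` containing `seed`.

    `seed` is a (row, col) tuple; the region contains every pixel reachable
    from the seed through neighbours carrying the same label.
    """
    binary = np.asarray(binary)
    h, w = binary.shape
    label = binary[seed]
    region = np.zeros((h, w), dtype=bool)
    region[seed] = True
    queue = deque([seed])
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if (0 <= ny < h and 0 <= nx < w and not region[ny, nx]
                    and binary[ny, nx] == label):
                region[ny, nx] = True
                queue.append((ny, nx))
    return region
```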

The study in [54] proposes a segmentation method for both SAR images and images acquired with optical instruments. First, the images are pre-processed with Gaussian and median filtering to reduce noise. They are then divided into rectangular areas, each thresholded separately with an experimentally set binarization threshold. The coastline is extracted using the Sobel edge detector. Because the resulting coastlines may be discontinuous, the local neighbourhood of land and water pixels is checked at each discontinuity and the appropriate pixels are added to obtain a continuous coastline.
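Extracting the coastline from a binary land/water image with the Sobel operator, as in [54] and [49], amounts to marking the pixels where the gradient magnitude of the mask is non-zero. The sketch below is a minimal NumPy version under that assumption; it does not include the discontinuity repair described in [54], and the function names are illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

SOBEL_X = np.array([[-1.0, 0.0, 1.0], [-2.0, 0.0, 2.0], [-1.0, 0.0, 1.0]])
SOBEL_Y = SOBEL_X.T


def filter3x3(img, kernel):
    """3x3 correlation with replicated borders."""
    windows = sliding_window_view(np.pad(img, 1, mode="edge"), (3, 3))
    return np.einsum("ijkl,kl->ij", windows, kernel)


def coastline_from_mask(mask):
    """Boolean map of pixels with non-zero Sobel gradient of a land/water mask."""
    m = np.asarray(mask, dtype=np.float64)
    gx = filter3x3(m, SOBEL_X)
    gy = filter3x3(m, SOBEL_Y)
    return np.hypot(gx, gy) > 0
```

The marked band is two pixels wide (one on each side of the boundary); intersecting it with the land mask, `edges & (mask == 1)`, keeps a one-pixel-wide line on the land side.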

In [55], a combination of classical image processing methods was used. During pre-processing, logarithmic scaling of pixel brightness followed by a median filter was applied to improve image contrast and reduce noise. Edges were then detected with a modified version of the Canny method [56], which uses a thresholded gradient magnitude cluster to determine edge segments. If the results were still not accurate, the following steps were repeated iteratively: flood fill [57] to fill the extracted contours, followed by thresholding, in which the brightness threshold was raised or lowered according to a visual evaluation of the results.

A summary of the analysed thresholding methods is presented in Table 3.

8 Active Contour Methods

Active Contour Models (ACMs) use a curve C defined in the image domain, which moves and deforms under the influence of internal forces (\(F_{int}\)) and external forces (\(F_{ext}\)) to detect object boundaries (including coastlines in SAR images). Because an active contour model requires initialisation, a robust model should be insensitive to the initial position from which the contour starts adapting to the coastline. It should also be robust to the noise present in the source image. Recent research has focused on maximising the accuracy of edge detection while taking the coastline geometry into account. These approaches allow coastlines with irregular contours to be identified in both non-urbanized and urbanized areas. However, their computational cost limits their application to high-resolution images.

Active contour models can be divided into edge-based and region-based models, which are analysed in detail below.

8.1 Edge-Based

Edge-based models use an explicit representation of the curve C and rely on local edge information obtained from the image to fit themselves to the boundaries of the approximated shape. Edge-based ACMs find edges using external forces derived from the intensity or its gradient, so the progress of the segmentation depends strongly on the contour initialisation. If the contour is initialised far from the sought edges, obtaining a correct segmentation may be problematic, because the intensity gradient carries only local information [58].

Parametric models [59] and geometric models [60] can be distinguished within edge-based models.

8.1.1 Parametric

The parametric model was introduced by Kass et al. [59]. It represents the curve C in parametric form, and the evolution of C over time t is described by:

$$\begin{aligned} \gamma \frac{\delta C}{\delta t}=F_{ext}(C)+F_{int}(C) \end{aligned}$$
(15)

where \(\gamma\) is a damping coefficient. The external force \(F_{ext}\) drives the contour towards the sought edges in the image and is derived from the image intensity or gradient values; its exact form depends on the method adopted. The internal force \(F_{int}\) has the following form:

$$\begin{aligned} F_{int}(C)=\frac{\delta }{\delta c} \left(p\frac{\delta C}{\delta c}\right)-\frac{\delta ^2}{\delta c^2} \left(q\frac{\delta ^2 C}{\delta c^2}\right) \end{aligned}$$
(16)

where \(c\in [0,1]\) is the curve parameter, and p and q are weight parameters: p controls the elasticity (tension) of the curve C, and q controls its rigidity, i.e. its resistance to bending. A general diagram of parametric ACMs is shown in Fig. 7.
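Equations (15) and (16) can be discretised for a closed contour of N points. Assuming constant weights p and q, the internal force reduces to \(pC'' - qC''''\), which the sketch below approximates with periodic finite differences (unit spacing in c, so the 1/N scaling is absorbed into p and q) and integrates with an explicit Euler step. Kass et al. [59] use a semi-implicit scheme, which is more stable; the explicit version, the function names and the default parameter values are illustrative assumptions, and `f_ext` stands for whatever external force a given method defines.

```python
import numpy as np


def internal_force(C, p, q):
    """F_int of Eq. (16) for a closed contour C of shape (N, 2), constant p and q.

    With constant weights, d/dc(p dC/dc) = p C'' and d2/dc2(q C'') = q C''''.
    Derivatives are periodic finite differences with unit spacing in c.
    """
    d2 = np.roll(C, -1, axis=0) - 2 * C + np.roll(C, 1, axis=0)
    d4 = np.roll(d2, -1, axis=0) - 2 * d2 + np.roll(d2, 1, axis=0)
    return p * d2 - q * d4


def snake_step(C, f_ext, p=0.2, q=0.05, gamma=1.0, dt=0.5):
    """Explicit Euler step of Eq. (15): gamma dC/dt = F_ext(C) + F_int(C)."""
    return C + (dt / gamma) * (f_ext(C) + internal_force(C, p, q))
```

With `f_ext` returning zeros, only the internal force acts and a circular contour slowly contracts, which is the expected effect of the tension term.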

Table 3 Summary of thresholding methods (evaluation metrics are given in Table 2)
Table 4 Summary of active contour methods for coastline detection (evaluation metrics are given in Table 2)
Fig. 7
figure 7

A diagram of the parametric model operation. The segmentation process consists of the evolution of the parametric curve C under the external force (\(F_{ext}\)) and the internal force (\(F_{int}\)). \(F_{ext}\) is defined on the basis of image data, such as the intensity or gradient. \(F_{int}\) depends on preset parameter values that control the curve C, influencing its flexibility and its resistance to bending

Shang et al. [61] proposed a method for detecting coastlines in SAR images that integrates the watershed transformation [62] and the gradient vector flow (GVF) active contour model [63]. The watershed segmentation simplifies obtaining the initial contour for the GVF active contour. In [61], \(F_{ext}\) is the GVF term [63], a vector field obtained by diffusing the gradient of the image edge map. The authors applied an improved GVF term to make the curve evolution more stable and controllable than in the original model [63]; the resulting method is called the controllable GVF (CGVF) snake model. Despite these improvements, after the initial contour has been obtained, the CGVF model may need to repeat its evolution several times to obtain stable and satisfactory results, which extends the computation time. Another limitation is that the sea and land markers for the watershed transformation are determined manually, so neither the determination of the initial contour nor the segmentation is fully automated. On the other hand, the initial contour obtained is close to the coastline, which facilitates its subsequent delineation.
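The GVF field used as \(F_{ext}\) in [61] extends the gradient of an edge map into homogeneous regions by diffusion. The sketch below follows the iterative form of the original GVF formulation [63], \(u_t = \mu\nabla^2 u - (u - f_x)(f_x^2 + f_y^2)\) and likewise for v with \(f_y\); it is not the controllable CGVF variant of [61], and the parameter values and function names are assumptions chosen to keep the explicit scheme stable.

```python
import numpy as np


def laplacian(a):
    """5-point Laplacian with replicated borders (zero normal derivative)."""
    p = np.pad(a, 1, mode="edge")
    return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4 * a


def gradient_vector_flow(edge_map, mu=0.2, iterations=200, dt=0.5):
    """GVF field (u, v) of an edge map f.

    Iterates u_t = mu * lap(u) - (u - f_x) * (f_x**2 + f_y**2), and the same
    for v with f_y. The defaults satisfy dt * mu <= 0.25, the stability
    condition of the diffusion part of this explicit scheme.
    """
    f = np.asarray(edge_map, dtype=np.float64)
    f = (f - f.min()) / max(np.ptp(f), 1e-12)      # normalise f to [0, 1]
    fy, fx = np.gradient(f)
    mag2 = fx ** 2 + fy ** 2
    u, v = fx.copy(), fy.copy()
    for _ in range(iterations):
        u = u + dt * (mu * laplacian(u) - (u - fx) * mag2)
        v = v + dt * (mu * laplacian(v) - (v - fy) * mag2)
    return u, v
```

Far from an edge the raw gradient is zero, but after diffusion the GVF vectors still point towards the edge, which is what allows a snake initialised at some distance to be attracted to the coastline.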

Existing ACM-based or edge-detection-based methods for extracting the coastline from SAR and polarimetric SAR images can be very limited, particularly when the analysed images are noisy and the coastline is complex and irregular. To overcome these limitations and improve both the segmentation results and the computational efficiency, authors combine various methods, as in [61]. Another option is to use two different ACMs: a region-based ACM [64] to extract a coarse coastline, followed by a parametric edge-based ACM [65] to refine it. The region-based ACM [64] is applied to low-resolution images obtained with the multi-look processing technique [66]. The coastlines C produced in this way form the initial contour for the parametric ACM [65], which then performs the segmentation on the original high-resolution images. The authors presented precise coastline extraction results, including in the vicinity of ports, for the example SAR and polarimetric SAR images used.

8.1.2 Geometric

Geometric models [67, 68] can be described by the following equation:

$$\begin{aligned} \frac{\delta C}{\delta t}= S {\varvec{n}} \end{aligned}$$
(17)

where S is a speed function and \({\varvec{n}}\) denotes the unit normal vector to C. The speed function can be composed of [69]:

  • a constant term (e.g. an inflation term), which makes the contour deform along the normal of C

  • a deformation term which depends on the curvature \(\kappa\) of the curve C.

Since no curve parameterization is introduced in the geometric model, the evolution of the curve C is governed only by geometric constraints. As a result, changes in curve topology (e.g. splitting and merging) are handled naturally, which is problematic for parametric models. Geometric models are usually implemented with the Level Set (LS) method [70]:

$$\begin{aligned} \frac{\delta \Phi }{\delta t}= S(\kappa ) |\nabla (\Phi ) |\end{aligned}$$
(18)

where \(\Phi\) denotes the LS function, whose zero level set represents the curve C.

To approximate poorly visible edges and to bridge gaps in edges, [71, 72] proposed an LS formulation that includes an additional term:

$$\begin{aligned} \frac{\delta \Phi }{\delta t}= S(\kappa ) |\nabla (\Phi ) |+ \nabla (S)\cdot \nabla (\Phi ) \end{aligned}$$
(19)

Different definitions of speed functions can be found in the literature, mainly based on gradient information. A diagram of the operation of a geometric model is shown in Fig. 8.
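A minimal explicit discretisation of Eq. (18) is sketched below, assuming the speed \(S(\kappa) = c + \varepsilon\kappa\) (a constant inflation term plus a curvature term, matching the two components listed above), \(\Phi > 0\) inside the contour, and central differences via `np.gradient`. Practical level set codes use upwind differencing and periodic reinitialisation of \(\Phi\); this sketch omits both and only shows how the zero level moves. The function names and parameter values are illustrative.

```python
import numpy as np


def curvature(phi):
    """kappa = div(grad(phi) / |grad(phi)|) with central differences."""
    gy, gx = np.gradient(phi)
    norm = np.sqrt(gx ** 2 + gy ** 2) + 1e-12
    dny_dy = np.gradient(gy / norm, axis=0)
    dnx_dx = np.gradient(gx / norm, axis=1)
    return dnx_dx + dny_dy


def level_set_step(phi, c=1.0, eps=0.0, dt=0.2):
    """One explicit step of Eq. (18), S(kappa) = c + eps * kappa, phi > 0 inside."""
    gy, gx = np.gradient(phi)
    speed = c + eps * curvature(phi)
    return phi + dt * speed * np.sqrt(gx ** 2 + gy ** 2)
```

With `c > 0` the region \(\Phi > 0\) inflates; with `c = 0` and `eps > 0` the curvature term alone smooths and shrinks convex shapes.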

Fig. 8
figure 8

A diagram of the operation of a geometric model. The contour is evolved using the level set method, in which the level set function (\(\Phi\)) must be specified. After the contour is initialised, its evolution is controlled by the speed function. When the evolution stops, the final contour corresponds to the zero level of the function (\(\Phi\)) and defines the segmentation

Wei et al. [73] presented an approach in which the sea-land boundary is detected using a geometric active contour (GAC) model and an LS function. In [73], the signed pressure force (speed) function [74] is used as the boundary stopping condition of the GAC model. This function uses a Gaussian operator to facilitate segmenting weak boundaries that may be present in low-contrast SAR images. To reduce the number of calculations, the authors use grid sampling points, which are converted into a number of small disks used to initialise the GAC model. The study in [73] contains a detailed analysis of the computational efficiency of the applied solutions, including a comparison with other methods and different initialisation methods.
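The signed pressure force (SPF) replaces a purely gradient-based stopping term with a region-based one. The sketch below uses one common global definition, \(\mathrm{spf}(I) = (I - \tfrac{c_1+c_2}{2}) / \max|I - \tfrac{c_1+c_2}{2}|\), where \(c_1\) and \(c_2\) are the mean intensities inside and outside the contour, followed by a binary level set update and Gaussian smoothing. Whether [74] uses exactly this form is an assumption, and all names and parameter values are illustrative.

```python
import numpy as np


def spf(image, inside):
    """Global signed pressure force, values in [-1, 1]."""
    I = np.asarray(image, dtype=np.float64)
    c1 = I[inside].mean()            # mean intensity inside the contour
    c2 = I[~inside].mean()           # mean intensity outside the contour
    d = I - 0.5 * (c1 + c2)
    return d / max(np.abs(d).max(), 1e-12)


def gaussian_smooth(a, sigma=1.0):
    """Separable Gaussian filter with edge replication."""
    r = max(int(3 * sigma), 1)
    x = np.arange(-r, r + 1)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    k /= k.sum()
    p = np.pad(a, r, mode="edge")
    p = np.apply_along_axis(np.convolve, 1, p, k, mode="valid")
    return np.apply_along_axis(np.convolve, 0, p, k, mode="valid")


def spf_step(phi, image, alpha=20.0, dt=1.0, sigma=1.0):
    """One update phi_t = alpha * spf * |grad phi|, then binarise and smooth."""
    s = spf(image, phi > 0)
    gy, gx = np.gradient(phi)
    phi = phi + dt * alpha * s * np.hypot(gx, gy)
    phi = np.where(phi > 0, 1.0, -1.0)   # keep phi a binary level set function
    return gaussian_smooth(phi, sigma)   # Gaussian regularisation of the contour
```

The SPF is positive where the intensity is closer to the inside mean and negative where it is closer to the outside mean, so the contour expands inside the object and contracts outside it.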

Ouyang et al. [75] presented a GAC model that uses an LS function evolving according to equation (18), with an edge-stopping function based on [72]. They also proposed a heuristic that uses binarization and an analysis of pixel groups with a defined neighbourhood to improve the extraction of coastline edges. In addition, an averaging filter was applied to reduce the resolution of the SAR images, which is intended to eliminate speckle noise and to improve the computational performance of the ACM. The authors report segmentation times for several SAR images of 256\(\times\)256 pixels depicting non-urbanized regions.

8.2 Region-Based

A region-based ACM was proposed in [76, 77], in which a deformable curve C moves according to constraints defined for the foreground and background regions. These two regions are assumed to be statistically homogeneous, and methods in this class differ mainly in how the region statistics are determined. Compared to edge-based ACMs, region-based models can detect the edges of multiple objects and fit irregular edges more easily, benefiting from a more adaptive topology. However, despite these advantages, region-based models have a much higher computational cost than edge-based ACMs [60]. Moreover, grey-level images often contain statistically non-homogeneous regions [76, 77], which may lead to incorrect segmentation. Region-based ACMs formulate segmentation as an energy minimisation using a level set (LS) [76, 78]. The following energy functional is assumed for the image I:

$$\begin{aligned} F(c_1,c_2,C)&=\mu \cdot \text {length}(C) \nonumber \\&\quad + \nu \cdot \text {area}(\text {inside}(C)) \nonumber \\&\quad +\lambda _1\int _{\text {inside}(C)} |I({\textbf{x}})-c_1 |^2 d{\textbf{x}} \nonumber \\&\quad +\lambda _2\int _{\text {outside}(C)} |I({\textbf{x}})-c_2 |^2 d{\textbf{x}} \end{aligned}$$
(20)

where inside(C) and outside(C) denote the regions inside and outside the contour C, respectively, and \(c_1\) and \(c_2\) are two constants approximating the image intensity inside and outside C. \(\mu \ge 0\), \(\nu \ge 0\), \(\lambda _1>0\) and \(\lambda _2>0\) are fixed parameters. Equation (20) is defined globally (i.e. over the whole image) and therefore does not take local information into account. Consequently, the approaches [79,80,81] were proposed to fit the contour to the coastline as closely as possible, taking into account local coastline characteristics and the noise present in SAR images.
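To make Eq. (20) concrete, the sketch below evaluates a discrete version of the energy for a given binary partition: \(c_1\) and \(c_2\) are the mean intensities inside and outside, the length term counts horizontal and vertical label changes between neighbouring pixels, and the area term counts inside pixels. This pixel-counting approximation of length(C) is an assumption for illustration; level set implementations of [76] instead minimise the energy by evolving \(\Phi\).

```python
import numpy as np


def chan_vese_energy(I, inside, mu=1.0, nu=0.0, lam1=1.0, lam2=1.0):
    """Discrete Eq. (20) for a binary partition; returns (F, c1, c2)."""
    I = np.asarray(I, dtype=np.float64)
    inside = np.asarray(inside, dtype=bool)
    c1 = I[inside].mean() if inside.any() else 0.0      # mean intensity inside C
    c2 = I[~inside].mean() if (~inside).any() else 0.0  # mean intensity outside C
    # length(C): number of vertical and horizontal label changes between pixels
    length = (np.count_nonzero(inside[1:, :] != inside[:-1, :])
              + np.count_nonzero(inside[:, 1:] != inside[:, :-1]))
    area = np.count_nonzero(inside)
    fit_in = np.sum((I[inside] - c1) ** 2)
    fit_out = np.sum((I[~inside] - c2) ** 2)
    return mu * length + nu * area + lam1 * fit_in + lam2 * fit_out, c1, c2
```

A partition that matches the true land/sea boundary yields small fitting terms, whereas a shifted contour mixes intensities on both sides and raises the energy.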

Liu et al. [79] proposed an iterative use of the LS formulation to gradually approximate the coastline. First, the SAR image is filtered with a boxcar filter [82] and the ACM is applied to the original-resolution image to obtain a coarse segmentation. The segmentation is then gradually refined by pre-processing and the ACM, based on an analysis of the computed offset between the produced contour (superimposed on the source image) and the actual coastline. In subsequent steps, image filtering and segmentation are carried out only in image areas where this offset is significant. Finally, binarization is performed to remove small areas that do not belong to the coast, and water and land are extracted. The authors presented results for both single-polarization and quad-polarization SAR images and reported the quadratic computational complexity of the presented ACM.

Modava et al. [80] presented an approach in which the coastline is delineated in two successive steps. The first step applies a local spectral histogram (LSH) [83] to obtain a coarse land/sea segmentation. The LSH method uses spatial information and can segment images containing both textured and non-textured regions; it is also insensitive to the exact location and orientation of texture elements. Because speckle noise does not affect the LSH segmentation results, no pre-processing is needed to reduce it. In the second step, a region-based level set method (LSM) [80] is proposed to refine the LSH segmentation and precisely extract the coastline. The study in [80] presents a hierarchical LS regularization using two Gaussian kernels. The authors presented segmentation results for single-polarization SAR images, including very narrow regions.

A similar approach was proposed in [81], where single spectral-textural features (STFs) [84] are used for the initial land/sea segmentation. In the next two steps, the coastline is detected and refined using two active contour methods: the global region-based level set method (GRB-LSM) and the local region-based level set method (LRB-LSM). Both methods use the region-based signed pressure force (SPF) function [81], which makes it possible to flexibly control both the expansion of the contour when it lies inside the approximated object and its contraction when it lies outside it. In [81], the authors presented two separate iterative equations: the GRB-LSM equation uses global image information to approach the coastline, while the LRB-LSM equation uses local image information to refine the result, including the smoothness of the contour. The experiments in [81] were carried out on SAR images recorded in different microwave bands, i.e. the L-, X- and C-bands of ALOS PALSAR-2, TerraSAR-X and Sentinel-1A, respectively.

In [85], the coastline in SAR images was approximated in several steps. First, the spatial fuzzy clustering method [86], which yields good results for noisy images, is used for the initial segmentation. Binarization with Otsu's method is then performed, followed by morphological filtering to eliminate small, unconnected areas of the image. The processed SAR image becomes the starting region for a region-based level set method [87], which produces the final shape of the coastline. The authors emphasize that the ACM they used [87] does not require reinitialisation and is much more computationally efficient than the original model [76].
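The binarization and clean-up steps in [85] can be sketched as follows: Otsu's method chooses the threshold that maximises the between-class variance of the grey-level histogram, and a morphological opening (erosion followed by dilation) removes small, isolated foreground specks. The bin count, the 3\(\times\)3 structuring element and the function names are assumptions, and [85] applies these steps to the output of spatial fuzzy clustering rather than directly to raw intensities.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def otsu_threshold(img, bins=256):
    """Threshold maximising the between-class variance of the histogram."""
    hist, edges = np.histogram(np.ravel(img), bins=bins)
    p = hist / hist.sum()
    centers = 0.5 * (edges[:-1] + edges[1:])
    w0 = np.cumsum(p)                 # probability of the lower class
    m = np.cumsum(p * centers)        # cumulative first moment
    mt = m[-1]                        # global mean
    denom = w0 * (1.0 - w0)
    sigma_b = np.zeros_like(w0)
    valid = denom > 1e-12
    sigma_b[valid] = (mt * w0[valid] - m[valid]) ** 2 / denom[valid]
    k = int(np.argmax(sigma_b))
    return edges[k + 1]               # upper edge of the last lower-class bin


def binary_open(mask, half=1):
    """Morphological opening (erosion, then dilation) with a square element."""
    k = 2 * half + 1

    def windows(m):
        return sliding_window_view(np.pad(m, half, mode="edge"), (k, k))

    eroded = windows(np.asarray(mask, dtype=bool)).all(axis=(-2, -1))
    return windows(eroded).any(axis=(-2, -1))
```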

Mao et al. [88] used a method that computes a global matrix of histograms of oriented gradients on wavelet subbands, which enables accurate extraction of the edge information needed to detect the coastline. Binarization is then carried out, and in the last step segmentation is performed using a region-based ACM [76]. The authors modified the original ACM [76] by introducing constraints derived from the extracted edge information, including a constraint on pixel intensity, because local image information may be ignored in the original model [76]. The authors focus on the accuracy of the segmentation results for example SAR images of a specific resolution but do not report the computational efficiency of the methods used. The method is likely to be computationally expensive, because in addition to pre-processing that includes wavelet transforms, it uses an ACM that requires reinitialisation.

Pre-processing aims to reduce image noise and to improve the segmentation results produced by ACMs. In addition, binarization produces a region that can be used to initialise ACMs [76, 79, 85]. This solution is also used in [89]: after noise is reduced with Gaussian filtering and a histogram transformation, the processed image is binarized and a region-based level-set ACM is then applied. In post-processing, spurious groups of pixels are removed using the morphological opening and closing operators [62], and the coastlines are finally extracted from the resulting binary image. Shu et al. [89] evaluated their approach on SAR images of 1024\(\times\)1024 pixels depicting both non-urbanized and urbanized areas. The need to set the binarization threshold manually for each image limits the proposed approach to some extent.

Huang and Zhang [90] used two ACMs. The first [76] performs an approximate, global segmentation that provides the initial contour for the second ACM, which computes local statistics of neighbourhood regions and produces the final segmentation. The second model, called a local statistical active contour, includes a penalty term on the level set function, which removes the need for periodic reinitialisation to repair the level set function as it degrades during contour evolution.

Table 4 summarizes the segmentation results obtained using active contour methods.

9 Machine Learning Methods

Machine learning methods can be divided into two main categories: supervised and unsupervised. In supervised learning, the input data include a set of training examples (e.g. correct pattern segmentations prepared in advance), which are used to train the learning model. In unsupervised learning, only the data set is provided, without the correct outputs, so pattern segmentations are either not used or not available.

9.1 Unsupervised

Unsupervised learning methods process features derived from the statistical distribution of the input data and must learn to label each input image without any information about pattern labels. Image segmentations are obtained from simple, unlabelled features such as intensity and gradient. The results may be worse than those produced by supervised methods; however, when no pattern segmentations are available, these methods are an appropriate choice. Figure 9 shows a typical unsupervised learning diagram.

Fig. 9
figure 9

Diagram of unsupervised learning. The segmentation model is developed from features extracted from the source image, such as its intensity and gradient. The model is then evaluated and fine-tuned to obtain the best segmentation results, usually using the distance error or the root mean square error

Liu et al. [91] presented a solution that begins by creating a texture feature set using six difference of offset Gaussian (DOOG) filters and two different difference of Gaussian filters [92]. Principal component analysis (PCA) [93] was then used to reduce redundant information and optimize the resulting features. Next, the K-means++ algorithm [94] was used to select suitable initial seeds for the K-means clustering method [95], which produced a coarse land/sea segmentation that, however, contained many superfluous regions. To eliminate this oversegmentation, the adaptive homogeneity test (AHT) [96] was used. The AHT compares the feature distributions of two regions and serves as a similarity criterion that allows regions to be merged properly, separately for sea and land pixels.
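The feature reduction and clustering stages of [91] can be sketched with PCA computed from an SVD and K-means seeded by K-means++, which picks each new seed with probability proportional to its squared distance from the nearest existing seed. The texture features are assumed to be given as an (n_pixels, n_features) array; the DOOG filter bank and the AHT merging step are not reproduced, and all names are illustrative.

```python
import numpy as np


def pca(features, n_components):
    """Project centred features onto their leading principal components."""
    centred = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:n_components].T


def kmeans_pp_init(X, k, rng):
    """K-means++ seeding: later seeds favour points far from existing seeds."""
    centres = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = ((X[:, None, :] - np.array(centres)[None]) ** 2).sum(-1).min(axis=1)
        probs = d2 / d2.sum() if d2.sum() > 0 else None   # None = uniform
        centres.append(X[rng.choice(len(X), p=probs)])
    return np.array(centres)


def kmeans(X, k=2, iters=100, seed=0):
    """Lloyd's K-means with K-means++ seeds; returns (labels, centres)."""
    rng = np.random.default_rng(seed)
    centres = kmeans_pp_init(X, k, rng)
    for _ in range(iters):
        labels = ((X[:, None, :] - centres[None]) ** 2).sum(-1).argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centres[j] for j in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    labels = ((X[:, None, :] - centres[None]) ** 2).sum(-1).argmin(axis=1)
    return labels, centres
```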

Meng et al. [97] presented an approach in which SAR images were pre-processed to reduce speckle noise using a wavelet decomposition algorithm [98]. The classical Fuzzy C-Means (FCM) method [86] was then used to extract clusters containing water and land areas. After the FCM method was initialised, clusters representing water and land were merged separately and iteratively. The data were partitioned into clusters so as to maximise the dissimilarity between different clusters while maximising the similarity between pixels assigned to the same cluster (water/land). The experimental data came from two satellites, Gaofen-3 and Sentinel-1, and the coastlines produced for both were compared.
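For reference, the classical FCM iteration [86] alternates between computing cluster centres weighted by memberships raised to the fuzzifier m and updating memberships from relative distances, \(u_{ij} = 1 / \sum_k (d_{ij}/d_{ik})^{2/(m-1)}\). The sketch below assumes pixel features given as an (n_pixels, n_features) array, m = 2 and a random initial membership matrix; it omits the wavelet pre-processing and the iterative cluster merging of [97], and the function name is illustrative.

```python
import numpy as np


def fuzzy_c_means(X, c=2, m=2.0, iters=100, tol=1e-6, seed=0):
    """Classical FCM; returns (memberships U of shape (n, c), centres (c, d))."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)          # each row of U sums to 1
    for _ in range(iters):
        Um = U ** m
        centres = (Um.T @ X) / Um.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centres[None], axis=2) + 1e-12
        # u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1))
        ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
        U_new = 1.0 / ratio.sum(axis=2)
        converged = np.max(np.abs(U_new - U)) < tol
        U = U_new
        if converged:
            break
    return U, centres
```

A hard water/land labelling is obtained with `U.argmax(axis=1)`; the soft memberships are what later merging steps, such as those in [97], can exploit.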

In [99], the authors used both SAR images and Interferometric Synthetic Aperture Radar (InSAR) data. First, non-local filtering [100] was applied to the amplitude and coherence images acquired with InSAR. Gaussian filtering with several specified variance values was then applied, producing a set of processed images. Next, segmentation was performed using the K-medians clustering method [101], a variant of the better-known K-means method [95] that uses the median [101] instead of the mean [95] to compute cluster centroids. An averaged binary image is then produced by comparing the water and land labels assigned to each pixel in the images obtained in the previous step. Small, isolated groups of pixels (i.e. noise and/or inland water bodies) are removed from the binary image with a morphological hole-filling algorithm [62]. Finally, the extracted coastlines are superimposed on a SAR amplitude image to compare them with pattern contours.

Table 5 summarizes the results of segmentations performed using unsupervised learning methods.

9.2 Supervised

Segmentation is performed using specific features (e.g. image intensity, gradient) obtained from images from the training set. The machine learning model is trained using defined features and the appropriate labels, obtained on the basis of segmentations prepared by experts. Then, after the training process, this model is used to segment images from the test set. A general diagram of supervised learning is shown in Fig. 10.

Fig. 10
figure 10

A diagram of supervised learning. At the training stage, image features such as the intensity and gradient are extracted from the images in the training set. These features and labels, which are pattern segmentations, are used to train the machine learning model. Then, the trained model can be used to segment new, previously unused images from the test set

Convolutional Neural Networks (CNNs or ConvNets) are a class of artificial neural networks (ANNs) used mainly for processing data with a grid-like topology, such as digital images. To separate water and land pixels, a CNN learns from a training set of annotated images and can then make predictions for images from the test set; it performs best when a large amount of labelled data is available. When the fully connected layers are replaced with convolutional ones (a fully convolutional network), the CNN can be used directly for segmentation [102, 103]. During training, the convolutional layers automatically learn features and combine them into hierarchical predictive models [104, 105]. A major advantage of CNNs is that they can model objects of variable shape.

Artificial neural networks (ANNs) used for deep learning can have 10 or more layers, and models with over 100 layers have already been developed [106, 107]. Deep neural networks successively filter data from the training set and learn to extract the important features that are later used for pattern recognition or classification. However, deep learning methods produce good results only if there is a sufficient quantity of correctly selected training data. When training data are scarce and the training set contains incomplete annotations, it is worth using a fully convolutional network called U-Net [108, 109] for segmentation. This model features two paths: the contracting path (also known as the encoder) and the expanding path (also known as the decoder), which is symmetrical to the contracting path. The encoder is a classical convolutional network consisting of convolutional and max-pooling layers. The decoder enables precise localization using transposed convolutions, and skip connections copy encoder feature maps to the corresponding decoder levels. The connection between the contracting and expanding paths (also called the bottleneck) consists of two successive 3\(\times\)3 convolutions, each followed by a ReLU activation. The literature also describes newer variants of this model, such as U-Net++ [110] and U-Net 3+ [111]. U-Net++ has an architecture with nested and dense skip connections intended to improve segmentation accuracy. U-Net 3+ develops this idea further with full-scale skip connections and deep supervision. In practice, this makes it possible to combine low-level details with high-level semantics from feature maps at different scales [111] while reducing the number of parameters. For coastline segmentation in SAR images, the U-Net model and modifications proposed by individual authors have been used most frequently.
A diagram of the U-Net model is shown in Fig. 11.
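The encoder-decoder symmetry described above can be checked with a little arithmetic on feature-map sizes. The sketch below assumes the configuration of the original U-Net [108]: unpadded 3\(\times\)3 convolutions (each removing 2 pixels), 2\(\times\)2 max pooling, 2\(\times\)2 up-convolutions and centre-cropped skip connections. Many later implementations use padded convolutions instead, in which case the output size equals the input size; the function name is illustrative.

```python
def unet_sizes(input_size=572, depth=4, convs_per_block=2, kernel=3):
    """Return (bottleneck size, output size) of a U-Net with unpadded convolutions."""
    shrink = convs_per_block * (kernel - 1)   # each valid 3x3 conv removes 2 pixels
    size = input_size
    skips = []
    for _ in range(depth):
        size -= shrink                        # convolution block of the encoder
        skips.append(size)                    # feature map copied to the decoder
        if size % 2:
            raise ValueError("odd size before 2x2 max pooling")
        size //= 2                            # 2x2 max pooling, stride 2
    size -= shrink                            # bottleneck convolutions
    bottleneck = size
    for skip in reversed(skips):
        size *= 2                             # 2x2 up-convolution
        if size > skip or (skip - size) % 2:
            raise ValueError("skip connection cannot be centre-cropped")
        size -= shrink                        # convolution block of the decoder
    return bottleneck, size
```

For the 572\(\times\)572 input of the original paper this gives a 28\(\times\)28 bottleneck and a 388\(\times\)388 output map, so the output covers only the centre of the input tile.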

Fig. 11
figure 11

Diagram of the U-Net model [108]. Each rectangle corresponds to a multi-channel feature map. The number of channels is given at the top of the rectangle, and the feature map size is shown at its lower left edge. White boxes represent copied feature maps. The arrows denote the operations listed in the legend on the right

Figures 12 and 3 show that coastlines differ considerably, especially between non-urbanized and urbanized areas (e.g. in the vicinity of ports). Unfortunately, no publicly available datasets with annotations and pattern segmentations prepared by experts have so far existed that could be used to train neural networks and evaluate their performance on a test set. Authors use private, inaccessible image collections that contain pattern segmentations, which is a significant limitation because it makes it difficult to assess and verify the results uniformly.

In [29], a U-Net was used to segment SAR images for coastline detection. Only a very small set of images was available (25 in the training set and 30 in the test set), so the authors applied data augmentation to the training set to create new samples, which also reduced overfitting. In particular, they used intensity augmentation, spatial augmentation and multi-sample mosaic augmentation. Intensity augmentation includes gamma, additive or multiplicative intensity shifts, blurring, multiplicative or additive noise, and cropout [112]. Spatial augmentation includes rescaling, flips, random patch extraction, rotation and mosaicing. The authors proposed four models that differ in the activation function of the last layer (softmax or sigmoid) and in the loss function (Dice+Focal [113, 114] or binary cross-entropy [115]). The paper [29] discusses various possible U-Net configurations and the impact of specific processing methods on the segmentation results.
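The Dice+Focal combination mentioned for [29] can be written in a few lines for a predicted probability map and a binary target. The sketch below assumes an unweighted sum of the two terms and the commonly used focal-loss defaults \(\gamma = 2\) and \(\alpha = 0.25\); the weighting and parameters actually used in [29] may differ, and the function names are illustrative.

```python
import numpy as np


def dice_loss(prob, target, eps=1e-6):
    """1 - Dice coefficient between a probability map and a binary target."""
    inter = np.sum(prob * target)
    return 1.0 - (2.0 * inter + eps) / (np.sum(prob) + np.sum(target) + eps)


def focal_loss(prob, target, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean focal loss: -alpha_t * (1 - p_t)^gamma * log(p_t)."""
    p = np.clip(prob, eps, 1.0 - eps)
    p_t = np.where(target == 1, p, 1.0 - p)           # probability of true class
    alpha_t = np.where(target == 1, alpha, 1.0 - alpha)
    return float(np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)))


def dice_focal_loss(prob, target, w_dice=1.0, w_focal=1.0):
    """Weighted sum of the two terms (equal weights assumed by default)."""
    return w_dice * dice_loss(prob, target) + w_focal * focal_loss(prob, target)
```

The Dice term rewards overlap of the thin coastline class as a whole, while the focal term down-weights pixels that are already classified confidently.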

Heidler et al. [116] developed HED-UNet to perform a segmentation allowing Antarctic coast shorelines to be extracted from images acquired with synthetic aperture radar sensors. The proposed solution combines a holistically nested edge detection (HED) network [117] with a semantic segmentation framework based on U-Net [108]. HED is a CNN-based edge detection system that can be used for both natural and grey-level images. The results produced by classical edge detection methods are very often distorted by factors such as speckle noise, uneven image brightness, imaging artefacts, rough surfaces, wave crest lines, and inland structures whose appearance is very similar to coast shorelines. This makes pre-processing and/or post-processing, or even manual corrections, necessary. Moreover, classical edge detection methods cannot recognize the meaning of the areas on either side of the edge line, so their result alone is insufficient for labelling land and sea areas. Combining the results of edge detection and semantic segmentation can solve this problem [116]. The authors compared the segmentation results produced by various neural network models and by several popular edge detection techniques.

Zhang et al. [118] developed an approach in which the methods presented in [42] were first used to determine the approximate location of water lines in high-spatial-resolution SAR images. The authors then proposed a U-shaped deep CNN-based model to perform the segmentation. In [118], the authors adopted solutions from the U-Net 3+ model [111], whose full-scale skip connections make it possible to combine information from feature maps at all scales. The authors found that using the U-Net 3+ model directly would be very time-consuming and could also produce incorrect results due to the very high foreground-background class imbalance: coastlines make up less than one percent of the whole image. To solve the class imbalance problem, an \(\alpha\)-balanced cross-entropy loss function [113] was used.
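The effect of \(\alpha\)-balancing can be illustrated with a minimal NumPy sketch of the \(\alpha\)-balanced binary cross-entropy, in which foreground (coastline) pixels are weighted by \(\alpha\) and background pixels by \(1-\alpha\). The value \(\alpha = 0.99\) is an assumption chosen here to mirror a foreground fraction of about 1%; it is not the setting reported in [118], and the focal term that [113] also introduces is not included.

```python
import numpy as np

def alpha_balanced_bce(probs, targets, alpha=0.99, eps=1e-7):
    """Mean alpha-balanced binary cross-entropy over all pixels.

    probs:   predicted probability that a pixel belongs to the coastline (foreground).
    targets: 1 for coastline pixels, 0 for background pixels.
    alpha:   weight of the foreground class; the background gets (1 - alpha).
    """
    p = np.clip(probs, eps, 1.0 - eps)                    # avoid log(0)
    p_t = np.where(targets == 1, p, 1.0 - p)              # probability of the true class
    alpha_t = np.where(targets == 1, alpha, 1.0 - alpha)  # per-pixel class weight
    return float(np.mean(-alpha_t * np.log(p_t)))

# Hypothetical 100 x 100 label map in which the coastline is one row (1% of pixels)
targets = np.zeros((100, 100), dtype=int)
targets[50, :] = 1
# A degenerate prediction that ignores the coastline entirely
probs = np.full((100, 100), 0.01)
unweighted = alpha_balanced_bce(probs, targets, alpha=0.5)  # equals 0.5 * plain BCE
balanced = alpha_balanced_bce(probs, targets, alpha=0.99)
```

For this degenerate prediction, the balanced loss (about 0.046) exceeds the unweighted one (about 0.028), so a network that simply predicts "background everywhere" is penalised more strongly.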

The U-Net model was also used in [119] to perform a segmentation allowing the shorelines of the coast of the island of Taiwan to be detected. The results of the segmentation are binary images that require post-processing, i.e. extracting edges with the Canny edge detector and applying specific morphological operators to obtain continuous boundary lines.
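This kind of post-processing can be sketched with plain NumPy morphology. As a substitute for the Canny detector used in [119], the sketch below extracts the inner morphological boundary of the binary land mask (land pixels with at least one sea pixel in their 3x3 neighbourhood), which marks the land/sea transitions directly without choosing gradient thresholds; a morphological closing then bridges small gaps in an edge map. The 3x3 square structuring element and all helper names are assumptions made for illustration.

```python
import numpy as np

def _shifted_stack(mask, pad_mode):
    """Return the 9 shifted copies of a boolean mask covering its 3x3 neighbourhoods."""
    h, w = mask.shape
    if pad_mode == "edge":
        p = np.pad(mask, 1, mode="edge")
    else:
        p = np.pad(mask, 1, mode="constant", constant_values=False)
    return [p[i:i + h, j:j + w] for i in range(3) for j in range(3)]

def dilate(mask):
    """Binary dilation with a 3x3 square structuring element."""
    return np.logical_or.reduce(_shifted_stack(mask, "constant"))

def erode(mask):
    """Binary erosion with a 3x3 square; edge padding keeps the image border intact."""
    return np.logical_and.reduce(_shifted_stack(mask, "edge"))

def boundary(land):
    """Land pixels that touch the sea: the inner morphological boundary."""
    return land & ~erode(land)

def close_gaps(edges):
    """Morphological closing (dilation, then erosion) to bridge one-pixel gaps."""
    return erode(dilate(edges))

land = np.zeros((64, 64), dtype=bool)
land[:, :32] = True                    # hypothetical U-Net output: land on the left
shoreline = boundary(land)             # one-pixel-wide line in column 31

broken = np.zeros((64, 64), dtype=bool)
broken[10, :] = True
broken[10, 30] = False                 # a one-pixel gap in a detected edge
repaired = close_gaps(broken)          # the gap is filled, the line stays one pixel wide
```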

[120] presents a shoreline detection method using an artificial neural network (NN). This approach uses a feedforward NN to classify pixels into two categories, land and sea. The location of the shoreline is then determined as the boundary between these two groups of classified pixels. The feedforward NN consists of four layers: an input layer, two hidden layers and an output layer. The shoreline was extracted using a two-dimensional horizontal wavelet obtained as the product of Haar and Gaussian functions.
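A wavelet of this kind can be sketched as the outer product of a Haar step (-1 on one half, +1 on the other) along the vertical axis and a Gaussian along the horizontal axis, so that it responds most strongly to horizontal edges. The orientation convention, the kernel size of 8 and \(\sigma = 2\) are assumptions for illustration; how [120] combines this response with the NN output is not reproduced here.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def haar_gaussian_kernel(size=8, sigma=2.0):
    """2-D kernel: Haar step along rows (vertical axis) times a Gaussian along columns.

    It responds to horizontal edges, i.e. intensity changes from top to bottom.
    """
    haar = np.where(np.arange(size) < size // 2, -1.0, 1.0)  # -1 on top half, +1 on bottom half
    x = np.arange(size) - (size - 1) / 2.0
    gauss = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    gauss /= gauss.sum()                                      # unit-sum smoothing profile
    return np.outer(haar, gauss)

def wavelet_response(img, kernel):
    """Correlate the image with the kernel over all fully overlapping positions."""
    windows = sliding_window_view(img, kernel.shape)
    return np.einsum("ijkl,kl->ij", windows, kernel)

# Toy scene: dark sea (rows 0-19) above bright land (rows 20-39)
img = np.full((40, 40), 0.1)
img[20:, :] = 0.9
kernel = haar_gaussian_kernel()
resp = wavelet_response(img, kernel)
shore_row = int(np.argmax(resp.mean(axis=1))) + kernel.shape[0] // 2   # -> 20
```

The response peaks when the sign change of the Haar step coincides with the sea/land transition, while the Gaussian profile averages the response along the edge and thus suppresses isolated noisy pixels.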

De Laurentiis et al. [121] proposed the joint use of an autoencoder and a Pulse-Coupled Neural Network (PCNN) [122, 123] to extract coast shorelines. Autoencoders are simple learning networks trained to reproduce their inputs at their outputs with as little distortion as possible [124]. In practice, autoencoders are used to reduce distortions, including speckle noise, in SAR images. Using a parameter-adaptive PCNN in the next step makes it possible to achieve high segmentation accuracy at a low computational cost [125]. The authors compared their results with those of several other methods, including combinations of different methods, e.g. Principal Component Analysis [126] with a PCNN (PCA-PCNN), an Artificial Autoencoder Neural Network with Expectation-Maximization (EM) image segmentation [127] (AAN-EM), and PCA-EM.
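The PCNN stage can be illustrated with a minimal sketch of the simplified PCNN (SPCNN) iteration, in which each pixel's internal activity grows with its intensity and with the firing of its neighbours, and a pixel fires when the activity exceeds a decaying dynamic threshold. Brighter, coherent regions therefore fire in earlier waves. All parameter values and the linking weights below are hand-chosen assumptions; the parameter-adaptive PCNN of [125] derives them from image statistics instead, and the autoencoder despeckling step is replaced by a clean toy image.

```python
import numpy as np

# Linking weights of the 3x3 neighbourhood (assumed values, centre excluded)
LINK_W = np.array([[0.5, 1.0, 0.5],
                   [1.0, 0.0, 1.0],
                   [0.5, 1.0, 0.5]])

def neighbour_sum(Y):
    """Weighted sum of the previous firing outputs of each pixel's 8 neighbours."""
    h, w = Y.shape
    p = np.pad(Y, 1)
    return sum(LINK_W[i, j] * p[i:i + h, j:j + w] for i in range(3) for j in range(3))

def spcnn_fire_times(S, n_iter=20, alpha_f=0.1, alpha_e=0.3, beta=0.2, v_l=1.0, v_e=20.0):
    """Simplified PCNN: return the iteration at which each pixel first fires (-1 = never)."""
    S = S.astype(np.float64)
    U = np.zeros_like(S)                       # internal activity
    Y = np.zeros_like(S)                       # firing output of the previous iteration
    E = np.ones_like(S)                        # dynamic threshold (initial value assumed)
    fire_time = np.full(S.shape, -1, dtype=int)
    for n in range(n_iter):
        L = v_l * neighbour_sum(Y)                          # linking input from fired neighbours
        U = np.exp(-alpha_f) * U + S * (1.0 + beta * L)     # feeding input is the pixel value S
        Y = (U > E).astype(np.float64)                      # fire where activity exceeds threshold
        E = np.exp(-alpha_e) * E + v_e * Y                  # threshold decays, jumps after firing
        fire_time[(Y == 1) & (fire_time < 0)] = n
    return fire_time

# Toy despeckled image: bright land on the left, dark sea on the right
S = np.full((16, 16), 0.1)
S[:, :8] = 0.9
fire_time = spcnn_fire_times(S)
first_wave = fire_time[fire_time >= 0].min()
land = fire_time == first_wave                 # pixels firing in the first wave
```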

Table 6 summarizes the results of segmentations performed using supervised learning methods.

10 Other Segmentation Methods

This section presents segmentation methods that are each represented by one or two research papers, i.e. graph-based, edge tracing and watershed methods. As these methods constitute separate approaches, they are described in separate subsections. The segmentation results obtained with these methods, together with the metrics used to evaluate them, are shown in Table 7.

10.1 Graph-Based

In graph-based approaches [128, 129], the image is considered as a graph \(G = (V, E)\), where V is a set of nodes (i.e. pixels) and \(E\subseteq V \times V\) is a set of ordered pairs of elements of V, i.e. edges. If \((u, v) \in E\) always implies \((v, u) \in E\), the graph is undirected; otherwise, it is directed. In a digital image, E is a set of undirected edges between pairs of pixels. It is also assumed that the edge weight \(w(u, v)\) is a function describing the similarity between nodes u and v. The set V is divided into subsets by a graph cut, i.e. a division of V into two subsets A and B such that

$$\begin{aligned} A \cup B = V \text { and } A \cap B = \emptyset \end{aligned}$$
(21)

where the ‘cut’ is made by removing the edges connecting subgraphs A and B. The cost of the cut is the total weight of the removed edges, which allows image segmentation to be formulated as a graph partitioning problem [129]:

$$\begin{aligned} cut(A, B)=\sum _{u \in A, v \in B} w(u, v) \end{aligned}$$
(22)

The minimum cut of a graph [129, 130] (min-cut) is the cut for which the total weight of the removed edges, given by eq. (22), is smallest. Since a low edge weight indicates weakly bound (dissimilar) neighbouring pixels, such edges are the most likely to be cut and thus to form the final boundary between regions. However, the plain minimum cut tends to separate small, isolated sets of nodes, because cutting them off removes only a few edges. Instead of the sum of the weights of the removed edges, a measure of ‘disassociation’ can therefore be used, which computes the cut cost as a fraction of the total edge connections (i.e. the total edge weight) to all nodes (pixels) in the graph. This measure is called the normalized cut (Ncut) [128, 132] and is defined as follows:

$$\begin{aligned} Ncut(A, B)=\frac{cut(A, B)}{assoc(A, V)}+\frac{cut(A, B)}{assoc(B, V)} \end{aligned}$$
(23)

where cut(A, B) is given by eq. (22) and

$$\begin{aligned} assoc(A, V)=\sum _{u \in A, z \in V} w(u, z) \end{aligned}$$
(24)

is the total connection from nodes in A to all nodes in the graph. assoc(B, V) is defined analogously.
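To make eqs. (22)-(24) concrete, the following minimal NumPy sketch evaluates cut, assoc and Ncut for a toy one-dimensional "image" of six pixels connected in a chain. The Gaussian similarity \(w(u, v) = \exp(-(I_u - I_v)^2/\sigma^2)\) with \(\sigma = 0.1\) is an assumption made for illustration; it is not taken from [128] or [133].

```python
import numpy as np

def cut(W, in_a):
    """Eq. (22): total weight of edges between A (in_a True) and B (in_a False)."""
    return float(W[np.ix_(in_a, ~in_a)].sum())

def assoc(W, in_set):
    """Eq. (24): total weight of edges from the nodes of a set to all nodes in V."""
    return float(W[in_set, :].sum())

def ncut(W, in_a):
    """Eq. (23): normalized cut of the partition (A, B)."""
    c = cut(W, in_a)
    return c / assoc(W, in_a) + c / assoc(W, ~in_a)

I = np.array([0.10, 0.12, 0.11, 0.90, 0.88, 0.91])  # toy 1-D image: 3 water, 3 land pixels
sigma = 0.1
W = np.zeros((I.size, I.size))
for i in range(I.size - 1):                          # edges only between neighbouring pixels
    w = np.exp(-(I[i] - I[i + 1]) ** 2 / sigma ** 2)
    W[i, i + 1] = W[i + 1, i] = w                    # undirected graph: symmetric weights

water_land = np.array([True, True, True, False, False, False])
bad_split = np.array([True, False, False, False, False, False])
```

For the water/land split, the only cut edge joins pixels of very different intensity, so Ncut is practically zero, whereas isolating a single water pixel gives Ncut greater than 1.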

The following graph partitioning methods have been used to separate water and land pixels: the minimum graph cut [129, 130] in [131] and the normalized graph cut [128, 132] in [133]. Figure 12 shows a general diagram of the operation of graph-based segmentation methods.

Table 5 Summary of coastline segmentation methods using unsupervised learning (evaluation metrics are given in Table 2 )
Table 6 Summary of coastline segmentation methods using supervised learning (evaluation metrics are given in Table 2 )
Fig. 12

A diagram of the operation of graph-based segmentation methods. First, a graph is created based on (a) pixels belonging to the two classes, i.e. water and land, which can be marked manually by the user (using labels), and/or (b) image features extracted using appropriate methods. Then, a specific method of dividing the graph, called a graph cut, is used to segment the entire image into pixels representing water and land. The graph cut is obtained by minimising a cost term that depends on the image gradient and the coastline shape, and it divides the graph into water and land subsets

The approach presented in [131] uses several image pre-processing techniques to simplify the segmentation that separates water and land pixels using the min-cut method [129, 130]. First, morphological component analysis (MCA) [134,135,136] is applied to decompose the source SAR image into a texture image, which contains speckle and spatial patterns, and an outline image. MCA requires a dictionary, i.e. a matrix whose columns (atoms) represent spatial patterns of image intensity; atoms are small image patches of a fixed size. The dictionary is used to learn these spatial patterns, and the authors note that this training process is automatic and does not require any training data. The learned dictionary contains atoms for both the outline and the texture components, mixed together. The outline image is then used for further processing. It is smoothed with a non-local means filter [137] to reduce the noise still present in it. Next, the segmentation is performed using the min-cut method [129, 130], whose edge costs reflect both the distribution of pixel values and the differences between adjacent pixel values. In [131], the edge cost for the graph cut is determined using histograms of the pixel values on the sea side and on the land side. The sea-side and land-side pixels are selected by the user (only one pixel on each side). The result is a binary image representing sea and land areas, and the authors take the boundary between these two areas to be the coast shoreline. An optional step is to apply the parametric ACM [59] when the coast shorelines are not smooth.
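A seeded min-cut segmentation of this type can be sketched with a small Edmonds-Karp max-flow solver: pixels are nodes, neighbouring pixels are joined by edges whose capacity decreases with their intensity difference, the source and sink terminals stand for sea and land, and the two user-selected seed pixels are tied to their terminals with infinite capacity. As an assumption standing in for the histograms of [131], the terminal (t-link) costs here are the absolute differences from the seed intensities; \(\sigma\), \(\lambda\) and all names are hypothetical, the MCA and non-local means pre-processing is omitted, and pure Python is only practical for tiny images.

```python
from collections import deque
import numpy as np

def min_cut_segment(img, sea_seed, land_seed, sigma=0.1, lam=1.0):
    """Split an image into sea (True) and land (False) with an s-t minimum cut.

    sea_seed, land_seed: (row, col) of the single pixel selected by the user on each side.
    """
    h, w = img.shape
    n = h * w
    s, t = n, n + 1                                  # source = sea terminal, sink = land terminal
    cap = [dict() for _ in range(n + 2)]             # residual capacities as adjacency dicts

    def add_edge(u, v, c):
        cap[u][v] = cap[u].get(v, 0.0) + c
        cap[v].setdefault(u, 0.0)                    # reverse edge for the residual graph

    sea_val, land_val = img[sea_seed], img[land_seed]
    for r in range(h):
        for c in range(w):
            p = r * w + c
            for rr, cc in ((r, c + 1), (r + 1, c)):  # n-links to right and lower neighbours
                if rr < h and cc < w:
                    q = rr * w + cc
                    wgt = float(np.exp(-(img[r, c] - img[rr, cc]) ** 2 / (2 * sigma ** 2)))
                    add_edge(p, q, wgt)              # undirected: same capacity both ways
                    add_edge(q, p, wgt)
            # t-links (assumed regional term): cutting s->p makes p land, cutting p->t makes p sea
            add_edge(s, p, lam * abs(img[r, c] - land_val))
            add_edge(p, t, lam * abs(img[r, c] - sea_val))
    add_edge(s, sea_seed[0] * w + sea_seed[1], float("inf"))    # hard seed constraints
    add_edge(land_seed[0] * w + land_seed[1], t, float("inf"))

    while True:                                      # Edmonds-Karp: shortest augmenting paths
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            break                                    # no augmenting path: the flow is maximal
        flow, v = float("inf"), t
        while parent[v] is not None:                 # bottleneck capacity along the path
            flow = min(flow, cap[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:                 # push flow and update residual capacities
            u = parent[v]
            cap[u][v] -= flow
            cap[v][u] += flow
            v = u

    sea = np.zeros(n, dtype=bool)
    sea[[v for v in parent if v < n]] = True         # source side of the minimum cut
    return sea.reshape(h, w)

img = np.full((6, 6), 0.9)                           # toy image: bright land
img[:, :3] = 0.1                                     # dark sea on the left
img[2, 1] = 0.5                                      # ambiguous pixel inside the sea
sea = min_cut_segment(img, sea_seed=(2, 0), land_seed=(2, 5))
```

The ambiguous pixel is equally far from both seed intensities, so its t-link costs tie; the stronger n-links to its sea neighbours make labelling it as sea cheaper, which illustrates how the pairwise term regularizes the result.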

Ding and Li [133] presented a solution consisting of three image processing steps: pre-processing, land/sea segmentation and post-processing. During pre-processing, the image is prepared for segmentation using speckle noise reduction, geometric correction and anisotropic diffusion filtering. Speckle noise is reduced using the Lee filter [43]. The geometric correction then removes geometric distortions of the SAR image; for this purpose, a defined number of Ground Control Points (GCPs) can be used in the source image [138]. Anisotropic diffusion filtering [44] preserves and enhances strong edges along the coastline while suppressing weak edges, and it also evens out pixel intensity variations within the water and land areas. In the land/sea segmentation step, the image is divided into a number of blocks, and each block is partitioned into regions using the multiscale normalized cut segmentation (MSNCS) method [132]. MSNCS forms groups of pixels belonging to homogeneous regions of the image, representing water and land. It extends the basic normalized cuts framework [128] with a multiscale graph partitioning approach whose running time is linear [132]. In the post-processing step, waterline edge pixels are identified and superimposed on the source image, and a manual correction is performed if the results do not fit the actual coast shoreline. In [133], SAR images of non-urbanized and suburban areas were analysed. The authors focused on experiments determining the shoreline movement of the study region over a specified time period.
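The speckle reduction step can be illustrated with a minimal sketch of the Lee filter in a common formulation: each pixel is replaced by \(m + W(z - m)\), where m is the local mean, z the observed intensity and \(W = 1 - C_u^2/C_i^2\), with \(C_u^2 = 1/L\) the squared speckle coefficient of variation of L-look intensity data and \(C_i\) the local coefficient of variation. The window size of 5 and L = 4 are assumptions; the geometric correction and anisotropic diffusion used in [133] are not shown.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def lee_filter(img, size=5, looks=4):
    """Lee speckle filter for an intensity SAR image.

    W ~ 0 in homogeneous areas (output close to the local mean, i.e. smoothing),
    W ~ 1 near edges and strong scatterers (output close to the original pixel).
    """
    img = img.astype(np.float64)
    half = size // 2
    windows = sliding_window_view(np.pad(img, half, mode="reflect"), (size, size))
    mean = windows.mean(axis=(-2, -1))
    var = windows.var(axis=(-2, -1))
    cu2 = 1.0 / looks                                  # speckle Cu^2 for L-look intensity data
    ci2 = var / np.maximum(mean ** 2, 1e-12)           # local Ci^2
    weight = np.clip(1.0 - cu2 / np.maximum(ci2, 1e-12), 0.0, 1.0)
    return mean + weight * (img - mean)

rng = np.random.default_rng(1)
clean = np.ones((64, 64))
clean[:, 32:] = 4.0                                    # two homogeneous regions
noisy = clean * rng.gamma(shape=4, scale=0.25, size=clean.shape)   # 4-look speckle, mean 1
filtered = lee_filter(noisy, size=5, looks=4)
```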

Fig. 13

A diagram of the operation of segmentation methods based on edge (i.e. coast shoreline) tracing. Methods that trace coast shorelines in SAR images require one or more seed points to be initialised. After initialisation, each new point is obtained by finding, among all possible new positions (tracing hypotheses), the one that best fits the shoreline being traced. The fit is computed from image features calculated in the neighbourhood of the last determined point

10.2 Edge Tracing

Methods that trace coast shorelines in SAR images must be initialised with one or more seed points, after which subsequent points are determined using data calculated from the analysed image. Seed points are initialised either manually or with image pre-processing methods. Tracing methods can be useful for segmenting very irregular and discontinuous coast shorelines, in which case a number of seed points should be initialised separately for individual coast sections. A general diagram of segmentation methods based on shoreline tracing is shown in Fig. 13.

[139] proposes a coastline tracing method based on information about edges identified in SAR images. First, an edge magnitude map is created to facilitate identifying the land/sea boundary: the magnitude of pixels on the boundary is greater than in non-boundary regions. This map is produced from the statistical properties of the brightness distributions of pixels characteristic of water and land. The seed point from which edge tracing starts is the pixel with the maximum magnitude in the map. The next point is the pixel with the greatest magnitude in a defined neighbourhood. In each subsequent step, the authors restrict the local neighbourhood of pixels and the permissible tracing direction in order to prevent incorrect results. In this approach, the authors did not use the Canny edge detector, which was used in the segmentation methods presented earlier [42, 49, 55, 119] and is a very popular technique in digital image processing. The Canny detector uses information about the image gradient and obtains edges based on two user-defined hysteresis thresholds on the gradient magnitude, i.e. the high and low thresholds [39]. However, these parameters have limitations: it is difficult to determine whether the selected threshold values will be appropriate for all the analysed SAR images, and if they are not, the user has to intervene and set them again.
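The greedy tracing scheme described above can be sketched as follows, assuming an edge magnitude map is already available (its construction from water and land brightness statistics in [139] is not reproduced). The 8-neighbourhood, the limit of one 45-degree turn per step used as the "permissible direction" rule, and the function name are assumptions; the sketch also traces in one direction only, whereas a complete method would trace from the seed in both directions.

```python
import numpy as np

# 8-neighbourhood offsets, ordered clockwise starting from "east"
OFFSETS = [(0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1), (-1, 0), (-1, 1)]

def trace_edge(mag, max_steps=1000, max_turn=1, min_mag=0.0):
    """Greedy tracing of a boundary on an edge magnitude map.

    Starts at the pixel with the maximum magnitude and repeatedly moves to the
    strongest unvisited 8-neighbour whose direction differs from the previous step
    by at most max_turn * 45 degrees. Stops when no admissible neighbour is left.
    """
    h, w = mag.shape
    start = tuple(int(v) for v in np.unravel_index(np.argmax(mag), mag.shape))
    path, visited, prev_dir = [start], {start}, None
    for _ in range(max_steps):
        r, c = path[-1]
        best, best_dir = None, None
        for d, (dr, dc) in enumerate(OFFSETS):
            if prev_dir is not None and min((d - prev_dir) % 8, (prev_dir - d) % 8) > max_turn:
                continue                                  # direction not permitted
            rr, cc = r + dr, c + dc
            if not (0 <= rr < h and 0 <= cc < w) or (rr, cc) in visited:
                continue
            if mag[rr, cc] > min_mag and (best is None or mag[rr, cc] > mag[best]):
                best, best_dir = (rr, cc), d
        if best is None:
            break
        path.append(best)
        visited.add(best)
        prev_dir = best_dir
    return path

# Hypothetical magnitude map: a boundary in column 10 that steps to column 11 at row 10
rows = np.arange(20)
mag = np.zeros((20, 20))
mag[rows[:10], 10] = 2.0 - 0.01 * rows[:10]
mag[rows[10:], 11] = 2.0 - 0.01 * rows[10:]
path = trace_edge(mag)
```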

10.3 Watershed

Watershed segmentation is a region-based method that splits an image into non-overlapping regions. It uses concepts from topography, expressed with specific operators of mathematical morphology [62]. Popular watershed segmentation algorithms include watershed by immersion [140], whose time complexity is linear in the number of input image pixels, and the power watershed [141], whose complexity is quasi-linear. The power watershed was developed by adding watershed segmentation, computed as a maximum spanning forest [142], to an existing common framework of image segmentation algorithms, namely graph cuts [129], random walker [143] and shortest paths [144].

Unfortunately, a problem that often occurs in watershed segmentation is excessive fragmentation into regions, called oversegmentation. As a result, the regions may not correspond to the shapes of the objects that need to be extracted from the image. To eliminate oversegmentation, auxiliary transformations are performed before and after the watershed segmentation. They can include initial image filtering [140, 142], the use of markers [62, 140], merging regions by maximising the average contrast [145], or the use of a region adjacency graph (RAG) [146]. Figure 14 shows a general diagram of the operation of watershed segmentation methods.
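How markers prevent oversegmentation can be illustrated with a minimal sketch of marker-controlled watershed by priority flooding (Meyer's priority-queue formulation, which is a different algorithm from the immersion approach of [140]). Every pixel is flooded from exactly one marker, in order of increasing gradient, so the output contains exactly one region per marker. The markers, the toy gradient and the function name are hypothetical, and no explicit watershed lines are drawn.

```python
import heapq
import numpy as np

def marker_watershed(gradient, markers):
    """Marker-controlled watershed by priority flooding.

    gradient: 2-D array, e.g. the gradient magnitude of the (filtered) SAR image.
    markers:  int array, 0 = unlabelled, k > 0 = seed of region k.
    """
    h, w = gradient.shape
    labels = markers.copy()
    heap, counter = [], 0                            # counter breaks ties in FIFO order
    for r, c in zip(*np.nonzero(markers)):
        heapq.heappush(heap, (gradient[r, c], counter, r, c))
        counter += 1
    while heap:
        _, _, r, c = heapq.heappop(heap)             # lowest gradient is flooded first
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < h and 0 <= cc < w and labels[rr, cc] == 0:
                labels[rr, cc] = labels[r, c]        # neighbour joins the current basin
                heapq.heappush(heap, (gradient[rr, cc], counter, rr, cc))
                counter += 1
    return labels

gradient = np.zeros((10, 10))
gradient[:, 5] = 1.0                                 # ridge of high gradient along the shoreline
markers = np.zeros((10, 10), dtype=int)
markers[5, 1] = 1                                    # hypothetical marker inside the sea
markers[5, 8] = 2                                    # hypothetical marker inside the land
labels = marker_watershed(gradient, markers)
```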

Fig. 14

A diagram of the operation of watershed segmentation methods. After the image gradient has been determined, a watershed segmentation is performed. If the results obtained are incorrect due to oversegmentation, a specific method is used to obtain correct results

[147] proposes an approach that extracts coast shorelines using watershed segmentation by immersion [148]. Before the actual watershed segmentation, however, the image is pre-processed to eliminate speckle noise and prevent oversegmentation. For this purpose, the speckle reducing anisotropic diffusion (SRAD) method [149] is used. In SRAD, edges are detected and enhanced by the instantaneous coefficient of variation (ICOV) operator, which combines a normalized Laplacian operator and a normalized gradient magnitude operator. After the watershed segmentation, a RAG [146] and a similarity metric are used to merge fragmented regions and obtain coast shorelines.
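The RAG-based merging step can be sketched as follows. Nodes of the graph are region labels, edges join 4-adjacent regions, and adjacent regions are merged greedily with a union-find structure. Because the similarity metric of [147] is not described here, the absolute difference of region mean intensities with a threshold of 0.1 is an assumption, as is the function name.

```python
import numpy as np

def merge_regions(labels, img, max_diff=0.1):
    """Merge 4-adjacent regions whose mean intensities differ by less than max_diff."""
    parent = {int(l): int(l) for l in np.unique(labels)}

    def find(x):
        x = int(x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]          # path halving
            x = parent[x]
        return x

    # RAG edges: pairs of different labels that touch horizontally or vertically
    edges = set()
    for a, b in ((labels[:, :-1], labels[:, 1:]), (labels[:-1, :], labels[1:, :])):
        touching = a != b
        edges.update(zip(a[touching].tolist(), b[touching].tolist()))

    merged = True
    while merged:
        merged = False
        current = np.vectorize(find)(labels)
        means = {r: img[current == r].mean() for r in np.unique(current).tolist()}
        for a, b in sorted(edges):
            ra, rb = find(a), find(b)
            if ra != rb and abs(means[ra] - means[rb]) < max_diff:
                parent[rb] = ra                     # merge the two regions
                merged = True
                break                               # recompute region means after each merge
    return np.vectorize(find)(labels)

# Oversegmented toy result: regions 1 and 2 are sea, regions 3 and 4 are land
labels = np.array([[1, 1, 2, 3, 3, 4]] * 4)
img = np.array([[0.10, 0.10, 0.12, 0.90, 0.90, 0.85]] * 4)
merged_labels = merge_regions(labels, img)
```

After merging, only two regions remain, and the shoreline is the boundary between them.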

Table 7 Summary of other segmentation methods (evaluation metrics are given in Table 2 )

11 Discussion and Conclusion

The literature review included 32 papers:

  • 11 papers presenting active contour methods [61, 64, 73, 75, 79,80,81, 85, 88,89,90]

  • 9 papers concerning machine learning methods [29, 91, 97, 99, 116, 118,119,120,121]

  • 8 papers presenting approaches that use thresholding [42, 46, 47, 49, 50, 52, 54, 55]

  • 2 papers presenting graph-based approaches [131, 133]

  • 1 paper using the watershed segmentation method [147]

  • 1 paper presenting the results of an edge-tracing method [139]

This breakdown shows that ACMs are the most commonly used, being described in 11 articles in total. There are slightly fewer papers on machine learning and thresholding. However, thresholding can also be a supporting technique in other approaches, as shown in Fig. 5. The same applies to watershed segmentation, which can be used to produce the initial contour for an ACM [61]. Similarly, various ACMs can be used to produce coast shorelines [64]. The segmentation results are difficult to summarize unequivocally, because the papers use different metrics and different data sets. Unfortunately, the authors did not publish pattern segmentations prepared by experts, which greatly limits the possibility of evaluating and verifying the methods developed and of proposing new, improved solutions whose results could be compared with existing ones. This problem could be solved by regularly organizing competitions in which all participants use the same metrics, determined by experts, to evaluate the segmentation results. This would make the results comparable and would allow the best approaches to be selected unequivocally. Moreover, during and after the competitions, the results of individual methods would be made public, together with the data sets on which the experiments were carried out [150].

This literature review shows that most papers proposed segmentation methods for small sets of SAR images representing specific geographical areas selected by the authors. Consequently, it is hard to say whether similar results would be obtained in other regions. In coastline detection from optical images, there is a very interesting initiative to develop a global shoreline mapping toolbox [151], which makes it possible to use publicly available Landsat and Sentinel-2 satellite imagery. Unfortunately, no similar tool has yet been developed for SAR images.

The quality of the acquired images is very important for segmentation, hence the need for pre-processing to reduce noise and/or for auxiliary methods that provide preliminary, approximate segmentation results. This, in turn, requires choosing and appropriately setting the parameters that control the pre-processing and/or segmentation. The more parameters there are, the more difficult it is to control such methods and to apply them in practice. Moreover, only nine of the papers concern segmentation methods that can extract coastlines in both urbanized and non-urbanized areas [47, 64, 81, 88, 89, 97, 99, 119, 133]. Methods tested only on images of non-urbanized areas may not yield correct segmentation results in the vicinity of man-made infrastructure, e.g. civilian, military and especially port or industrial infrastructure. For example, port water regions are strongly affected by the surrounding buildings and ships, which appear in SAR images as bright scatter points. This is because the metal surfaces of ships, as well as the metal roofs and walls of buildings, cause very strong scattering of electromagnetic pulses [152, 153].

In future research, it is also worth considering the diverse coast morphology visible in SAR images, as it poses a serious challenge to segmentation methods. Urbanized coastal areas [47, 97] differ significantly from sandy [154, 155] or gravel beaches [156]. In addition, wetlands [118] and rocky coasts [157] have features completely different from those of ice-covered coastal areas [158, 159]. Consequently, the selected regions of interest should include coastal areas with both simple and complex morphological and geometric features. Sufficiently large sets of high-resolution SAR images are, of course, necessary to conduct such research, but their availability is still problematic. Because coastal regions are dynamic, monitoring ongoing shoreline changes requires time-series data. Acquiring and processing time series of SAR satellite images is time-consuming, which makes preparing data for research difficult.

Only 9 of the 32 reviewed papers deal with machine learning, compared with 23 dealing with other methods, i.e. roughly two and a half times fewer. This is most likely due to the unavailability of public image databases with ready-made pattern segmentations; the authors of papers use private, inaccessible databases. Manually segmenting a large dataset is very time-consuming and tedious, and it requires specialist knowledge. In medical image segmentation, by contrast, the literature shows the opposite proportion, i.e. papers on supervised learning methods are the most numerous. This is particularly true where public repositories are available to researchers and are being expanded or supplemented by new ones.

The use of deep learning segmentation methods is increasing, which is made possible by growing computing power. Deep learning makes it possible to learn an internal representation of the processed image that best matches the expected results. In classical machine learning algorithms, on the other hand, it is very important to extract appropriate image features, and doing so properly often requires great skill and specialist knowledge.

The literature describes deep learning methods such as unsupervised deep learning [160, 161], semi-supervised learning [162, 163] and generative networks [164, 165]. These methods were developed to enable efficient learning, including from incomplete data sets. However, they have not yet been used for detecting and extracting coast shorelines in SAR images.

A valuable research direction is to propose segmentation methods that combine two imaging techniques, i.e. SAR images and images acquired with optical instruments [42, 54]. Another significant research direction is to develop methods for extracting water bodies in both urbanized and non-urbanized areas and for different geographical regions of coasts; the results can then be used to extract coast shorelines more easily. Such solutions, based on deep learning methods [166,167,168,169,170], have been proposed recently. Moreover, if other SAR images, e.g. those representing riverbeds, were also used in the learning process, the approach could be applied more universally, not only for water body extraction but also for segmenting riverbeds or identifying flooded areas. However, a representative data set is necessary to address these subjects.