1 Introduction

Remote sensing has long been an important means for the mapping—and potential monitoring—of the surface water resources of the Earth (Palmer et al. 2015). Generally, for this purpose, data from all possible scales, i.e. from very-high-resolution UAV-borne data (Flener et al. 2013; Rivas Casado et al. 2015; Rusnak et al. 2018) to high- and medium-resolution satellite imagery (Isikdogan et al. 2017; Yang et al. 2017; Rishikeshan and Ramesh 2018), have been used. However, the greatest potential of spaceborne remote sensing lies in the possibility of observing surface water on large scales—from country-wide (Mueller et al. 2016) to even global coverage (Pekel et al. 2016).

The work presented in this contribution seeks to complement the existing large-scale approaches of Mueller et al. (2016) and Pekel et al. (2016). While they were based on information extraction from long-term optical archives and classification systems utilizing expert knowledge, in this contribution the focus lies on the following points:

  • While much of the available literature focuses on the use of either optical imagery or SAR data alone for water mapping, in this work the joint use of the multi-spectral optical data provided by Sentinel-2 and the synthetic aperture radar (SAR) observations provided by Sentinel-1 is emphasised. An important reason for this choice is that SAR imagery does not suffer from obstruction by clouds or other atmospheric effects and thus, in principle, allows a more continuous monitoring of important water bodies or wetlands (see, e.g. Zeng et al. 2017). Furthermore, the great potential of SAR-optical data fusion for water mapping was only recently confirmed by Bioresita et al. (2019) and Slinski et al. (2019). While the former focused on decision-level fusion of water probability maps calculated from mono-sensor Sentinel-1 and Sentinel-2 time series, the latter used pixel-level fusion of SAR backscatter and multi-temporal modified normalised difference water index (MNDWI) composites derived from Landsat observations. Study results for test scenes located in Ireland and Ethiopia, respectively, confirmed the robustness of these fusion-based approaches.

  • By combining a simple classifier based on supervised learning with automated training data generation from globally available volunteered geographic information (VGI), freely available remote sensing data provided by the Sentinel missions of the European Copernicus program, and the cloud computing platform Google Earth Engine, the proposed approach can easily be applied to arbitrary regions of the world. It is assumed that this will also enable laypersons to map inland water on a large scale without the need for expert knowledge or manual intervention.

For the experimental investigations, the state of Bavaria is used as the study scene. The region is a typical representative of the temperate biome, located in the heart of Europe. Since Bavaria contains many kinds of inland water bodies, from small mountain creeks to large drainage rivers, and from districts of small ponds to large freshwater lakes, the results of this work are expected to generalise beyond the study scene.

Thus, while using a central European region as an example, it has to be stressed that the main purpose of this work is to explore the general potential of a fully automatic, cloud-based framework that allows fast and cheap large-scale mapping of surface water for complete countries, continents, or even the whole globe.

In this context, it has to be stressed that the term “surface water” in this work refers to the very water extent that is visible in seasonal Sentinel-1 and Sentinel-2 composites (cf. Sect. 3.2.1).

The remainder of this article is organised as follows: Sect. 2 describes the Sentinel-1 and Sentinel-2 missions with a focus on system-inherent water mapping capabilities. In Sect. 3.2 a simple, yet fully automatic, water detection framework based on supervised learning is proposed, whose results for the study area of Bavaria are summarised in Sect. 4.

Finally, Sect. 5 discusses the quality of the presented water detection approach and possible sources of error, while Sect. 6 presents a summary and the conclusions of the findings.

2 Sentinel-1 and Sentinel-2

The Copernicus program is the Earth Observation program of the European Union (Desnos et al. 2014). While it also provides data from in situ sources, its core consists of the Sentinel satellite family. These Sentinel missions were specifically designed to meet versatile end-user requirements. Among the Sentinel missions, Sentinel-1 and Sentinel-2 are classical Earth observation platforms and provide SAR and multi-spectral image data, respectively, in the metre-resolution domain. Thus, they are certainly the most relevant for inland water body monitoring, whereas Sentinel-3 is specifically aimed at ocean and land monitoring with its low-spatial-resolution multi-spectral images and altimeter capabilities.

Since this study focuses on inland freshwater (i.e. river and lake) mapping, only Sentinel-1 and Sentinel-2 are considered. Their channels are summarised in Table 1, and their attributes are described in the following.

Table 1 Channels of Sentinel-1 and Sentinel-2 and their meaning

2.1 Sentinel-1

The Sentinel-1 mission (Torres et al. 2012) currently consists of two polar-orbiting satellites, which are equipped with C-band SAR sensors enabling them to acquire imagery regardless of meteorological conditions.

Sentinel-1 works in a pre-programmed operation mode to avoid conflicts and to produce a consistent long-term data archive built for applications based on long time series. Depending on which SAR imaging mode is used, either high resolutions down to 5 m (at a swath width of 80 km) or wide coverages of up to 400 km (at a resolution of 20 m × 40 m) can be achieved.

Furthermore, Sentinel-1 provides dual-polarisation capabilities and very short revisit times of about one week at the equator. Since highly precise spacecraft positions and attitudes are combined with the high accuracy of the range-based SAR imaging principle, Sentinel-1 images come with high out-of-the-box geolocation accuracy (Schubert et al. 2015). In the frame of the presented study, the most widely available standard product was used, i.e. ground-range-detected (GRD) imagery acquired in the so-called interferometric wide swath (IW) mode. The GRD images contain \(\sigma ^0\) backscatter coefficient values in dB scale for both the co-polarised (VV) and the cross-polarised (VH) channel and are projected to ground range using the WGS84 Earth ellipsoid model. For precise ortho-rectification, restituted orbit information is combined with the 30-m SRTM DEM, or with the ASTER DEM for high-latitude regions where SRTM is not available. While the resolution of IW GRD images is 20 m × 22 m in range and azimuth, respectively, the products are delivered to the end user with a square pixel spacing of 10 m × 10 m. It has to be noted that SAR imagery is inherently affected by the so-called speckle effect, which appears as a form of noise to human analysts and to most image processing algorithms. Therefore, speckle filtering by dedicated algorithms or multi-temporal image fusion must be applied during pre-processing to enhance the visual image quality.

2.2 Sentinel-2

The Sentinel-2 mission (Drusch et al. 2012) currently comprises two identical polar-orbiting satellites in the same orbit, phased at 180\(^\circ\) to each other. The mission is meant to provide continuity for multi-spectral imagery of the SPOT and Landsat type, which have provided information about the land surfaces of our Earth for many years. With its wide swath of up to 290 km and its short revisit time of 5 days over the inhabited continents (based on two satellites) under cloud-free conditions, the Sentinel-2 mission is particularly well suited to the monitoring of vegetated land surfaces. Sentinel-2 data are provided in the form of so-called granules, which cover a ground area of 100 km × 100 km at pixel spacings of 10 m, 20 m and 60 m, depending on the respective spectral bands. In the most widely available Level-1C standard product, orthorectified top-of-atmosphere (TOA) reflectances are provided with sub-pixel multi-spectral registration. Since the cloud and land/water masks included in the products are rather coarse and often not very reliable, pre-processing is still necessary to deal with potential cloud coverage.

2.3 Suitability of Sentinel Data for Water Mapping

The SAR data provided by the Sentinel-1 mission are expected to be a good source of information for water mapping, based on the fact that water appears as a smooth surface to C-band radar signals. This leads to a mirror-like reflection and consequently very low backscatter values (i.e. “dark” pixels). On the other hand, smooth artificial surfaces (e.g. roads or other paved areas) can also cause specular reflections, leading to confusion with water bodies and a high false-positive rate if only SAR data are used as input to the water detection procedure.

Besides these radiometric considerations, geometric effects caused by the side-looking imaging principle need to be taken into account in the interpretation of SAR imagery: if the top of an object (e.g. a mountain) is closer to the sensor than its bottom, the object appears to “collapse” towards the sensor. This is called the layover effect. Correspondingly, any target lying behind such an elevated object will fall into the so-called radar shadow. Since rivers often carve themselves into valleys and canyons, these two effects can impact the detection of rivers and lead to both false positives (in the case of radar shadow) and false negatives (in the case of a river hidden in layover).

The multi-spectral data provided by Sentinel-2 support the identification of water based on the interaction of electromagnetic radiation from the optical part of the spectrum with water: while pure water reflects less than 2–3% of the incoming radiation in the visible domain, almost complete absorption is observed in the near-infrared (NIR) and short-wave infrared (SWIR) bands (Hobson and Williams 1971). On the other hand, shallow water areas in particular are affected by differing penetration depths from the blue to the NIR wavelength regions (Bhargava and Mariam 1991). Therefore, water level fluctuations will have an impact on the robustness of water detection from multi-spectral data. In addition, high turbidity or floating vegetation can lead to mis-classifications with both data sources.

Taking these considerations into account, it is advisable to make use of data fusion strategies to mitigate the weaknesses of the two sensor principles while benefitting from the complementarity of the information contained in the two data sources (Schmitt and Zhu 2016).

3 Materials and Methods

3.1 The State of Bavaria as Study Area

For the presented investigations on the potential of Sentinel remote sensing data for large-scale inland freshwater mapping, the state of Bavaria is used as study area. Bavaria is a landlocked federal state of Germany, occupying an area of about 70,550 \(\hbox {km}^2\). From the surface water perspective, Bavaria is characterised by its location at the Alpine foothills with numerous lakes and small- to medium-sized rivers. In addition, two major rivers flow through the state: the Danube, which is Europe’s second longest river, drains about 48,200 \(\hbox {km}^2\) with a 380-km-long Bavarian section, while the Main drains about 23,350 \(\hbox {km}^2\) on a 408-km-long stretch. The largest natural lakes belonging exclusively to Bavaria are Chiemsee with a surface area of about 80 \(\hbox {km}^2\), Starnberger See with about 58 \(\hbox {km}^2\), and Ammersee with about 47 \(\hbox {km}^2\), whereas the largest reservoirs are Forggensee with about 15 \(\hbox {km}^2\), Großer Brombachsee with about 9 \(\hbox {km}^2\), and Ismaninger Speichersee with about 6 \(\hbox {km}^2\). The mentioned water bodies are indicated in Fig. 1.

Fig. 1 Map of the largest rivers and bodies of stagnant water in Bavaria (This work, “Largest water bodies of Bavaria”, is a derivative of “Bavaria relief location map” by Alexrk2, used under CC-SA. “Largest water bodies of Bavaria” is licensed under CC-SA by Michael Schmitt.). The coloured boxes indicate the different test areas shown in Figs. 6, 7, 8, 9, 10, 11, 12

3.2 Water Detection by Support Vector Machines

In spite of the massive success that modern deep learning approaches have also shown in the field of remote sensing in recent years (Zhu et al. 2017), it is argued that complicated, data-hungry convolutional neural networks are not necessary for water detection, as water can be identified very clearly even in simple hand-crafted feature spaces. An example supporting this argument is shown in Fig. 2. As numerous researchers have confirmed before, even with only SAR backscatter or the modified normalised difference water index (MNDWI) (Xu 2006), water and non-water pixels can already be distinguished to some extent, e.g. by thresholding techniques (Liebe et al. 2005; Conrad et al. 2016). When only slightly more complexity is introduced, i.e. when the feature space is extended to two or three dimensions, water becomes clearly separable from the non-water background class.

It has to be mentioned, however, that simple thresholding or linear discrimination procedures are not sufficient (see, e.g. Klein et al. 2014). As can be seen from Fig. 2, a considerable number of background and water pixels overlap. This is mainly due to the confusions described in Sect. 2.3. In the SAR case, background samples (e.g. paved areas) might appear dark and be confused with water, or water samples might appear bright and thus be confused with background due to surface roughness caused by waves or vegetation on the water surface. In the MNDWI case, bare earth or built-up areas with very high surface temperatures can lead to strong reflections in the short-wave infrared (SWIR) band and thus to high MNDWI magnitudes, whereas smaller water bodies might be mixed with soil reflectance in the 20 m\(\times \,20\) m pixels of the Sentinel-2 SWIR band.

Fig. 2 Two-dimensional feature space created from Sentinel-1 VV backscatter (normalised to [0; 1]) and Sentinel-2 MNDWI magnitudes for the discrimination of water and background
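For readers who wish to reproduce such a feature-space plot, a minimal, hypothetical sketch is given below. It assumes the Earth Engine Python API (the implementation language is an assumption) and hypothetical variable names: `s1_mean` and `s2_median` for the seasonal composites and `points` for labelled sample locations with a binary 'water' property, both of which are only introduced later in Sect. 3.2.

```python
# Hypothetical sketch: sample the 2D feature space of Fig. 2 at labelled points.
import ee

ee.Initialize()

def feature_space_samples(s1_mean, s2_median, points):
    # Rescale VV backscatter from [-25, 0] dB to [0, 1], as used in Fig. 2.
    vv_norm = s1_mean.select('VV').add(25).divide(25).rename('VV_norm')
    mndwi = s2_median.normalizedDifference(['B3', 'B11']).rename('MNDWI')
    # One (VV_norm, MNDWI) pair per labelled point, e.g. for a scatter plot.
    return vv_norm.addBands(mndwi).sampleRegions(
        collection=points, properties=['water'], scale=10)
```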

In this work, therefore, a simple support vector machine (SVM) with radial basis function (RBF) kernel is employed. Focus is put mainly on the following questions:

  • Which input information that can be derived from Sentinel-1 and/or Sentinel-2 data constitutes the best source for freshwater mapping?

  • Does the fusion of Sentinel-1 SAR and Sentinel-2 optical data support the water mapping task?

  • What are the data-inherent limitations of Sentinel-based freshwater mapping, e.g. with respect to granularity or spectral information content?

The application of SVMs for land cover classification has a long tradition in the remote sensing community (Mountrakis et al. 2011). In this contribution, a fully automatic, cloud-based workflow for SVM-based freshwater mapping of extensive areas is described.

The workflow is implemented in Google Earth Engine (Gorelick et al. 2017) to utilize its curated data catalogue and cloud-processing capabilities. The procedure is detailed in the following sections.
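As an illustration of the core of this workflow, the following minimal sketch shows how an RBF-kernel SVM could be set up in the Earth Engine Python API. The use of the Python API, the property name 'water', the band list, and the hyper-parameter values are assumptions introduced for illustration; `samples` is assumed to be a FeatureCollection carrying one property per input band (cf. Sects. 3.2.2 and 3.2.3).

```python
# Minimal sketch of the SVM core of the workflow (assumptions: Python API,
# a FeatureCollection `samples` with a binary 'water' property and band values).
import ee

ee.Initialize()

BANDS = ['VV', 'VH', 'B2', 'B3', 'B4', 'B8']  # illustrative feature set, cf. Table 2

def train_water_svm(samples, bands=BANDS, label='water'):
    """Train a support vector machine with RBF kernel on the given samples."""
    svm = ee.Classifier.libsvm(
        svmType='C_SVC',
        kernelType='RBF',
        gamma=0.5,   # hyper-parameters shown here are placeholders, not tuned values
        cost=10,
    )
    return svm.train(features=samples, classProperty=label, inputProperties=bands)

# Typical usage (hypothetical variable names):
# samples = feature_image.select(BANDS).sampleRegions(train_points, ['water'], 10)
# water_map = feature_image.select(BANDS).classify(train_water_svm(samples))
```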

3.2.1 Data Pre-Processing

Both Sentinel-1 SAR imagery and Sentinel-2 multi-spectral data come with their own system-inherent limitations, which make the reliable derivation of mapping information from them more difficult. In the Sentinel-1 case, one mainly has to deal with the already mentioned speckle effect, which appears as a form of noise to most image analysis approaches. While manifold methods for speckle filtering exist, most of them degrade the quality of the underlying image content to some extent. Therefore, another option is used, which is often applied if precisely aligned multi-temporal SAR images are available—the creation of a temporal mean map. For this purpose, all Sentinel-1 IW datasets acquired in a specified time frame (i.e. the summer months June, July and August of 2018) are loaded from the data catalogue. Then, for every pixel, the mean value over all corresponding acquisitions is calculated. Afterwards, the \(\sigma ^0\) backscatter values are clipped to the interval \([-25;\,0]\) dB as a form of data calibration. The effect of this pre-processing is illustrated in Fig. 3. The example shows that especially finer structures, such as roads, or the water bodies located in the top left corner of the scene, appear much more clearly in the temporal mean map than in the single-date SAR image.
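The Sentinel-1 part of this pre-processing can be sketched as follows with the Earth Engine Python API (an assumption; collection and property names follow the public Earth Engine data catalogue, and `aoi` is a hypothetical geometry covering the study area).

```python
# Minimal sketch of the Sentinel-1 pre-processing: seasonal mean of IW GRD scenes,
# clipped to [-25, 0] dB.
import ee

ee.Initialize()

def s1_summer_mean(aoi, start='2018-06-01', end='2018-09-01'):
    s1 = (ee.ImageCollection('COPERNICUS/S1_GRD')
          .filterBounds(aoi)
          .filterDate(start, end)
          .filter(ee.Filter.eq('instrumentMode', 'IW'))
          .filter(ee.Filter.listContains('transmitterReceiverPolarisation', 'VV'))
          .filter(ee.Filter.listContains('transmitterReceiverPolarisation', 'VH'))
          .select(['VV', 'VH']))
    # The per-pixel temporal mean acts as implicit speckle reduction; clamping to
    # [-25, 0] dB corresponds to the clipping described in the text.
    return s1.mean().clamp(-25, 0)
```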

Fig. 3 The denoising effect of temporal averaging: a single Sentinel-1 scene, b temporal mean map obtained by averaging 59 acquisitions from a time period of 3 months during summer 2018

When it comes to Sentinel-2, the most severe problem is cloud cover. To mitigate the information loss caused by cloud-affected pixels, different pre-processing approaches can be utilised (see, e.g. Schmitt et al. 2019). In the frame of the presented work, a simple standard method is used, which combines multi-temporal median filtering with the cloud mask provided as a quality assurance band of the Sentinel-2 products. In analogy to the Sentinel-1 case, all Sentinel-2 scenes with less than 30% cloud coverage (at granule level) acquired in the summer months of 2018 (i.e. June, July, and August) are collected from the data catalogue. Then, every pixel that is affected by either dense clouds or cirrus clouds (according to the QA60 quality band) is masked. Finally, the median over all non-masked acquisitions is calculated for every pixel, since the median is expected to exclude very dark values (often caused by shadows) as well as very bright pixels (often caused by remaining clouds). The effect of this simple cloud masking is shown in Fig. 4. It becomes obvious that in single-date imagery many scene parts are obstructed by cloud cover or cloud shadow, while combining multi-temporal data allows the creation of a synthetic cloud-free input for subsequent water detection tasks.
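A corresponding sketch of the Sentinel-2 pre-processing (again assuming the Earth Engine Python API and a hypothetical study-area geometry `aoi`) could look as follows; the QA60 bit positions for opaque and cirrus clouds follow the public Sentinel-2 product documentation.

```python
# Minimal sketch of the Sentinel-2 pre-processing: scene filtering by cloud
# percentage, QA60-based cloud/cirrus masking, and a per-pixel median composite.
import ee

ee.Initialize()

def mask_s2_clouds(image):
    qa = image.select('QA60')
    cloud_bit = 1 << 10   # opaque clouds
    cirrus_bit = 1 << 11  # cirrus clouds
    mask = qa.bitwiseAnd(cloud_bit).eq(0).And(qa.bitwiseAnd(cirrus_bit).eq(0))
    return image.updateMask(mask)

def s2_summer_median(aoi, start='2018-06-01', end='2018-09-01'):
    s2 = (ee.ImageCollection('COPERNICUS/S2')   # Level-1C TOA reflectance
          .filterBounds(aoi)
          .filterDate(start, end)
          .filter(ee.Filter.lt('CLOUDY_PIXEL_PERCENTAGE', 30))
          .map(mask_s2_clouds))
    # The median suppresses remaining bright cloud pixels and dark shadow pixels.
    return s2.median()
```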

While this procedure is suitable for the creation of water maps at loosely sampled, discrete time points, it prevents an actual quasi-continuous monitoring of inland water at the temporal resolution that would be provided by the Sentinel-2 acquisition opportunities. In the future, the presented approach has to be extended by a more sophisticated cloud-mitigation procedure, which still makes use of SAR-optical data fusion instead of relying on SAR data only, but samples the optical imagery more densely in time. An inspiring technique tailored to the mapping of croplands from Sentinel-2 time series, based on a deep neural network that also learned to ignore clouds, was recently presented by Rußwurm and Körner (2018).

Fig. 4 Result of a simple multi-temporal cloud removal approach: a single Sentinel-2 scene constructed by combining data from two neighbouring granules, b cloud-masked median image constructed from 20 multi-temporal acquisitions per granule, acquired within a time period of 3 months during summer 2018

3.2.2 Automatic Training Data Generation

Since the aim of this work is to detect water surfaces with a model based on supervised learning, a suitable training dataset is needed, which contains a balanced sample of target examples, i.e. water pixels, and clutter examples, i.e. background pixels. To train a robust and well-generalizing model, it is important that the versatility of water observations (ranging from deep lakes to shallow rivers) is well represented in the training data. This holds even more so for the background-related observations, as here one has to deal with many different forms of land cover. To avoid labour-intensive, and often biased, manual annotation of training samples, a fully automatic approach is proposed, which exploits volunteered geographic information (VGI) in the form of the OpenStreetMap (OSM) water layer.

After loading the layer, it is filtered to remove any polygons belonging to the wetlands class, so that only actual surface water bodies (i.e. rivers, lakes, etc.) are kept. Then, a negative buffer of 20 m is applied to each polygon to make sure that each polygon really contains only water surfaces and not ambiguous areas such as shorelines. Finally, the list of polygons is filtered to remove all water polygons with an area smaller than 10,000 \(\hbox {m}^2\) for the sake of memory requirements and processing speed. In parallel, an inverse, i.e. positive, buffer of 50 m is applied to the previously reduced water polygons, and those enlarged polygons are subtracted from the outer polygon describing the Bavarian state borders. Thus, one finally obtains one list of water polygons and one list of non-water polygons. From these, 2500 training pixel locations are randomly sampled for each of the water and background classes, as well as 500 pixels each for testing. At these locations, the desired remote sensing-derived features are later extracted to train and test the proposed SVM-based classification models.
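A hedged sketch of this label generation step is given below (Earth Engine Python API assumed). Here, `osm_water` stands for a FeatureCollection of OSM surface-water polygons (e.g. uploaded as an asset, with wetland polygons already removed) and `bavaria` for the state boundary geometry; the 'water' property name and the helper functions are introduced for illustration only.

```python
# Minimal sketch of the automatic training/test sample generation from OSM polygons.
import ee

ee.Initialize()

def make_sample_regions(osm_water, bavaria):
    # Negative buffer of 20 m keeps samples away from ambiguous shorelines; polygons
    # smaller than 10,000 m^2 are dropped for memory and speed reasons.
    water = (osm_water
             .map(lambda f: f.buffer(-20))
             .map(lambda f: f.set('area', f.geometry().area(1)))
             .filter(ee.Filter.gte('area', 1e4)))
    # A positive buffer of 50 m around the water polygons is subtracted from the
    # state polygon to obtain the non-water sampling region.
    buffered = osm_water.map(lambda f: f.buffer(50)).union(1)
    background = bavaria.difference(buffered.geometry(), 1)
    return water.geometry(), background

def sample_points(water_region, background_region, n_train=2500, n_test=500):
    def labelled(region, n, seed, value):
        pts = ee.FeatureCollection.randomPoints(region, n, seed)
        return pts.map(lambda f: f.set('water', value))
    train = labelled(water_region, n_train, 0, 1).merge(
        labelled(background_region, n_train, 1, 0))
    test = labelled(water_region, n_test, 2, 1).merge(
        labelled(background_region, n_test, 3, 0))
    return train, test
```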

3.2.3 Features for Water Mapping

From Sentinel-1, both the VV and VH polarisation channels are used. From Sentinel-2, all bands with 10-m and 20-m resolution are used, as they carry highly relevant information from the visible, the near-infrared, and the short-wave infrared parts of the electromagnetic spectrum. The 60-m bands are ignored, as they are primarily intended to provide measurements of atmospheric parameters such as aerosols and clouds. In addition, two spectral indices are employed. The normalised difference vegetation index (NDVI) (Rouse 1974) was developed for mapping vegetation status and is probably the most frequently used spectral index in environmental remote sensing (Pettorelli 2013). It combines information from the red and near-infrared parts of the spectrum and is calculated by

$$\begin{aligned} \mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{Red}}{\mathrm{NIR} + \mathrm{Red}} = \frac{\mathrm{Band}\,8 - \mathrm{Band}\,4}{\mathrm{Band}\,8 + \mathrm{Band}\,4}. \end{aligned}$$
(1)

The NDVI is used based on the assumption that its response in the case of water is inverse to that of vegetation.

Table 2 Input feature combinations used in the experiments

In addition to that, the more relevant modified normalised difference water index (MNDWI) (Xu 2006) is employed, which is an extension of the normalised difference water index (NDWI) (Gao 1996) and is dedicated to the detection of water surfaces. It combines information from the green and short-wave infrared parts of the spectrum and is calculated by

$$\begin{aligned} \mathrm{MNDWI} = \frac{\mathrm{Green} - \mathrm{SWIR}}{\mathrm{Green} + \mathrm{SWIR}} = \frac{\mathrm{Band}\,3 - \mathrm{Band}\,11}{\mathrm{Band}\,3 + \mathrm{Band}\,11}, \end{aligned}$$
(2)

whereas the original NDWI is calculated by

$$\begin{aligned} \mathrm{NDWI} = \frac{\mathrm{NIR} - \mathrm{SWIR}}{\mathrm{NIR} + \mathrm{SWIR}} = \frac{\mathrm{Band}\,8 - \mathrm{Band}\,11}{\mathrm{Band}\,8 + \mathrm{Band}\,11}. \end{aligned}$$
(3)

Since the MNDWI was developed to make the NDWI more robust against a confusion of water with impervious surfaces, it is expected to be the most powerful feature for water detection from optical imagery. To illustrate the use of both NDVI and MNDWI, the corresponding feature images are displayed in Fig. 5. The illustration also shows the higher resolution of the NDVI, which is based on 10-m bands, whereas the MNDWI uses one of the 20-m SWIR bands.
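In Earth Engine, these indices can be derived directly from the cloud-free composite via normalised band differences; the following minimal sketch (Python API assumed, band names as in the public catalogue) implements Eqs. (1)–(3).

```python
# Minimal sketch: compute NDVI, MNDWI and NDWI from a Sentinel-2 composite `s2`.
import ee

def add_indices(s2):
    ndvi = s2.normalizedDifference(['B8', 'B4']).rename('NDVI')     # Eq. (1)
    mndwi = s2.normalizedDifference(['B3', 'B11']).rename('MNDWI')  # Eq. (2)
    ndwi = s2.normalizedDifference(['B8', 'B11']).rename('NDWI')    # Eq. (3)
    return s2.addBands(ndvi).addBands(mndwi).addBands(ndwi)
```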

Fig. 5 NDVI (a) and MNDWI (b) magnitude image examples. While there is a high inverse correlation between NDVI magnitude and water, impervious surfaces and some (barren) agricultural areas also exhibit low NDVI magnitudes. In contrast, high MNDWI magnitudes clearly correlate with water

Table 2 summarizes the different feature combinations used in the experiments of this work. To analyse the full potential of the available Sentinel-1 and Sentinel-2 data, Sentinel-1 only, Sentinel-2 only, and a fusion of both data sources are investigated.
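To make the connection between Table 2 and the pre-processed composites explicit, the following hypothetical sketch assembles the input image for a few selected setups; the exact band lists per setup are assumptions based on the descriptions in the text, and `s1_mean` and `s2_median` refer to the composites from Sect. 3.2.1.

```python
# Hypothetical sketch: stack the input features for selected setups of Table 2.
import ee

S2_10M = ['B2', 'B3', 'B4', 'B8']
S2_20M = ['B5', 'B6', 'B7', 'B8A', 'B11', 'B12']

def build_setup(s1_mean, s2_median, setup):
    indices = add_indices(s2_median)  # cf. sketch after Eqs. (1)-(3)
    if setup == 2:    # Sentinel-1 VH only
        return s1_mean.select('VH')
    if setup == 4:    # Sentinel-2 10-m bands only
        return s2_median.select(S2_10M)
    if setup == 9:    # full SAR-optical fusion of all surface-related bands
        return s1_mean.select(['VV', 'VH']).addBands(
            s2_median.select(S2_10M + S2_20M))
    if setup == 11:   # VH + MNDWI fusion
        return s1_mean.select('VH').addBands(indices.select('MNDWI'))
    raise ValueError('Setup not included in this sketch')
```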

4 Results for Rivers and Lakes in Bavaria

Different SVM models for the feature combinations summarised in Table 2 were trained based on the 5,000 training samples described in Sect. 3.2.2. Using tenfold cross-validation, the models were evaluated on 1,000 test samples each. The results are summarised in Table 3. It has to be noted that, in spite of cross-validation, both the training sets and the test sets are randomly sampled from the same basic distribution. Thus, the results should be considered as optimistic estimates for the study area of Bavaria and cannot be used to extrapolate the accuracies to be expected in unseen areas. Furthermore, due to the automatic training data generation procedure, small water bodies are not represented in the ground truth; this also leads to positively biased values.
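For a single cross-validation fold, the test metrics reported in Table 3 (overall accuracy, kappa coefficient, producer's and user's accuracies) could be computed in Earth Engine as sketched below; `classifier` and `test_samples` follow the earlier sketches, and the property names are assumptions.

```python
# Minimal sketch: accuracy metrics for one cross-validation fold.
import ee

def evaluate_fold(classifier, test_samples):
    classified = test_samples.classify(classifier)           # adds 'classification'
    cm = classified.errorMatrix('water', 'classification')   # confusion matrix
    return {
        'overall_accuracy': cm.accuracy(),
        'kappa': cm.kappa(),
        'producers_accuracy': cm.producersAccuracy(),  # per class: non-water, water
        'users_accuracy': cm.consumersAccuracy(),      # user's acc. = consumer's acc.
    }
```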

Table 3 Test metrics for the different feature combinations resulting from tenfold cross-validation

Looking at these quantitative results, one can observe the following phenomena:

  1. Dual-pol SAR does not provide higher detection accuracy

    Accuracy-wise, the best SAR configuration for detecting water is a single-channel analysis of only VH-polarised intensity images (setup 2). This setup provides the best producer's accuracy for water, while overall accuracy and kappa coefficient are not significantly different from the numerically better dual-pol result. VV polarisation (setup 1) only performs better with respect to the producer's accuracy of non-water and the user's accuracy of water. This indicates that it misses some water areas, but usually does not lead to a misclassification of background. Dual polarisation (setup 3) does not provide significantly better metrics in any category, but is one of the results with the smallest standard deviations, i.e. the highest prediction stability.

  2. Optical indices are not as helpful as expected

    While the MNDWI magnitude is at least the most powerful feature to reliably decide which pixels belong to the background (shown by the high producer's accuracy for non-water and user's accuracy for water in setup 7), it does not perform very well in finding all water areas. One might think that this is caused by the fact that the SWIR bands necessary for the MNDWI calculation have a resolution of only 20 m, so that small water areas will simply be missed. However, due to the removal of small water bodies in the label pre-processing (cf. Sect. 3.2.2), the validation data do not contain any water bodies with an area of less than \(10{,}000\,\hbox {m}^2\) either, so the coarser resolution alone cannot explain this behaviour.

    In contrast to the MNDWI magnitude, the NDVI magnitude provides the least amount of useful information for water detection overall (see setup 6). At the same time, a combination of MNDWI and NDVI (setup 8) leads to a better, but still mediocre, water detection performance. Using only the 10-m bands of Sentinel-2 without any previous feature extraction performs best (see setup 4), whereas using all surface-related bands (setup 5) provides no additional benefit besides slightly better prediction stability.

  3. Fusion performs better than single-sensor analysis

    With the highest overall accuracies and at least competitive \(\kappa\) coefficients, the fusion-based predictions on average perform better than the single-sensor results. While the fusion of just VH polarisation and MNDWI (setup 11) provides the overall highest producer's accuracy for water, the fusion of all available surface-related bands (setup 9) is better with respect to the user's accuracy of the water class as well as overall accuracy and \(\kappa\) coefficient—at least if rounding errors are ignored. Interestingly, leaving out the Sentinel-2 20-m bands and the Sentinel-1 VV channel (setup 10) does not provide any advantage, although these bands could potentially degrade the result due to their lower resolution.

  4. Detecting water is harder than detecting background

    The mean producer’s accuracy for water detection is about \(96\%\), whereas the mean producer’s accuracy for background is almost \(98\%\). This indicates that detecting the single, well-defined water class is harder than detecting background pixels, which inherently comprise a much more versatile sample distribution.

Based on these numerical results, selected visual results are shown for several example regions across Bavaria for the configurations that achieved the highest mean accuracies in the tenfold cross-validation experiments:

  • the best Sentinel-1 configuration, i.e. the VH band (setup 2)

  • the best Sentinel-2 configuration, i.e. the 10-m bands (setup 4)

  • the two best fusion-based configurations, i.e. all bands (setup 9) and VH+MNDWI (setup 11)

The selected maps contain sections of the Danube (Fig. 6), the Main (Fig. 7), the Lech (Figs. 6 and 8), the Inn (Fig. 9), and the Isar (Fig. 10) to illustrate the performance on large- and medium-sized rivers. With respect to a mapping of bodies of stagnant water, Starnberger See, Ammersee, Wörthsee, Pilsensee (all Fig. 11), Chiemsee, Simsee (both Fig. 9), Forggensee, Weißensee, Hopfensee, Bannwaldsee (all Fig. 8), Großer Brombachsee, Altmühlsee (both Fig. 12), and Ismaninger Speichersee (Fig. 10) are shown to illustrate the performance on large- and medium-sized lakes and reservoirs.

For comparison to an independent yet similar reference, all figures contain a surface water map extracted from the Global Surface Water Layer (GSWL) (Pekel et al. 2016), which was generated by a multi-temporal analysis of Landsat imagery. The displayed reference map is an aggregation of the binary monthly GSWL maps for the summer months of 2015, i.e. when a pixel was identified as water in at least two out of the three months June, July, and August 2015, it is considered to contain surface water.
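The GSWL reference aggregation can be sketched as follows (Earth Engine Python API assumed); the monthly water history collection ID is given here as an assumption and may refer to a different version in the current catalogue.

```python
# Minimal sketch: aggregate the binary monthly GSWL maps for one summer season
# (a pixel is considered water if it was mapped as water in at least 2 of 3 months).
import ee

ee.Initialize()

def gswl_summer_reference(year=2015):
    monthly = (ee.ImageCollection('JRC/GSW1_0/MonthlyHistory')
               .filterDate(f'{year}-06-01', f'{year}-09-01'))
    # In the monthly maps, the 'water' band is 2 for water, 1 for non-water, 0 for no data.
    water_count = monthly.map(lambda img: img.select('water').eq(2)).sum()
    return water_count.gte(2)
```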

Fig. 6 Example area showing the Danube river (flowing in west–east direction), with the Lech river entering from the south. a Optical reference image, b GSWL reference layer, c Sentinel-1 VH polarisation result, d Sentinel-2 10-m bands result, e Sentinel-1/Sentinel-2 full-band fusion, f Sentinel-1/Sentinel-2 VH/MNDWI fusion

Fig. 7 Example area showing the Main river (flowing in east–west direction), with the Regnitz river entering from the south close to the city of Bamberg. a Optical reference image, b GSWL reference layer, c Sentinel-1 VH polarisation result, d Sentinel-2 10-m bands result, e Sentinel-1/Sentinel-2 full-band fusion, f Sentinel-1/Sentinel-2 VH/MNDWI fusion

Fig. 8 Example area showing the reservoir Forggensee, some smaller natural lakes, and the Lech river exiting Forggensee to the north. a Optical reference image, b GSWL reference layer, c Sentinel-1 VH polarisation result, d Sentinel-2 10-m bands result, e Sentinel-1/Sentinel-2 full-band fusion, f Sentinel-1/Sentinel-2 VH/MNDWI fusion

Fig. 9 Example area showing river Inn (flowing in south–north direction), the Simsee, and the Chiemsee, with the small Alz river exiting Chiemsee to the north. a Optical reference image, b GSWL reference layer, c Sentinel-1 VH polarisation result, d Sentinel-2 10-m bands result, e Sentinel-1/Sentinel-2 full-band fusion, f Sentinel-1/Sentinel-2 VH/MNDWI fusion

Fig. 10 Example area showing the northeastern surroundings of the city of Munich with the river Isar (flowing in south–north direction), the reservoir Ismaninger Speichersee, and the artificial channel Isarkanal exiting the Speichersee to the east. a Optical reference image, b GSWL reference layer, c Sentinel-1 VH polarisation result, d Sentinel-2 10-m bands result, e Sentinel-1/Sentinel-2 full-band fusion, f Sentinel-1/Sentinel-2 VH/MNDWI fusion

Fig. 11 Example area showing the “Fünfseenland” (with Starnberger See, Ammersee, Pilsensee, Wörthsee, and Weßlinger See). a Optical reference image, b GSWL reference layer, c Sentinel-1 VH polarisation result, d Sentinel-2 10-m bands result, e Sentinel-1/Sentinel-2 full-band fusion, f Sentinel-1/Sentinel-2 VH/MNDWI fusion

Fig. 12 Example area showing the reservoirs Altmühlsee (including the artificial channel Altmühlzuleiter entering on the west) and Brombachsee. a Optical reference image, b GSWL reference layer, c Sentinel-1 VH polarisation result, d Sentinel-2 10-m bands result, e Sentinel-1/Sentinel-2 full-band fusion, f Sentinel-1/Sentinel-2 VH/MNDWI fusion

In these visual results, the following phenomena can be observed:

  • In the results that utilise Sentinel-1 SAR backscatter (subfigures c, e, f), false-positive detections occur very often. In Fig. 6, this happens on the eastern edge of the scene, which depicts a part of the bog land “Donaumoos”. In Figs. 7 and 12, the false positives are spread over most parts of the scene for the cases in which only VH polarisation or VH polarisation plus MNDWI are used. In Fig. 10, the SAR input even causes a mis-classification of the Munich airport as water.

  • Interestingly, false-positive detections also occur occasionally in urban areas when only the 10-m bands of Sentinel-2 are used. This can be observed in Figs. 7d and 10d.

  • From a visual point of view, the consistently most robust results are obtained when all surface-related bands from both Sentinel-1 and Sentinel-2 are fused (subfigures e), closely followed by the results using only the 10-m bands of Sentinel-2 (subfigures d).

  • Several smaller rivers are not present in the GSWL layer, but are found by the all-band fusion (e.g. visible in Fig. 6e in the southern part of the scene, in Fig. 7e on the southern edge of the city of Bamberg, or in Fig. 10e with the Isarkanal running parallel to the river Isar in the eastern half of the scene).

  • The Forggensee (cf. Fig. 8) is not detected completely in terms of its cartographic outline. This is due to the fact that Forggensee is a man-made reservoir that is regularly emptied. This illustrates the fact that the proposed method only maps the observable water surface extent. It is interesting to note, though, that the models giving large weight to the VH-polarised SAR information (setups 2 and 11, Fig. 8c, f, respectively) detect more water than the optical-only models.

5 Discussion

5.1 Quality of the Water Detection

Looking at both the quantitative and the qualitative results summarised in Sect. 4, one can see that fully automatic water surface detection on a state-wide level is generally possible from Sentinel-1/2 data using a simple, SVM-based machine learning approach and OpenStreetMap data for training the classifier. While the accuracy metrics calculated from randomly distributed point samples indicate that the best configurations allow a water detection with up to \(98\%\) producer's and overall accuracy, the visual examples show that the numeric results have to be taken with a grain of salt. Due to the limited resolution of the Sentinel data (from 10-m GSD to about 20-m GSD), smaller water bodies naturally cannot be detected. Since they are part of neither the OSM-derived reference data nor the GSWL water layer used for comparison, this is not reflected in the accuracy metrics. Especially setup 11 (fusion of Sentinel-1 VH and Sentinel-2 MNDWI) shows a larger number of missed smaller water bodies in comparison to the other results. This is caused by the fact that this configuration provides the overall lowest resolution (about 20-m GSD for both channels). However, given the powerful detections provided by the optical 10-m bands, a significant improvement could be expected if higher-resolution SAR or optical data were available. The works of Du et al. (2016) and Yang et al. (2017), which aimed at the generation of a super-resolved SWIR band for the calculation of a high-resolution MNDWI image, confirm the promising potential in this regard.

Apart from the sensor-inherent limitations with respect to GSD, the rate of missed water bodies is generally very low, i.e. a good level of completeness is achieved.

5.2 False-Positive Detections

In particular, the visual results show that some of the configurations suffer from a certain amount of false positives, e.g. the confusion of an airport with water in the test cases relying on Sentinel-1 data in Fig. 10. This is also reflected by the low producer's accuracy for the non-water class when using setup 2 (Sentinel-1 VH channel only) or setup 11 (fusion of Sentinel-1 VH and Sentinel-2-derived MNDWI), indicating that while the Sentinel-1 information hardly misses any water body of sufficient size, it also produces a considerable amount of false-positive detections.

This is mainly caused by the fact that, for C-band radar signals, smooth impervious surfaces (e.g. pavement) also lead to a mirror-like reflection and, thus, very low backscatter amplitudes. Interestingly, an occasional mis-classification of urban areas as water is also observed when the 10-m Sentinel-2 bands are used as inputs. This indicates that the visible plus broadband near-infrared bands (B2, B3, B4, B8) are not enough for a robust discriminative model trained with automatically generated training data as proposed in this contribution. Instead, the short-wave infrared information also provided by Sentinel-2 and the Sentinel-1 SAR backscatter turn out to be highly useful for robustifying the water detection—in spite of extreme cases such as the Munich airport (cf. Fig. 10).

This is confirmed by the finding that the overall best result is achieved by setup 9, i.e. the straightforward pixel-level fusion of all surface-related Sentinel-1 and Sentinel-2 bands, especially with respect to the robustness against false water detections. This is not only reflected by the accuracy measures provided in Table 3, but also by the visual results displayed in Figs. 6, 7, 8, 9, 10, 11, 12. Nevertheless, using only the 10-m bands of Sentinel-2 (setup 4) comes in a close second, with quite similar accuracy values and visual results, although the number of false positives is somewhat higher. This indicates that fusion helps to enhance robustness to some extent, while the detailed information provided by the 10-m bands is preserved.

5.3 Influence of Data Pre-Processing

Besides the findings discussed so far, one further important point has to be mentioned. Both the Sentinel-1 and the Sentinel-2 data utilised in this study are products of multi-temporal data fusion to mitigate speckle and cloud coverage effects (see Sect. 3.2.1). Therefore, the results must be seen as a kind of snapshot for a three-month observation period in the summer of 2018. If a denser monitoring of surface water extents is desired, other data processing strategies have to be used, which employ, for example, speckle filtering for the SAR imagery or sophisticated, cloud-robust techniques for time series analysis in the optical case. Alternatively, instead of a binary water/non-water map, one could seek to provide a map containing frequencies of water coverage, as done by Pekel et al. (2016), whose goal was not to map surface water in a static manner, but rather to monitor its dynamics as observed by optical Landsat image time series. As mentioned in Sect. 4, the static GSWL maps displayed in Figs. 6, 7, 8, 9, 10, 11, 12 for reference are, however, the result of an aggregation of binary monthly water maps over a three-month summer period for the sake of comparability. As can be seen from the results presented in this contribution, similarly robust maps with even slightly more detail can be produced using Sentinel data and an automatically trained supervised learning approach. This is particularly interesting, as Pekel et al. (2016) relied on a more complicated expert system classifier, which additionally used different sources of auxiliary data (e.g. digital elevation models, glacier data, urban area data and lava masks) to robustify the water detection.

Finally, it has to be noted that both approaches rely on multi-temporal data fusion for surface water detection, so it has to be accepted that the resulting water map is neither a representation of a single point in time nor a map following a strict technical definition of the term water body. That is, an area which mostly experiences low-water conditions during the time frame used in the analysis (such as the southern end of the Forggensee reservoir in Fig. 8) will not necessarily be considered part of the water body—although it might be in a hydrological sense.

5.4 Label Noise

The last point to discuss is the effect of using the OSM water layer polygons as labels, especially if pre-processed as described in Sect. 3.2.2. Most importantly, it has to be mentioned that this layer cannot, of course, be considered as ground truth. There are various reasons for that: on the one hand, water bodies often change their area or shape over time, which is not always reflected in OSM. On the other hand, the OSM quality is not homogeneously high in all regions of the world, which can lead to a significant amount of label noise in the form of missed or wrongly mapped water bodies. Nevertheless, the advantage of using VGI such as OSM data for the automatic training data generation has to be highlighted. The main reason to use VGI instead of official geodata as provided by local governments is this work's goal of providing a simple water mapping tool that is potentially applicable to extended regions (e.g. whole continents). While for many regions across the world at least some amount of OSM information exists, governmental land cover information is often not available. As an alternative to OSM data, one could also consider using the GSWL to generate the training labels.

6 Summary and Conclusion

In this work, a simple procedure for the large-scale detection of inland water bodies was proposed, based on supervised learning via support vector machines and automatic training data annotation with information derived from OpenStreetMap. The performance of the approach was illustrated for the State of Bavaria, and example mapping results for the major rivers, lakes, and reservoirs in that state were provided. Both the qualitative and the quantitative results confirm that fully automatic large-scale water surface detection is generally possible. For this purpose, the best configuration of input data seems to be a pixel-based fusion of all surface-related Sentinel-1 and Sentinel-2 bands, providing an acceptable trade-off between detection accuracy and a certain robustness against false positives. Nevertheless, for operational mapping, post-processing against remaining false positives, e.g. by including elevation data, is advisable. Besides, the proposed framework should be combined with dedicated signal processing algorithms for pre-processing, e.g. filters for SAR image despeckling or single-image cloud-removal techniques. When this is done, an approximately weekly monitoring of inland water becomes imaginable, as the need to fuse data from a three-month period for speckle reduction and cloud removal would no longer exist.