Introduction

Human pressures on the marine biome have reached unprecedented extents. Today, up to 41% of marine habitats worldwide are directly impacted by multiple anthropogenic stressors (Halpern et al. 2008). Changes in seafloor substrate composition and spatial configuration may occur as a result of such anthropogenic pressure, but also of natural variability driven by varying hydro-meteorological conditions (van Denderen et al. 2015). Our ability to monitor the spatio-temporal dynamics of the seafloor and, ultimately, to relate the observed patterns to driving processes is central to our understanding of marine ecosystems and to the protection of the ecosystem services we depend on. This is also recognized in the European Marine Strategy Framework Directive (MSFD; EC 2008/56/EC), in which the seafloor is the backbone of several indicators of ‘Good Environmental Status’. For this purpose, seabed mapping, and particularly multibeam echosounding, is increasingly used.

High-frequency multibeam echosounders (MBES) are considered state-of-the-art sonar instruments and are employed by commercial, governmental (i.e. hydrographic services), industrial (e.g. oil and gas exploration and exploitation) and research institutions. This is due to the ability of MBES to co-register high-density echo time, geometrical features and intensity over large seabed swaths, hence providing depth and intensity data (Kenny et al. 2003). While bathymetry has so far been the main focus of hydrographic surveys and mapping programmes (i.e. following International Hydrographic Organisation standards of acquisition and accuracy of depth measurements; Wells and Monahan 2002), seafloor reflectivity (backscattered intensity from the seafloor) has only recently attracted scientific interest due to its ability to map the composition of the water-sediment interface (Lurton and Lamarche 2015). Mapping this interface over vast areas allows information from isolated point locations (in-situ measurements such as grab samples and video observations) to be extended to the spatial extent of a digital surface. Moreover, if time series of acoustic data are acquired, change detection methods developed in the terrestrial sciences with satellite data can be applied (e.g. Foody 2002; Pontius et al. 2004; Hussain et al. 2013). This makes it possible to measure how much the attributes of a particular area have changed between two or more periods.

Despite the increasing interest in using MBES backscatter, standards of seabed backscatter acquisition and processing are still under development. A set of guidelines and recommendations was developed by the Backscatter Working Group (or BSWG; see http://geohab.org/bswg) mandated by the Geological and Biological Marine Habitat Mapping scientific committee (GEOHAB). Reaching standardisation of MBES data acquisition and processing procedures is challenging due to the number of manufacturers, multibeam models and dedicated processing platforms, each implementing their own processing algorithms and proprietary software features.

This paper addresses the application of change detection methods to capture seafloor substrate changes over a period of 10 years based on a time series of seven datasets of MBES depth and backscatter data (2004–2015). It relates to assessing good environmental status of gravel beds in the Belgian part of the North Sea (BPNS) for which the Belgian State specified two indicators on seafloor integrity (MSFD descriptor 6) and for which multibeam technology was put forward as monitoring tool (Belgian State 2012):

  1. The areal extent and distribution of the European Nature Information System (EUNIS) level 3 Habitats (sandy mud to mud; muddy sand to sand and coarse sediments), as well as of the gravel beds, remain within the margin of uncertainty of the sediment distribution with reference to the Initial Assessment.

  2. Specifically related to the gravel beds it is furthermore specified that the ratio of the hard (gravel) substrate surface area to the soft (sand) substrate surface area must not show a negative trend.

The case study is located within a sandbank system in a Habitat Directive Area of the BPNS. While of high ecological relevance, this area is intensively fished and marine aggregate extraction started in 2012 near its northern limit. In this paper a methodological framework is presented to assess progress towards good environmental status based on multibeam backscatter data. Whilst developed at a local scale, the change detection methodology shows promise for application at a more regional North Sea level.

Study area

The study site covers approximately 8 km² and is located near the western border of the BPNS, more specifically in the Vlaamse Banken Habitat Directive Area (enacted as of 16th October 2012, EC 92/43/EEC; Fig. 1, grey-shaded polygon). It is located in the southern part of a complex sandbank-dune system named the Hinder Banks. Depths range from −8 to −30 m lowest astronomical tide (LAT). Fine to medium sands dominate the sandbank portion of this environment where large and very large dunes (ranging from 4 to >10 m height) are present (sensu Ashley 1990). The western flank of the main sandbank body forms a transitional area between the bank sandy environment and the adjacent gully. In the latter, medium to coarse sand as well as gravel occur. Gravel provides small-scale structural complexity for ecological successional phases to occur (e.g. deposition of current advected larvae; Houziaux et al. 2007). Seabed maps indicate that the system contains very little silt (0–1% silt–clay content; Verfaillie et al. 2006). A series of steep barchanoid dunes is present in the transitional area, with considerable amounts of gravel in the troughs (Van Lancker 2017). Diverse assemblages of sessile and vagile epifauna and benthic fish were observed here in pioneering and more recent studies (Houziaux et al. 2011, and references therein). Hereafter, these areas are called gravel refugia, since in the majority of the gully epifaunal growth on gravel beds is absent because of severe bottom trawling occurring since the late 1800s. In gravel areas, bottom trawls are known to routinely remobilise the gravel clasts (Jones 1992).

Fig. 1

Left Belgian Part of the North Sea (BPNS). Right backscatter (dB) map of the study area with black outline polygons indicating biodiversity rich areas selected as case studies to monitor seafloor integrity

Since 2012, a new anthropogenic stressor has been present in the area, related to sand extraction occurring 2.5 km NE of the Habitat Directive Area. Depending on timing, frequency and amount of extraction and hydrodynamic settings, resuspension of sediment plumes could represent a source of smothering leading to loss of surficial complexity and burial of epifaunal colonies (Thrush and Dayton 2002; Van Lancker et al. 2010; Spearman 2015). To assess environmental impacts, a monitoring programme was set up combining multibeam recordings with seabed sampling, visual observations and water column measurements as well as hydrodynamic and sediment transport modelling (Van Lancker et al. 2016). Sediment plumes arising from the marine aggregate extraction activities, and their deposition, were depicted in acoustic imagery (Van Lancker and Baeye 2015), and numerical modelling results showed that their deposits reach the gravel beds in the Habitat Directive Area up to the study site (Van Lancker et al. 2016). The cumulative volume of marine aggregates extracted throughout the duration of the data time series is shown in Fig. 2: larger quantities were extracted from 2012 onwards (~800,000 m³) to reach a maximum of ~2.4 × 10⁶ m³ in 2014.

Fig. 2

Extracted marine aggregate volume in Mm³ from Extraction Zone 4 (2.5 km away from the designated area). See Mathys et al. (2011), Van Lancker et al. (2016) for a detailed description of the marine aggregate extraction in this particular area. Effective extraction began in 2012. Data on extraction volumes were provided by the Belgian Federal Public Service Economy, Continental Shelf

Methods

The “Methods” section presents the acoustic and ground-truth data acquisition and processing, comprising a two-dimensional characterization of the spatio-temporal morphological evolution of the seafloor using the bathymetry data and a change detection analysis carried out on the backscatter time series. The steps of the analysis preceding the change detection include the application of supervised and unsupervised classification algorithms and their quantitative comparison. Finally, the change detection using backscatter data is carried out by using both classified (thematic/labelled) and unclassified (relative dB values/unlabelled) backscatter mosaics, as well as by applying ensemble approaches.

Data acquisition and processing

Acquisition

The MBES data were acquired by Ghent University in 2004, and later by the Operational Directorate of Natural Environment of the Royal Belgian Institute of Natural Sciences as part of a sand- and gravel-extraction monitoring programme and MSFD-oriented monitoring campaigns (Van Lancker et al. 2016). Of the eight acoustic surveys undertaken between 2004 and 2015, seven were kept for this investigation. Surveys exhibiting a significant amount of navigation artefacts (mostly due to failure in vessel-motion compensation during rough-sea conditions) were considered unsuitable for the analysis and were discarded. The first survey used a 100-kHz Kongsberg EM1002S, and the remaining six surveys operated a 300-kHz Kongsberg EM3002D (dual-head system). Both systems were installed on the Belgian oceanographic vessel R/V Belgica.

The hydrographic quality of the EM3002D dataset is consistent with the IHO S-44 Special Order, whereas only Order 1A (Wells and Monahan 2002) was attained with the earlier EM1002S. Under these standards, the total vertical uncertainty of the depth measurements at the 95% confidence level is ±0.63 m for the EM1002S and ±0.33 m for the EM3002D at a depth of 30 m (Tables 1, 2). These intervals encompass all sources of error originating from the suite of instrumentation used during acquisition.
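The depth-dependent uncertainty limits quoted above can be reproduced, approximately, from the standard IHO S-44 expression TVU = sqrt(a² + (b·d)²). The following is a minimal sketch, assuming the published S-44 coefficients for the Special Order (a = 0.25 m, b = 0.0075) and Order 1a (a = 0.5 m, b = 0.013); the function and variable names are illustrative and not part of the original processing chain.

```python
import math

# IHO S-44 coefficients: a = depth-independent part (m), b = depth-dependent factor.
IHO_ORDERS = {
    "Special": (0.25, 0.0075),
    "1a": (0.50, 0.013),
}

def total_vertical_uncertainty(depth_m, order):
    """Maximum allowable total vertical uncertainty (m, 95% confidence)."""
    a, b = IHO_ORDERS[order]
    return math.sqrt(a ** 2 + (b * depth_m) ** 2)

# Limits at the maximum depth of the study area (30 m).
tvu_em1002s = total_vertical_uncertainty(30.0, "1a")       # roughly 0.63 m
tvu_em3002d = total_vertical_uncertainty(30.0, "Special")  # roughly 0.34 m
```

At 30 m this gives about 0.63 m for Order 1a and about 0.34 m for the Special Order, close to the values reported in the text.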

Table 1 EM3002D MBES specifications and auxiliary sensors
Table 2 Time-series dataset specifications

Pitch, roll, heave and yaw were automatically compensated for during acquisition and a sound velocimeter constantly monitored the sound velocity at the transducers. Survey lines were spaced to reach a good compromise between survey time/costs and quality of the data resulting in a minimum of 20% across-swath overlap between adjacent lines. Throughout the timespan of acquisition (inter- and intra-survey), the MBES settings controlled by the on-board software (i.e. SIS: Kongsberg native acquisition platform) remained unchanged (i.e. pulse length, beam aperture, beam spacing). The state of the antenna transducers was thoroughly checked and maintained for biofouling and deterioration of its components (either by divers or during regular dry-dock operations). Similarly, across all surveys, track lines were sailed in a SW-NE direction. Maintaining operational parameters stable and checking the physical state of the instrument ensured that instrumental drift was kept to the minimum.

Regarding sound absorption throughout the water column, the α coefficient (see Francois and Garrison 1982) was computed according to the local seawater properties at the surface, which were fed into the acquisition system every half hour. The necessary environmental parameters of the water medium were obtained from the On board Data Acquisition System (ODAS), which logs these data at 1-s intervals. No vertical profiles of the seawater properties were acquired: in this region the water mass is known to be well mixed throughout the year and no stratification is expected to occur (Luyten et al. 2003; van Leeuwen et al. 2015), so the surface values are considered to be sufficiently representative.

To verify instrumental drift on the medium to long term and allow comparison of backscatter levels in time, data were verified against an area with stable depth and backscatter levels (‘KWGS’ reference area, blue rectangle in Fig. 1). This calibration area (1.8 km²) is located in a gully in-between two sandbanks and is dominated by sand to sandy gravel. These verifications showed that the mean backscatter values, both for the oblique-incidence sector [beam angles ±(35°–45°)] and for the full angular range [±(0°–70°)], remained, per survey, within 1 dB of the overall mean BS level, with no significant trend that would suggest instrumental drift.
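The stability check on the reference area can be illustrated with a short sketch: each survey's mean backscatter is compared with the overall mean and flagged if it deviates by more than 1 dB. The per-survey values below are hypothetical placeholders, not the measured levels, and the function name is illustrative.

```python
from statistics import mean

def drift_check(survey_means_db, tolerance_db=1.0):
    """Compare per-survey mean backscatter (dB) with the overall mean level."""
    overall = mean(survey_means_db.values())
    deviations = {name: value - overall for name, value in survey_means_db.items()}
    stable = all(abs(d) <= tolerance_db for d in deviations.values())
    return overall, deviations, stable

# Hypothetical mean backscatter levels (dB) over the reference area.
reference_levels = {
    "T2": -24.6, "T3": -25.1, "T4": -24.9,
    "T5": -25.3, "T6": -24.8, "T7": -25.0,
}
overall_db, deviations_db, is_stable = drift_check(reference_levels)
```

This only tests the ±1 dB envelope; checking for a monotonic trend would require an additional regression or rank test.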

MBES data processing

The backscatter strength (BS) quantifies the amount of acoustic intensity scattered back to the sonar receiver following a complex interaction of the transmitted signal with the seafloor. It is the result of an intricate combination of several physical factors: the seawater-seafloor impedance contrast, the interface roughness and the sediment volume heterogeneity, the signal incidence angle on the seafloor and the acoustical signal frequency (Lurton 2010). Due to the various scattering properties of different seafloor substrates, backscatter can help determine bottom type (e.g. de Moustier and Alexandrou 1991; Hughes-Clarke et al. 1996; Ferrini and Flood 2006) and possibly to infer some of its physical characteristics.

However, backscatter data are inherently noisy, showing strong amplitude fluctuations due to the very nature of the scattering process (Lurton 2010) and the possible presence of additive external noise. A first processing stage therefore reduces this random fluctuating character through appropriate filtering techniques. A second category of processing aims to correct geometrical artefacts resulting from the characteristics of the instrumentation used in the acquisition (i.e. motion and positioning sensors), the seabed geometrical configuration (dictated by the local topography), the velocity and absorption properties of the water medium within which sound is travelling, and the angle of incidence (Lurton and Lamarche 2015). The observed angular response of seafloor backscatter (describing how the echo intensity varies with the incidence angle) can be categorised into three distinct angle sectors. Each is characterized by a different scattering regime (i.e. the specular or near-nadir, the oblique and the low-grazing angle regime), hence they can be treated as separate entities (i.e. statistical populations) (Lurton 2010). In order to produce a sedimentologically meaningful image and avoid the along-track banding effect of the three domains, the resulting angular dependence must be compensated. Consequently, the backscatter strength has to be normalised to a conventional reference angle (ideally in the 30°–60° range, but typically 45° is used).
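The principle of compensating the angular dependence can be sketched as follows: the mean angular response is estimated per incidence-angle bin, removed from each sample, and replaced by the mean level at the reference angle. This is a simplified illustration only. It averages dB values directly, assumes port and starboard responses are symmetric, and is not the algorithm implemented in the processing software; all names are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def normalise_to_reference(samples, reference_deg=45, bin_deg=1):
    """Flatten the angular response of (incidence_angle_deg, backscatter_db) samples."""
    # Estimate the mean angular response in bins of absolute incidence angle.
    bins = defaultdict(list)
    for angle, bs in samples:
        bins[int(abs(angle) // bin_deg)].append(bs)
    response = {k: mean(v) for k, v in bins.items()}

    ref_bin = int(reference_deg // bin_deg)
    if ref_bin not in response:
        raise ValueError("no samples available at the reference angle")
    ref_level = response[ref_bin]

    # Remove the angular trend and shift every sample to the reference level.
    return [bs - response[int(abs(angle) // bin_deg)] + ref_level
            for angle, bs in samples]

# Illustrative samples: strong near-nadir, weaker at low grazing angles.
samples = [(0.5, -15.0), (0.2, -17.0), (45.3, -25.0),
           (45.8, -27.0), (-65.1, -33.0), (65.4, -35.0)]
normalised = normalise_to_reference(samples)
```

After normalisation, all samples fluctuate around the mean level observed at 45°, so that residual variation reflects the seafloor rather than the acquisition geometry.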

Furthermore, several corrections must be applied to the data in order to account for the sonar sensor’s responses: source levels and pulse length; acoustic transmission losses due to spreading and absorption; 3-D beam directivity patterns; sensitivity of the receiving arrays and electronics; and real-time time-varying gain (TVG) corrections applied by the sounder. These various points were addressed in the real-time data reduction scheme applied in Kongsberg Maritime echosounders during acquisition (Hammerstad 2000).

To allow consistency in the last phases of the data processing (i.e. mosaic production) and hence enable their subsequent inter-comparability (in terms of relative dB values expressing a reflectivity scale according to a common reference), the EM3002D data were subject to a standardised processing procedure following the BSWG recommendations (see Lurton and Lamarche 2015). Fledermaus Geocoder (FMGT, v7.4.5.b) and QPS QIMERA (v1.2.4.429a) software suites were used to process the MBES raw data. Initially, tide-corrected bathymetry was produced and exported as 1-m horizontal resolution raster (32-bit float files) and as sound density files for integration in FMGT. The bathymetric surfaces are used to correctly allocate the backscatter snippet traces from single pings to their true seabed position. Each survey was normalised by applying a flat angle varied gain (AVG) filter with a window size of 300. In order to weight nadir pixels and reduce their banding effect, the “No Nadir if Possible 2” algorithm and “50% line blending” FMGT options were applied.

As such, the final dataset consisted of (1) relative (standardised to a common reference surface area) backscatter reflectivity (in dB), and (2) bathymetric surfaces (m) at 1 m horizontal resolution. The EM1002S data did not prove to be comparable in terms of backscatter levels with those from the EM3002D system, due to the differing intrinsic properties of the sensors (i.e. electronics and hardware) and to the absence of a cross-calibration of both sensors. Consequently, the first campaign was not included in the pre- and ensemble classification analyses (in “Pre-classification” and “Ensemble approach classification”).

Ground-truth data

The ground-truth data used in this study were acquired to complement the T7 survey. Collection of ground truth is necessary to validate the assumptions developed during the observation of acoustic data and ultimately to derive confidence metrics expressing the validity of the map produced. Ten samples were collected using a Van Veen grab, each with three replicates to ensure the spatial consistency of the acoustic theme being sampled.

Video samples were acquired by means of a drop-frame, equipped with underwater lights and a camera with a 1 × 1 m field of view. Video-frame data with poor visibility (i.e. due to turbidity or too strong current) were discarded. Visual sampling was very useful to acquire data in the gravel areas where conventional gears failed (i.e. box core and Van Veen). All sample coordinates were corrected for the DGPS antenna layback accounting for the main source of positional error and were mapped with a 10 m buffer.

Sample types were described by combining visual and expert observations with grain-size parameters calculated by a MALVERN Mastersizer 3000 instrument. To validate the consistency in terms of sediment classification versus backscatter levels, the classes’ description was compared to previous substrate classification studies within the same area (Roche 2002).

Only features visible at the seafloor were described and classified into three thematic classes summarizing the main substrate composition: (1) homogeneous well-sorted fine to medium sand (fS); (2) moderately sorted medium sand with bioclastic detritus (mS + b); and (3) medium-to-coarse sand with gravel clusters (cS + G; Fig. 3).

Fig. 3

Backscatter mosaic, ground-validation sample picture, textural detail and class description for the identified substrate classes

As will be shown later (see Fig. 5 in “Supervised map of the study area”), the fS and mS + b classes are texturally and sedimentologically similar with an overlap in terms of dB ranges, mS + b being a subset of the fS class. This is likely explained by the presence of bioclastic detritus and a significant roughness in the mS + b class, which lead to interface scattering contributing significantly to the overall acoustic return and causing a relatively high level (≈−27 dB) of mean backscatter. By contrast, the fS class, which is almost entirely distributed on top of the sandbank (in the most dynamic part of the study area, likely with a higher water content than the flank and gully areas), is very well sorted and homogeneous, with little interface roughness and no surface scatterers, resulting in the lowest values (≈−31 dB) of mean backscatter (Fig. 5). The cS + G class, in turn, features the highest content of coarse material with sparse individual strong scatterers and high roughness at the interface; hence it corresponds to the highest values (≈−22 dB) of mean backscatter (Fig. 5). The described samples were separated into training (2/3) and validation (1/3) sets (Table 3). Sample representativeness was assessed visually by plotting the backscatter cumulative frequency distribution of the study area and of the mean backscatter values extracted within a 10 m buffer at the samples’ locations.

Table 3 Summary of sample sets used (fS fine homogenous sand, mS + b fine to medium sand with bioclastic detritus, cS + G medium to coarse sand with gravel clusters, VV Van Veen grab sampler)

Morphological evolution

First, an assessment of the spatio-temporal morphological evolution is carried out to determine whether changes in substrate are due to morphological evolution (i.e. migrating dunes), to an actual reconfiguration of the substrate delineations or to a combination of both. Regions of interest (ROI) encompassing the main morphological and substrate features of the study area were selected to extract 2D profiles from the time series (see Fig. 4 for profile locations). Simple yes/no and quantitative metrics of change, with information about the directionality of the migration (i.e. ebb- or flood-dominated bedforms), can be derived from these profiles. For ease of interpretation, only data from 2004 and 2015 were used (T1 and T7, Table 2).

Fig. 4

Location of the 2D profiles selected for the analysis of morphological evolution

Supervised classification

The second phase of the analysis makes use of the most recent (T7) acoustic survey for which complementary ground-truth data are available. In order to efficiently combine the two datasets, a supervised classification algorithm is used. Unlike an unsupervised method, where no a priori information about the class labels is provided to the algorithm (i.e. clustering procedures), supervised classification uses ground-truth information to train and test the classification results.

The Random Forest (RF; Breiman 2001) algorithm was used for classification. RF has shown high predictive accuracy in studies comparing supervised classifications of MBES data (Diesing et al. 2014; Diesing and Stephens 2015) and has proven highly successful in data mining research (Li et al. 2016). As explained in Diesing et al. (2014) and Li et al. (2016), the main underlying assumption of this method is that the predictive power of multiple classification trees (the elemental unit of machine learning methods) is higher than that of a single tree. Bootstrapped samples from the training data are used to construct the individual trees in the forest, introducing the first element of randomness. In turn, a random subset of the predictor features is used at the node splits throughout the construction of the model. The result is the construction of unique trees. Decisions about the class allocation (labelling) are made on the basis of majority votes of individual trees. After a feature selection procedure, the RF was run growing 501 trees and leaving the other parameters at their defaults. The routine was implemented in R (R Development Core Team 2008) using the randomForest package (Liaw and Wiener 2002).

Feature selection

A set of textural and morphometric predictor layers was computed from multibeam depth and backscatter grids (Table 4). Predictor layers are a set of variables (in this analysis terrain and texture attributes) derived from the MBES backscatter and bathymetry which are combined with the observed substrate type points (response variable) to predict the full-coverage seafloor map (Lecours et al. 2016). The relevance of the predictors was investigated by following the feature selection procedure provided by Kursa and Rudnicki (2010) using the Boruta RF wrapper function. Boruta identifies important variables by performing multiple runs of the RF classification (a total of 1000 runs were performed here) and by comparing the RF Z-scores of the original variables with the scores of their permuted copies (shadow variables). The Z-score is a measure expressing how many standard deviations a score stands from the mean. Higher importance is attributed when the mean Z-score of a variable after n runs is significantly higher than the Z-scores produced by the shadow variables.

Table 4 Predictor variables dataset with their description

Model evaluation

Overall accuracy (A) and Kappa (K) accuracy metrics were derived using the contingency table which cross-tabulates test and predicted instances (Foody 2004). Overall accuracy expresses the proportion of pixels correctly labelled by the classifier, whereas Cohen’s Kappa reflects the difference between the overall agreement and the agreement expected by chance.
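Both metrics follow directly from the contingency table, as the sketch below illustrates. The table used here is a hypothetical 3 × 3 example, not the validation result of this study, and the function name is illustrative.

```python
def accuracy_and_kappa(table):
    """table[i][j]: number of test samples of observed class i predicted as class j."""
    n = sum(sum(row) for row in table)
    k = len(table)
    # Observed agreement: proportion of samples on the diagonal.
    observed = sum(table[i][i] for i in range(k)) / n
    # Agreement expected by chance, from the row and column totals.
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa

# Hypothetical contingency table for three substrate classes.
contingency = [
    [5, 1, 0],
    [1, 4, 1],
    [0, 0, 6],
]
overall_accuracy, cohen_kappa = accuracy_and_kappa(contingency)
```

For this illustrative table, 15 of the 18 samples lie on the diagonal and the chance agreement is one third, so Kappa is lower than the overall accuracy.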

Comparison of thematic maps

Since the supervised information is to be extended to the broader time series of acoustic data for which there is no ground-validation data, an analysis similar to that of Ierodiaconou et al. (2005), in which supervised and unsupervised classifications are compared and evaluated for similarity, was applied. In this paper, K-means was chosen as the unsupervised classification method because of its success in finding optimal clustering solutions and after comparing the RF classification to an array of unsupervised classifiers. The Hartigan and Wong (1979) algorithm was implemented using the R base functions (R Development Core Team 2008). Given a certain number of classes, the method seeks to minimise the within-class variance and maximise the between-class variance by iteratively grouping similar points in their feature space. To validate the application of an unsupervised classifier, paired-pixel metrics of map agreement were computed after Foody (2004), Pontius and Millones (2011), and Pontius and Santacruz (2014). The R package diffeR was used (Pontius and Santacruz 2015). Components of difference (quantity and allocation) are used to derive the agreement between maps at the level of the entire landscape and per category. Quantity describes the difference that is due to differing proportions of the categories between reference and test instances, whereas Allocation describes the difference that is due to spatial mismatch between categories.
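The two components of difference can be computed from the cross-tabulation of two maps following the definitions of Pontius and Millones (2011). The sketch below is an independent Python illustration of those definitions, not the diffeR implementation, and the cross-tabulation values are hypothetical.

```python
def quantity_allocation(table):
    """table[i][j]: pixels in class i of the comparison map and class j of the reference map."""
    total = sum(sum(row) for row in table)
    k = len(table)
    p = [[table[i][j] / total for j in range(k)] for i in range(k)]
    per_class = []
    for g in range(k):
        comparison = sum(p[g])                      # share of class g in the comparison map
        reference = sum(p[i][g] for i in range(k))  # share of class g in the reference map
        quantity = abs(comparison - reference)
        allocation = 2 * min(comparison - p[g][g], reference - p[g][g])
        per_class.append((quantity, allocation))
    # Each disagreement is counted in two classes, hence the division by two.
    overall_quantity = sum(q for q, _ in per_class) / 2
    overall_allocation = sum(a for _, a in per_class) / 2
    return per_class, overall_quantity, overall_allocation

# Hypothetical cross-tabulation (pixel counts) of two three-class maps.
crosstab = [
    [40, 5, 0],
    [3, 30, 4],
    [0, 6, 12],
]
per_class_diff, quantity_diff, allocation_diff = quantity_allocation(crosstab)
```

The two overall components add up to the total disagreement between the maps (one minus the proportion of pixels on the diagonal), which provides a convenient consistency check.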

Change detection

Three types of analysis were performed on the backscatter time series in order to extract trends and patterns of change in substrate classes: pre-, post- and ensemble-methods classification. Ensemble approaches combine supervised and unsupervised classifiers, whereas a pre-classification method focuses on the unclassified data values (similarly to directly relying on spectral bands in satellite imagery). The aim of a post-classification approach is to allocate class-labels to the data values to produce thematic maps.

Pre-classification

The pre-classification approach uses backscatter values taken from rectangular bins at sampling locations representative of the different geomorphological and substrate features of the ROIs. Subsequently, basic statistics and temporal trends were studied (for example, fluctuations around the ±1 dB accuracy threshold; Hammerstad 2000). In order to detect outliers in the time series, sigma (standard deviation) thresholds were chosen as the favoured statistical measure to quantify the dispersion of a set of data values.
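A sigma-based outlier screening of this kind can be sketched as follows: surveys whose mean backscatter deviates from the time-series mean by more than a chosen number of standard deviations are flagged. The series below is hypothetical and the threshold of 1 σ is an assumption for illustration.

```python
from statistics import mean, stdev

def sigma_outliers(values_db, n_sigma=1.0):
    """Return the indices of values deviating more than n_sigma standard deviations."""
    centre = mean(values_db)
    spread = stdev(values_db)  # sample standard deviation
    return [i for i, v in enumerate(values_db)
            if abs(v - centre) > n_sigma * spread]

# Hypothetical mean backscatter (dB) of one region of interest across six surveys.
roi_series_db = [-25.0, -25.2, -24.9, -25.1, -27.5, -25.0]
flagged_surveys = sigma_outliers(roi_series_db)
```

Here only the fifth survey (index 4) exceeds the 1 σ envelope. Note that a single extreme value also inflates the standard deviation, so robust alternatives may be preferable for short series.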

Ensemble approach classification

An ensemble method, combining supervised and unsupervised classifications, was also applied. K-means classes (dB ranges) identified in T7 were used to reclassify the complete dataset for which ground-truth data were not available. From this classified dataset, proportion counts were extracted to observe temporal trends. Prior to transforming the successive backscatter mosaics into classified data, the Within Group Sum of Squared Distances plot was computed independently for each dataset. This ensured that the number of classes was maintained throughout the time series, and also allowed testing the class discrimination potential of data gathered at 100 and 300 kHz from the EM1002S and EM3002D, respectively. This technique is similar to computing a silhouette plot, in which the optimal number and size of classes in a dataset become visible (Eleftherakis et al. 2012).
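The reclassification step can be illustrated with a minimal sketch in which fixed dB boundaries, standing in for the K-means class ranges derived from T7, are applied to the backscatter values of another survey and the class proportions are counted. The boundary values and sample data are hypothetical placeholders; the actual class ranges are not given in the text.

```python
from bisect import bisect_right
from collections import Counter

LABELS = ["fS", "mS+b", "cS+G"]
# Hypothetical class boundaries (dB), ordered from low to high backscatter.
BOUNDARIES_DB = [-29.0, -24.5]

def classify(values_db, boundaries_db=BOUNDARIES_DB, labels=LABELS):
    """Assign each backscatter value to the class whose dB range contains it."""
    return [labels[bisect_right(boundaries_db, v)] for v in values_db]

def class_proportions(classes, labels=LABELS):
    """Proportion of pixels per class, including classes that are absent."""
    counts = Counter(classes)
    return {label: counts.get(label, 0) / len(classes) for label in labels}

# Hypothetical backscatter pixels (dB) from an earlier survey.
pixels_db = [-31.2, -30.1, -27.4, -26.0, -22.3, -28.9]
pixel_classes = classify(pixels_db)
proportions = class_proportions(pixel_classes)
```

Repeating this for every survey yields one set of class proportions per time step, from which temporal trends in areal extent can be read.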

Post-classification

The post-classification approach made use of the most commonly employed change detection tool in remote sensing studies: the transition matrix (Pontius et al. 2004; Braimoh 2006; Rattray et al. 2013). In this analysis, two unsupervised seafloor maps (e.g. before and after a natural or anthropogenic event) are cross-tabulated to derive detailed statistics describing the temporal changes. Persistence and class swap dynamics, gross gains and losses, transitions between times and between classes, as well as persistence ratios expressing the tendency of a category to undergo a certain change process, were derived after Braimoh (2006). Swap is defined as the change in spatial location of a substrate type between times. The net change describes the difference in quantity of a substrate class between time 1 and time 2. Thus, swap describes changes in location, whereas net change reports changes in quantity. Gain and Loss describe an increase and a decrease of the areal extent of a substrate class, respectively. Gain (Gp), Loss (Lp) and Net change (Np) to persistence ratios are derived as a measure of class tendency to the different types of transition. Values above 1 indicate that a class is more likely to gain from or lose to other classes than to persist across the time scale analysed. Values close to 0 indicate little or no change. The net change to persistence ratio, Np, indicates the overall trend of a category, with negative and positive values indicating the directionality of the temporal trends.
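The quantities derived from the transition matrix can be sketched as below, assuming the common definitions swap = 2 × min(gain, loss) and net change = gain − loss, with each ratio obtained by dividing by the persistence. The matrix is a hypothetical example, not a result of this study.

```python
def transition_stats(matrix, labels):
    """matrix[i][j]: area (or pixel count) in class i at time 1 and class j at time 2."""
    k = len(labels)
    stats = {}
    for c in range(k):
        persistence = matrix[c][c]
        loss = sum(matrix[c]) - persistence                       # area leaving class c
        gain = sum(matrix[i][c] for i in range(k)) - persistence  # area entering class c
        net = gain - loss

        def ratio(x):
            # Ratios are undefined when nothing persists.
            return x / persistence if persistence else float("nan")

        stats[labels[c]] = {
            "persistence": persistence, "gain": gain, "loss": loss,
            "swap": 2 * min(gain, loss), "net": net,
            "Gp": ratio(gain), "Lp": ratio(loss), "Np": ratio(net),
        }
    return stats

# Hypothetical transition matrix between two surveys (pixel counts).
transitions = [
    [50, 5, 0],
    [8, 20, 2],
    [0, 3, 12],
]
change_stats = transition_stats(transitions, ["fS", "mS+b", "cS+G"])
```

In this example the mS+b class loses more area than it gains, so its Np is negative, whereas fS shows a small positive Np.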

Results

Firstly, results are presented on the supervised classification achieved by implementing the Random Forest algorithm. Secondly, the supervised model is compared to the map of the study area produced by the unsupervised clustering method. Next, the results from the two-dimensional morphological analysis are provided, followed by the change detection approaches tested on the time-series backscatter dataset.

Supervised map of the study area

Figure 5 shows the visual validation of the sample datasets. The samples give an overall good representation of the BS variation in the study area (Fig. 5a). Mean backscatter values, extracted within 10 m circular buffers at the samples’ positions, indicate good separation of the classes (Fig. 5b), where coarse-hard and fine-soft classes exhibit the highest and lowest backscatter values respectively. Similarly, the separation using the bathymetry shows the distribution of substrate types within different depth zones (i.e. fS on the top of the sandbank, mS + b transiting to the deepest area, and cS + G in the gully; Fig. 5c, d). The distribution of substrate classes predicted by the Random Forest supervised classification is shown in Fig. 6b.

Fig. 5

a Backscatter distribution in the study area, and per sample dataset (ST1533-T7 dataset), b boxplot of mean backscatter extracted from a 10 m buffer at the ground-truth locations, c same as (a) using depth, d same as (b) using depth. Training and test refer to the distributions of the training and validation sample datasets used in the RF classification

Fig. 6

a K means unsupervised classification, b Random Forest supervised classification and c map of overall agreement between classifications

The most important variables selected by the feature selection tool were BS, BS 3 × 3 mean filter, BS Local Moran and bathymetry. With these selected features, the map produced has an overall accuracy (A) of 81% and a Kappa coefficient (K) of 0.73, indicating agreement well beyond that expected by chance.

Comparison between supervised and unsupervised models

Figure 6 shows the visual agreement between supervised and unsupervised classifications, while agreement metrics between these models are summarised in Table 5. Overall, agreement is high, with overall quantity and allocation differences <10%. In terms of quantity, classes differ overall by 0.42%. The larger differences are allocation disagreements of 9.47% and 8.16% for the mS + b and cS + G classes respectively. The fS class shows by far the highest between-map agreement (Table 5), with differences of 1.1% and 0.42% in allocation and quantity respectively.

Table 5 Components of difference, allocation and quantity, between models predicted by the Random Forest and K-means (pixels in percentage)

Morphological changes

To characterize the dynamics over the full period, depth profiles were extracted from the ROIs for 2004 (T1) and 2015 (T7; Fig. 7). Within the barchanoid dunes and along the top sand bank areas (Fig. 7a, b respectively), horizontal migration amounts to up to ≈40 m in a SW-NE direction. The intermediate surveys show a progressive migration, with an advance of ca. 20 m from 2004 to 2010, ca. 10 m from 2010 to 2013 and less than 5 m over the remaining surveys up until late 2015. The relatively flat and gravel-populated areas (Fig. 7c, d, f), which are devoid of dunes, show overall stability. In these areas, vertical changes (aggradation) were observed but cannot be confirmed, as they remain within the IHO Order S and 1A confidence envelopes. Nonetheless, a loss of profile complexity is observed between the two campaigns.
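The reasoning behind the confidence envelopes can be sketched with the IHO S-44 total vertical uncertainty, TVU = sqrt(a² + (b × d)²), where d is depth, with a = 0.25 m and b = 0.0075 for Special Order and a = 0.5 m and b = 0.013 for Order 1a. The overlap test below, and the assignment of an order to each survey in the example, are assumptions made for illustration and may differ from the procedure actually applied to the profiles.

```python
import math

# IHO S-44 depth-uncertainty parameters (a in metres, b dimensionless)
IHO_ORDERS = {"special": (0.25, 0.0075), "1a": (0.5, 0.013)}


def tvu(depth_m, order):
    """Maximum allowable total vertical uncertainty (95%) at a given depth."""
    a, b = IHO_ORDERS[order]
    return math.sqrt(a ** 2 + (b * depth_m) ** 2)


def envelopes_overlap(depth_t1, order_t1, depth_t2, order_t2):
    """True if the two ± TVU envelopes overlap, i.e. the change is unresolved."""
    lo1, hi1 = depth_t1 - tvu(depth_t1, order_t1), depth_t1 + tvu(depth_t1, order_t1)
    lo2, hi2 = depth_t2 - tvu(depth_t2, order_t2), depth_t2 + tvu(depth_t2, order_t2)
    return lo1 <= hi2 and lo2 <= hi1


# Hypothetical example: 0.4 m of apparent aggradation at about 30 m depth,
# assuming Order 1a for the earlier survey and Special Order for the later one
unresolved = envelopes_overlap(30.0, "1a", 29.6, "special")
```

Under these assumptions, a vertical difference of a few decimetres at about 30 m depth falls inside the overlapping envelopes and cannot be distinguished from measurement uncertainty.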

Fig. 7

Depth profiles extracted from the digital elevation model time series. a Barchanoid dunes area, b top sand bank, c gully area, d gravel refugium 2, and e gravel refugium 3. Blue and red envelopes in d, e are the ±IHO confidence intervals for the EM1002S and EM3002D surveys respectively

Change detection

Pre-classification

The boxplot analysis of backscatter data extracted from the selected ROIs is shown in Fig. 8. Excluding the EM1002S data (not comparable in terms of insonification values), no significant trends are observable, with the exception of zones A and C (transitional and gully zones of the study area), which exhibit deviations >1 σ and a generally decreasing trend up until late 2014 (T5). Notably, all selected regions follow an overall elliptical trend (visible in Fig. 8h) and return to the initial state of February 2010 (T2) by December 2015. In all cases, the spread is lower than 1 dB, indicating no statistically significant changes. Testing this hypothesis, the reduced χ² test computed within each region shows no significant negative trend in the spatio-temporal behaviour of the backscatter (χ² ≪ 1).
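A reduced χ² test of this kind can be sketched as follows, assuming that the null model is a constant backscatter level equal to the weighted mean of the survey means, with each survey weighted by the inverse of its variance. The dB values are hypothetical and are not taken from Fig. 8.

```python
def weighted_mean(values, sigmas):
    """Inverse-variance weighted mean of the per-survey values."""
    weights = [1.0 / s ** 2 for s in sigmas]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def reduced_chi_squared(values, sigmas):
    """Chi-squared of the deviations from the weighted mean, per degree of freedom."""
    mean = weighted_mean(values, sigmas)
    chi2 = sum(((v - mean) / s) ** 2 for v, s in zip(values, sigmas))
    dof = len(values) - 1  # one parameter (the mean) is fitted
    return chi2 / dof


# Hypothetical mean backscatter (dB) and standard deviation for six surveys
bs_means = [-20.1, -20.4, -20.6, -20.3, -20.2, -20.0]
bs_sigmas = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
chi2_red = reduced_chi_squared(bs_means, bs_sigmas)
```

A value well below 1 means that the scatter of the survey means is small compared with their individual uncertainties, so the constant-level model cannot be rejected.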

Fig. 8

Boxplot analysis for the entire time series (T1–T7). Mean and standard deviation values were calculated from the EM3002D dataset only. a Barchanoid dunes area, b top sand bank, c gully area, d gravel refugium 1, e gravel refugium 3, f gravel refugium 4, g entire study area, and h mean backscatter values for the EM3002 time series (T2–T7), within each ROI. For a–g, red and blue dotted lines represent weighted mean and ±1 σ error respectively. For the location of the ROIs the reader is referred to Fig. 1 (A–F boxes)

Ensemble approach classification

Class proportion counts per survey were extracted from the classified EM3002D dataset (ensemble approach) and are shown in Fig. 9. Temporal trends and relationships between classes are shown for the entire study area, as well as for the three selected gravel refugia. The fS class appears to be relatively stable across all instances and surveys. An inverse relationship is evident between the cS + G and mS + b classes. This is also shown in Fig. 10, where the proportion counts per survey are plotted against each other.
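The extraction of class proportions and the regression between two classes can be sketched as below. The label lists stand in for classified pixels and are hypothetical; the least-squares fit is a generic ordinary linear regression and is not necessarily the exact procedure used to produce Fig. 10.

```python
from collections import Counter


def class_proportions(labels):
    """Proportion of pixels assigned to each class in one classified survey."""
    counts = Counter(labels)
    return {name: counts[name] / len(labels) for name in counts}


def linear_fit(x, y):
    """Ordinary least-squares slope, intercept and Pearson correlation."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x, sxy / (sxx * syy) ** 0.5


# Hypothetical classified pixels for three surveys
surveys = [
    ["cS+G"] * 5 + ["mS+b"] * 3 + ["fS"] * 2,
    ["cS+G"] * 4 + ["mS+b"] * 4 + ["fS"] * 2,
    ["cS+G"] * 3 + ["mS+b"] * 5 + ["fS"] * 2,
]
proportions = [class_proportions(labels) for labels in surveys]
coarse = [p["cS+G"] for p in proportions]
medium = [p["mS+b"] for p in proportions]
slope, intercept, r = linear_fit(coarse, medium)
```

In this constructed example the fS proportion is constant, so every loss of cS + G is matched by a gain of mS + b and the correlation is perfectly negative.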

Fig. 9

Class proportions during each survey extracted from the classified dataset for three gravel refugia stations (a–c) and the entire study area (d). For the location of the refugia the reader is referred to points D, E and F in Fig. 1

Fig. 10

Linear regressions between proportions of cS + G and mS + b classes for the gravel refugia stations (a–c) and the entire study area (d). Dotted lines indicate 95% confidence limits. For the location of the refugia the reader is referred to points D, E and F in Fig. 1

At the level of the entire study area (Fig. 9d), and similarly to the pre-classification analysis, this method indicates that the class proportions return to their original state. In contrast, within the gravel refugia zones (Fig. 9a–c), the cS + G class experiences a net loss in favour of finer substrate types, with no indication of a return to a previous state.

Post-classification

The bi-temporal transition matrix, which cross-tabulates the thematic classes of the classified maps of 2004 (T1) and 2014 (T4), is presented in Table 6. Persistence is denoted along the diagonal, whereas off-diagonal entries are from-to transitions. Over 50% of the substrate remains static between the classifications. This is mainly driven by persistence of the mS + b and cS + G classes (with 27 and 20% persistence respectively). The fS class experienced the lowest persistence (7.6%), mostly reflecting the dynamics of the bedforms (see Fig. 11A, where gains and losses result from the migration of dune crests).
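The construction of such a transition matrix can be sketched as follows, assuming the two classified maps are co-registered and flattened to lists of per-pixel labels. The labels below are hypothetical and do not reproduce Table 6.

```python
def transition_matrix(map_t1, map_t2, classes):
    """Cross-tabulate two co-registered classified maps, in percent of pixels.

    Rows are classes at time 1 and columns are classes at time 2, so the
    diagonal holds persistence and off-diagonal cells hold from-to transitions.
    """
    if len(map_t1) != len(map_t2):
        raise ValueError("maps must contain the same number of pixels")
    index = {name: i for i, name in enumerate(classes)}
    counts = [[0] * len(classes) for _ in classes]
    for before, after in zip(map_t1, map_t2):
        counts[index[before]][index[after]] += 1
    total = len(map_t1)
    return [[100.0 * value / total for value in row] for row in counts]


classes = ["fS", "mS+b", "cS+G"]
# Hypothetical per-pixel labels for the two dates
pixels_t1 = ["fS", "fS", "mS+b", "cS+G"]
pixels_t2 = ["fS", "mS+b", "mS+b", "mS+b"]
matrix = transition_matrix(pixels_t1, pixels_t2, classes)
persistence = sum(matrix[i][i] for i in range(len(classes)))
```

Summing the diagonal gives the share of the area that kept the same class between the two dates, which is the quantity reported above as static substrate.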

Table 6 Raw confusion matrix, rounded to two decimals, cross-tabulating the classified instances in the 2004 and 2014 thematic maps

Fig. 11

Map representation of persistence, gains and losses for each class in the study area overlapping between T1 and T5

Following a more detailed inspection of the matrix, ratios describing class tendencies to persistence, gains and losses, swap and net change dynamics were computed (Table 7). Gains, losses and persistence are illustrated in Fig. 11, where their reciprocal relationships are observable, in particular between the mS + b and cS + G classes in the north-eastern part of the study area (see Fig. 11B, where the mS + b class gains at the expense of the cS + G class, forming ripple marks). All classes experienced a net gain in quantity between the 2 years except for the cS + G class, which experienced a net loss of 7.5% (see Fig. 11C, where the loss within a selected refugium is partly due to bedform migration (SW) and partly due to the appearance of the mS + b class within the flat and gravelly portion of this area (NE)). Swap is derived by subtracting the net change from the total change. Of the total change for all classes, 83% is swap, meaning that losses in one substrate class are replaced by gains in another. The mS + b class experienced the highest gain (21.41%), as well as the greatest loss (21.14%), implying that most of the change attributable to this class is due to swap in location. Proportionally, 99.3, 72.5 and 65.3% of changes are attributable to swap for the mS + b, cS + G and fS classes respectively.

The gain, loss and net changes are compared to the persistence (diagonal elements of Table 6; calculated after Braimoh 2006) in order to derive ratios (Gp, Lp and Np respectively) that provide a measure of class tendency to types of transition. Values above 1 indicate that a class is more likely to gain from or lose to other classes than to persist between classified instances. Values close to 0 indicate little or no change. The fS class has the highest Gp value, indicating a high tendency to gain. The mS + b class has similar Gp and Lp ratios, reflecting the high percentage of swap in this class. Most striking are the negative Np ratio and the high Lp of the cS + G class.
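These quantities follow directly from the row and column totals of the transition matrix, as the sketch below illustrates. The matrix is hypothetical (it only loosely mimics the persistence values quoted above) and the signed net change is an assumption made here so that a net loss yields a negative Np.

```python
def change_summary(matrix, classes):
    """Per-class gain, loss, net change, swap and persistence ratios.

    matrix[i][j] is the percentage of the area in class i at time 1 and
    class j at time 2. Persistence must be non-zero for the ratios.
    """
    n = len(classes)
    summary = {}
    for j, name in enumerate(classes):
        persistence = matrix[j][j]
        loss = sum(matrix[j]) - persistence                       # left the class
        gain = sum(matrix[i][j] for i in range(n)) - persistence  # entered the class
        total = gain + loss
        net = gain - loss                                         # signed net change
        summary[name] = {
            "gain": gain, "loss": loss, "total": total, "net": net,
            "swap": total - abs(net),                             # equals 2 * min(gain, loss)
            "Gp": gain / persistence,
            "Lp": loss / persistence,
            "Np": net / persistence,
        }
    return summary


class_names = ["fS", "mS+b", "cS+G"]
# Hypothetical transition matrix in percent (rows: time 1, columns: time 2)
example_matrix = [[8, 4, 1],
                  [10, 27, 11],
                  [2, 17, 20]]
changes = change_summary(example_matrix, class_names)
```

In this example the mS + b class gains and loses the same amount, so all of its change is swap and its Np is zero, whereas the cS + G class loses more than it gains and therefore has a negative Np.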

Table 7 Summary of the changes between 2004 and 2014 (in percentage and expressed as ratios)

Discussion

Multibeam backscatter in a monitoring context

The basic premise in using MBES backscatter data for seafloor change detection is twofold: changes in substrate cover must result in changes in backscatter values, and these changes must be large relative to those caused by other factors (adapted from Singh 1989), such as sea conditions, the sensor's intrinsic characteristics, changes in on-board acquisition parameters, vessel speed and survey direction (Rattray et al. 2013; Lurton and Lamarche 2015). As such, verification of MBES backscatter stability is critical and should be controlled (Anderson et al. 2008). In this study, these limitations were mostly overcome because the dataset was acquired under rigorous standards of acquisition and processing, including careful attention to environmentally dependent transmission losses (i.e. by regular control of the absorption coefficient). To verify instrumental drift over the medium to long term, the trend in backscatter levels was compared against a time series of backscatter levels at a known reference area (KWGS reference area; blue polygon in Fig. 1; Roche et al. 2016). Average backscatter levels of the RV Belgica EM3002D could thus be compared from one survey to another during a similar period, yielding a dataset with temporally consistent (yet relative) dB ranges.

However, changing environmental factors and seabed conditions may also affect the backscatter values. The effect of biological activity, which is seasonally driven and linked to the spawning and recruitment period of benthic species, is probably the most prominent factor. It is known from the literature that megabenthic zoo- and/or phytobenthic structuring species can be responsible for significant changes in the acoustic signal (e.g. Demosponges, Submerged Aquatic Vegetation and brittle stars; Montereale-Gavazzi et al. 2016; Holler et al. 2016), as can soft-substratum macrobenthic ecosystem engineers such as tubeworms and some bivalves (e.g. Degraer et al. 2008; Van Lancker et al. 2012). To date, the impact on the actual backscatter levels is poorly quantified and more research is needed on this aspect in a monitoring context. Besides changes due to the successional stages of some benthic species, natural variability in sediment deposition and erosion can also affect the backscatter level. This depends on the hydrodynamics of an area, as well as on sediment availability. Closely spaced acoustic surveys would be ideal for better constraining the driving forces and would thereby support the interpretation of trends in backscatter levels. In this study, the time lag between surveys was rather irregular, which complicated distinguishing natural changes from anthropogenically driven ones. The combination of morphological analyses with backscatter change analyses is important in this regard.

Change detection methods

The pre-classification analysis of the backscatter time series indicated that within the selected regions of interest, no significant changes in seabed substrate could be detected across the timespan analysed. Since the first dataset was recorded with a former-generation echosounder, which was not cross-calibrated with the EM3002D and used a different frequency range, the derived values could not be directly compared in terms of the range of insonification values. The only evident behaviour in the data was in the barchanoid and gravel gully regions, where the mean backscatter level locally fluctuated around the 1 σ deviation (Fig. 8a, c).

Because the comparison focuses on the spatial delineation and areal extent of the substrate classes rather than on the intrinsic physical characteristics of a circumscribed area, the post-classification methods, adopted similarly to Rattray et al. (2013), allowed data from different echosounders and acquisition parameters to be compared. This was also possible due to the agreement in the number and size of classes discriminated by the two echosounders. The approach revealed information on the tendencies of certain substrates to undergo a given change, such as the negative trend of the hard substrate class and the gain of the finer substrates.

In this study, there was high agreement between the supervised and unsupervised models, as measured by quantity and allocation agreement/disagreement metrics, which allowed the analysis to be extended to the entire time-series dataset. As such, the initially ground-validated information could be fully exploited and extended to the full backscatter dataset. Substrate class proportions over time could be extracted so that global changes could also be accounted for. This is unlike the pre-classification approach, which is limited to the selected sub-areas from which backscatter levels were extracted. The ensemble approach thus combined supervised and unsupervised classification algorithms (similarly to Ierodiaconou et al. 2005), which allowed one ground-truth dataset to be used to train a classification that was subsequently applied to the whole time series. This is a major advantage, since ground-truth sampling for each survey in a time series is most often not realistic given survey time and cost restrictions. Here, consistent data acquisition and processing made the statistical results obtained at different times comparable.
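One way to picture the propagation of ground-truth information is to label each unsupervised cluster with the supervised class it overlaps most on the ground-truthed survey, and then to reuse that mapping on the clustered maps of the other surveys. This majority-vote mapping is an assumption made here for illustration and is not necessarily the procedure followed in this study; all labels below are hypothetical.

```python
from collections import Counter, defaultdict


def map_clusters_to_classes(cluster_ids, supervised_labels):
    """Assign each cluster the supervised class it most often coincides with."""
    votes = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, supervised_labels):
        votes[cluster][label] += 1
    return {cluster: counter.most_common(1)[0][0]
            for cluster, counter in votes.items()}


def relabel(cluster_ids, mapping):
    """Translate cluster identifiers of any survey into substrate classes."""
    return [mapping[cluster] for cluster in cluster_ids]


# Hypothetical reference survey: cluster id and supervised class per pixel
reference_clusters = [0, 0, 0, 1, 1, 2, 2, 2]
reference_classes = ["fS", "fS", "mS+b", "mS+b", "mS+b", "cS+G", "cS+G", "mS+b"]
mapping = map_clusters_to_classes(reference_clusters, reference_classes)

# Hypothetical clustered pixels from another survey, relabelled with the mapping
other_survey = relabel([2, 0, 1], mapping)
```

Such a mapping is only meaningful when the two classifications agree closely and when backscatter levels are consistent between surveys, which is why the agreement metrics and the acquisition standards discussed above matter.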

In line with recent research (Li et al. 2016), the Random Forest classifier proved to be a highly valuable tool for seafloor applications, producing accurate models and providing information on the most important feature layers used in the classification. Similarly to Diesing et al. (2014), backscatter was by far the most important variable for seafloor substrate discrimination (with the highest Boruta Z-score after 1000 runs). Whatever the method applied, the accuracy of the change detection depends strictly on the accuracy of the classified maps used in the assessment and on the stability of the repeated observations.

Application within a MSFD context

By classifying the data, it was shown that from before the start of dredging activities northwards of the study area (T1) to just after the peak of marine aggregate extraction (T5), the gravel class progressively decreased at the level of the entire study area, including a net loss of gravel class extent within the defined ecologically noteworthy areas (Fig. 1, black outlined polygons). Accordingly, the ratio of hard versus soft substrata (Belgian MSFD indicator on seafloor integrity) first showed a negative trend, at least after the peak of the extraction activity, followed by a positive trend indicating a recovery process. Based on the depth time series, a morphological analysis revealed that part of the change is attributable to bedform migration, the drivers of which require further investigation. The observations suggested an aggradational trend in the gravel areas, though this fell within the IHO confidence limits used. Despite this, changes in the depth profiles showed a reduction in seafloor complexity between the surveys before and after the initiation of intense marine aggregate extraction.

A methodological framework to unambiguously link changes to pressures is under development and is still hampered by a lack of data and knowledge on the natural variability and resilience of offshore sedimentary systems. Nevertheless, the present results are highly significant from an ecological perspective and warrant further investigation of the substrate evolution. If smothering and/or deposition events were indeed more persistent under increased anthropogenic pressure, this may affect several ecosystem states and functions: e.g. reduction of sessile bio-encrusting epifauna; loss of surficial complexity leading to reduced micro-roughness; burial of biogenic clastic material; and overall reduced potential of bentho-pelagic coupling (Watling and Norse 1998; Hewitt et al. 2005).

Conclusion

This study highlights the importance of researching approaches and testing tools usable for local- and regional-scale environmental assessments (i.e. for MSFD implementation). A selection of useful methodologies was presented to detect changes in seafloor substrate types. The investigation showed how, under specific standardised multibeam backscatter acquisition procedures, the confidence of repeated acoustic observations could be enhanced significantly, and how the valuable but expensive ground-truth information could be propagated from one survey to a time-series dataset via the application of supervised and unsupervised classification routines. The serial backscatter dataset was analysed using techniques developed in terrestrial remote sensing, showing that these methodologies are applicable to marine environmental monitoring. This is most promising for before-after control-impact (BACI) types of assessment, and such datasets would increase our understanding of anthropogenic impacts over an area. Although the methods presented were tested at local scales, they are repeatable and can be applied to broad-scale geographical extents, a major limitation being the need to collect large-scale datasets covering entire jurisdictional areas.