Multiple imputation of multibeam angular response data for high resolution full coverage seabed mapping

Acoustic data collected by multibeam echosounders (MBES) are increasingly used for high resolution seabed mapping. The relationships between substrate properties and the acoustic response of the seafloor depends on the acoustic angle of incidence and the operating frequency of the sonar, and these dependencies can be analysed for discrimination of benthic substrates or habitats. An outstanding challenge for angular MBES mapping at a high spatial resolution is discontinuity; acoustic data are seldom represented at a full range of incidence angles across an entire survey area, hindering continuous spatial mapping. Given quantifiable relationships between MBES data at various incidence angles and frequencies, we propose to use multiple imputation to achieve complete estimates of angular MBES data over full survey extents at a high spatial resolution for seabed mapping. The primary goals of this study are (i) to evaluate the effectiveness of multiple imputation for producing accurate estimates of angular backscatter intensity and substrate penetration information, and (ii) to evaluate the usefulness of imputed angular data for benthic habitat and substrate mapping at a high spatial resolution. Using a multi-frequency case study, acoustic soundings were first aggregated to homogenous seabed units at a high spatial resolution via image segmentation. The effectiveness and limitations of imputation were explored in this context by simulating various amounts of missing angular data, and results suggested that a substantial proportion of missing measurements (> 40%) could be imputed with little error using Multiple Imputation by Chained Equations (MICE). The usefulness of imputed angular data for seabed mapping was then evaluated empirically by using MICE to generate multiple stochastic versions of a dataset with missing angular measurements. The complete, imputed datasets were used to model the distribution of substrate properties observed from ground-truth samples using Random Forest and neural networks. Model results were pooled for continuous spatial prediction and estimates of confidence were derived to reflect uncertainty resulting from multiple imputations. In addition to enabling continuous spatial prediction, the high-resolution imputed angular models performed favourably compared to broader segmentations or non-angular data.


Introduction
Seafloor substrate properties are mapped at a high resolution to characterize the benthic environment for a wide range of applications. High-resolution substrate information is necessary to produce detailed maps of surficial geology (e.g., Todd et al. 2014;Stephens and Diesing 2015;Misiuk et al. 2018Mitchell et al. 2018) and is useful for estimating benthic species habitat suitability (McArthur et al. 2010). This information is often essential for local and regional marine management (Cogan et al. 2009), informing, for example, marine protected area (MPA) design (Howell et al. 2010;Ferrari et al. 2018) and recently, fisheries stock assessment and management (Smith et al. 2017).
Acoustic backscatter is one of the few direct surrogates available for seafloor substrate properties in waters too deep for spectral imaging (> ~ 20-30 m). Seafloor backscatter describes the echo intensity of an acoustic wave that has reflected off the bottom. The intensity reflected is partly a function of the material properties of the substrate, but also depends on internal and external substrate 7 Page 2 of 20 structure (e.g., surface roughness, sediment bed forms, sediment stratigraphy), and characteristics of the acoustic signal such as the frequency and angle of incidence (Lurton 2010). If the latter factors are properly constrained, backscatter can serve as a useful quantitative indicator of material seabed properties, and has been correlated to specific substrate parameters (e.g., Davis et al. 1996;Goff et al. 2000Goff et al. , 2004Collier and Brown 2005;Ferrini and Flood 2006;Sutherland et al. 2007;Haris et al. 2012). Recently, angle-varying gain (AVG) corrections have been widely employed to remove the angular dependence of the backscatter intensity from swath sonar systems, producing backscatter "mosaics" that can serve as a non-parametric predictor of seabed substrate and habitat type (Lurton and Lamarche 2015;Schimel et al. 2015). This provides the opportunity to treat backscatter data from multibeam echosounder (MBES) or sidescan sonar (SSS) systems as a raster layer, simplifying its use as a predictor in sediment and habitat modelling, while also enabling additional textural and image processing approaches (e.g., Lucieer and Lamarche 2011;Fakiris et al. 2019;Trzcinska et al. 2020).
Although the AVG backscatter mosaic has proven highly useful for mapping seabed substrate properties and benthic habitats, it requires a reduction of angle-dependent acoustic information that may be useful for discriminating seabed characteristics (Fonseca et al. 2009;Haris et al. 2012). Backscatter intensity co-depends on the angle of incidence and the properties of the substrate (e.g., roughness, grain size; Lamarche and Lurton 2018), and different MBES beam angles may provide complementary, non-redundant information describing substrate characteristics (Hughes Clarke et al. 1996). Scattering of inner acoustic beams, for example, is dominated by specular reflection, which is largely a function of seafloor hardness (Weber and Lurton 2015). At the outer beams, scattering is increasingly sensitive to the interface roughness. Some mapping approaches advocate retaining this rich acoustic information to better discriminate substrate properties. Angular range analysis (ARA) is an approach for calculating the full angular response curve (ARC) across each half of the MBES swath (i.e., from nadir to the outer beam on both the port and starboard sides of the swath), which can be used to estimate substrate properties, for example, by deriving parameters that are used to invert an acoustic backscatter model using calibrated backscatter data (Fonseca and Mayer 2007). The use of data spanning half the swath width results in low horizontal resolution for substrate predictions that are estimated using angular approaches. Fonseca et al. (2009) therefore proposed to aggregate soundings by homogenous patches of seabed identified from the backscatter mosaic rather than by the swath width, increasing the spatial resolution of the angular analysis. Che Hasan et al. (2012Hasan et al. ( , 2014 applied this technique automatically by first segmenting the backscatter mosaic into "image objects", then aggregating soundings according to the segment boundaries. Alternative approaches have sought spatially continuous solutions for retaining the resolution of the AVG backscatter mosaic while also achieving an estimate of the ARC. Parnum (2007) proposed to derive an "angular cube"-comparable to the hyperspectral cube in terrestrial remote sensing-by interpolating continuous surfaces from discrete incidence angles, thereby producing angular response estimates for each raster cell of a gridded dataset. Huang et al. (2014) suggested this could also be achieved by producing multiple backscatter mosaics using different reference angles for the AVG correction. These two approaches were compared by , who found they produced similar results, yet noted the latter may be more flexible regarding the dataset overlap and sounding density (e.g., in deeper waters). Simons and Snellen (2009) have presented yet another solution, wherein the backscatter response of each beam is classified independently in an unsupervised Bayesian framework, precluding the need for angular compensation. If the survey overlap is sufficient, this approach can produce near-continuous maps (Alevizos et al. 2015)otherwise, results can be interpolated (Gaida et al. 2019).
The advent of multi-frequency MBES presents new opportunities for seabed discrimination. Multi-frequency MBES vary the operating frequency on a "ping-by-ping" basis, cycling through a pre-selected range of frequencies (e.g., 100, 200, 400 kHz; Brown et al. 2019). In addition to multiple promising approaches for utilizing the backscatter response of these individual frequencies for substrate discrimination (e.g., Buscombe and Grams 2018;Costa 2019;Gaida et al. 2019), Gaida et al. (2020) demonstrated the potential for measuring differences in substrate penetration from the depth soundings of different frequencies. They recommended analysing the multi-frequency backscatter intensity alongside the difference in depth measurements to obtain a robust estimate of both the surficial sediment composition and water depth. Additionally, they provide evidence that differences in substrate penetration between frequencies vary substantially with both incidence angle and substrate composition.
The interrelationships between backscatter intensity, incidence angle, substrate penetration, and operating frequency for multi-frequency MBES are complex, yet we believe this complexity could be utilized for addressing outstanding challenges regarding the application of high-resolution angle-dependent backscatter analysis. All angular response analyses must contend with the discontinuous nature of angular data. Where soundings are sufficiently dense, this issue is readily mitigated, for example, by simply gridding the soundings at an appropriate cell size . In fact, ensuring a sufficiently dense survey so that all angles of incidence are represented at the desired spatial resolution is the most straightforward and robust solution to the issue of angular data discontinuity. As the depth increases though, or where the survey overlap of pre-existing data is insufficient, such solutions become infeasible, and alternative methods such as object-based aggregation (Che Hasan et al. 2012, interpolation (Parnum 2007;, or multiple angular compensations (Huang et al. 2014; are required. Furthermore, multi-frequency MBES may also suffer reduced sounding density in the along-track direction, per frequency, as the system must cycle through the frequencies sequentially. Multicollinearity is also a characteristic of MBES angular response data that is increased in multi-frequency systems. MBES backscatter intensity at any given angle of incidence is expected to be correlated with intensities at other, similar angles, which is one reason why AVG compensation is successful. Though different operating frequencies will produce unique angular response curves (Weber and Lurton 2015), some amount of correlation is still expected between them, depending on the substrate-effectively increasing the dimensionality of multicollinearity. Multicollinearity is, of course, problematic in some modelling contexts, but can also be highly valuable for handling missing data. Data imputation techniques enable the leveraging of collinearity between variables in order to facilitate statistical modelling using incomplete observations (i.e., data points with missing values for some variables).
Data imputation refers to a suite of methods for completing datasets with missing values in order to avoid the listwise deletion of partially complete observations in downstream analyses (Rubin 1987;van Buuren 2018). These techniques are now well-developed and have been applied to missing data problems in public censes (Rubin 1987), psychology (van Ginkel et al. 2010, medicine and epidemiology (e.g., Ambler et al. 2007;Vergouw et al. 2012;Eekhout et al. 2012), and biology (e.g., Troyanskaya et al. 2001;Penone et al. 2014). The general approach of most data imputation techniques is to estimate missing observations of variables using relationships with other variables where observations are not missing. Imputation of univariate missing data is readily accomplished, for example, by modelling the missing values according to the complete observations, or by drawing from observed values. Multivariate imputation-where missing data are imputed for multiple variables simultaneously-is non-trivial, yet can be accomplished using Fully Conditional Specification (FCS; van Buuren et al. 2006), wherein missing data from each variable are modelled iteratively and sequentially until the scope of the imputation is achieved. Furthermore, "multiple imputation" describes methods in which the missing data are imputed multiple times to obtain "proper" estimates of the uncertainty of the missing data (Rubin 1987(Rubin , 2004van Buuren 2018). Unlike many modelling applications, imputation stands to benefit from multicollinearity within the dataset, which can be leveraged to estimate missing values where no data occur.
Given the multidimensional multicollinearity of multifrequency MBES datasets, and the preponderance of missing data at any given angle of incidence at the seafloor, we propose to evaluate the use of multiple imputation to estimate missing angular data at a high spatial resolution. This is motivated primarily by (i) the common need for spatially continuous map predictions that require complete observations of all explanatory variables over the study extent, and (ii) a desire to avoid omitting ground truth data points during statistical modelling at locations where all explanatory angular variables are not represented (i.e., the listwise deletion of partially complete observations; van Buuren 2018). Additionally, we propose to use the difference between measured depths from different acoustic frequencies at particular incidence angles as a proxy for substrate penetration to inform the imputation procedure and increase capacity to map sediment properties (Gaida et al. 2020). To achieve full spatial continuity for angular intensity and depth difference data, aggregation is still required at some level to produce map units that accommodate observations at multiple angles of incidence simultaneously. We investigate the methods proposed by Che Hasan et al. (2012Hasan et al. ( , 2014, wherein discontinuous angular measurements are aggregated according to object-based image segments expected to represent "acoustic themes" (i.e., homogenous substrate units; Fonseca et al. 2009). This method facilitates the co-location of angular response data and can help to initially minimize the amount of missing information per observation. We hypothesize an optimal segmentation scale that is fine enough to delimit substrate boundaries at the scale of interest, but general enough to allow for the simultaneous observation at a range of incidence angles. The goals of this paper are: (i) to evaluate the effectiveness of multiple imputation for leveraging inter-angular and inter-frequency multicollinearity of MBES data to produce realistic estimates of missing angular acoustic values; (ii) to evaluate the usefulness of imputed angular response and depth difference data for benthic habitat and substrate mapping at a high spatial resolution.

Multi-frequency MBES data
The Bedford Basin 2017 MBES dataset was selected to test the utility of imputation methods for multi-frequency angular mapping. This dataset is well-established following its use in the 2017 R2Sonic Multispectral Challenge 7 Page 4 of 20 (e.g., Buscombe and Grams 2018;Gaida et al. 2018;Costa 2019) and is useful as a benchmark for methodological investigations. Brown et al. (2019) provide a detailed description of the data acquisition. Briefly, the survey was conducted on 2 May 2017 to map a heterogeneous patch of seabed that has been studied previously at the Bedford Basin, Halifax, Canada (Fader and Miller 2008). A pole mounted R2Sonic 2026 MBES with a Valeport sound velocity probe and POS MV Wave Master Inertial Navigation System with two Trimble GPS antennas was deployed port-side of a 12 m vessel, and soundings were conducted at 100, 200, and 400 kHz operating frequencies. Sound velocity profiles (SVP) were obtained during the survey using an AML Base X2, and all data were integrated during acquisition using QPS QINSy. The full data processing workflow for this project is summarized in Fig. 1. The QPS suite was used to reprocess the multi-frequency MBES data for the purposes of this study. The raw data were loaded into Qimera with corresponding tidal and SVP measurements acquired during the survey. Tidal and SVP corrections were applied to the data, followed by filters to isolate soundings from each MBES frequency in turn (100, 200, 400 kHz). For each set of soundings, spline filters were used to reject erroneous data, followed by minimal manual cleaning of residual artefacts. Cleaned and corrected soundings were exported as.GSF files along with gridded bathymetric surfaces, yielding three single-frequency datasets.
The Fledermaus Geocoder Toolbox (FMGT) was used to apply standard radiometric corrections to the raw MBES backscatter intensity. Cleaned soundings for each of the frequencies were imported into FMGT via the.GSF files output by Qimera, and the gridded bathymetric layers were added as reference grids for slope correction. The R2Sonic 2026 sonar defaults were retained within the software, and absorption coefficients were estimated for each frequency using calculations provided by the National Physical Laboratory (2018), as recommended in the current FMGT software documentation. Default settings were used for all other geometric and radiometric corrections. The corrected soundings without AVG (i.e., level BL 3 ; Schimel et al. 2018;Malik et al. 2019) were exported as ASCII files, and AVG-compensated mosaics (i.e., BL 4 ) were also generated using the "flat" algorithm, referencing the mean of the angular interval between 30-60° with a moving window of 300 pings. The echosounder was not calibrated prior to the survey; all backscatter values output from processing were on a relative dB scale.

Backscatter data segmentation and aggregation
Angular acoustic measurements were derived over the extent of the MBES coverage. The ASCII files output from FMGT were parsed by the angle of incidence at the seafloor with a custom R function to achieve a point representation of each sounding with depth and backscatter intensity attributes for each frequency. The soundings were aggregated to 3° bins to acquire the backscatter intensity variables BS f ( ) for all frequencies f = 100, 200, 400 , and angle bins = [10, 12], [13,15], … , [55,57] . The difference in measured depth, Δd f ,f � ( ) , was calculated between each pair of frequencies f and f ′ for each angle bin by aggregating the depth soundings to a 2-m grid and subtracting co-located soundings of disparate frequency but like angle. These variables have a high angular (3°) and spatial (2 m grid) resolution, but poor continuity across the dataset resulting from insufficient angular coverage. In order to co-locate the angular frequency-dependent variables that do not occur in the same grid cell, and to increase their spatial coverage, the acoustic data were aggregated to seabed segments using the approach suggested by Che Hasan et al. (2012;modified from Fonseca et al. 2009). Seabed segments were achieved by segmenting a three-band RGB raster containing AVG-compensated backscatter mosaics for each frequency using the mean shift algorithm (Comaniciu and Meer 2002) in ArcGIS Pro. Behaviour of the mean shift in ArcGIS Pro is controlled by manipulating parameters that balance the "spatial detail" (segmentation weighting based on proximity), "spectral detail" (weighting based on attribute values), and minimum or maximum segment size, which are used to merge or split segments at a given number of raster cells.
Here, spatial and spectral detail were set to their maximum values and the minimum segment size was used to manipulate the segmentation outcome. The product was a continuous set of segmented units over the study extent (termed "acoustic segments"). The angular frequencydependent data were then averaged at each acoustic segment (Fig. 2). The average backscatter intensity of frequency f and angle bin for a given segment was obtained by first averaging the values of all soundings i = 1, 2, … , n within a given raster cell, then averaging the values of all cells j = 1, 2, … , m within an acoustic segment: Similarly, the average depth difference between frequencies f and f ′ at angle was calculated for a given segment by first averaging the depth soundings over each raster cell where data exists, then subtracting depth values of differing frequency at co-located cells. The results were averaged over the acoustic segment: Given the three frequencies, 16 angle bins, and two measurements (backscatter and depth difference), this yielded 96 attributes for each acoustic segment of the dataset. (1) Page 6 of 20 Fig. 2 a Swath multibeam soundings were binned according to the angle of incidence and overlain on a segmented raster grid (segment boundaries in black). b The backscatter and depth difference of binned angular soundings were aggregated per 2-m raster cell (example cell in red) for three operating frequencies. c Cell values were then aggregated according to acoustic segments (example in red) to achieve estimates over the full range of angles for a homogenous patch of seabed. Missing data may occur where a segment does not contain data at all incidence angles To explore the trade-off between spatial resolution and data continuity for benthic mapping at the Bedford Basin, two segmentations were generated: (i) the three-band backscatter mosaic was segmented at a scale broad enough (minimum mean shift segment size 7000 m 2 ) to yield segments with no missing angular data (i.e., all segments included soundings at all frequencies and incidence angles; Fig. 3b); (ii) segmentation was performed at the finest scale necessary to capture the apparent local heterogeneity in substrate patches visible in the multi-band backscatter mosaic, allowing missing data to occur (minimum mean shift segment size 1000 m 2 ; Fig. 3c, d).
Because the former segmentation generated acoustic segments of considerably lower detail than the latter, it is necessary to evaluate the trade-off between dataset continuity and spatial resolution. For comparison, the AVG-compensated backscatter mosaics for each frequency (hereafter BL 4,100 , BL 4,200 , BL 4,400 ) and average depth (between all frequencies; D ) were also extracted for each segment.

Imputation
Missing angular data resulting from the fine scale segmentation were completed using multiple imputation. "Multiple imputation" encompasses the entire workflow for achieving statistical inference using an incomplete dataset-from the estimation of missing values to statistical model fitting. The full multiple imputation workflow can be divided into three parts (Rubin 1987; van Buuren 2007): (i) Multiple versions of the complete dataset are generated by predicting the missing values using statistical relationships with other correlated variables in the dataset. The non-missing data are identical between versions of the full dataset, but the imputed values differ as a function of the imputation models, which incorporates variability in the prediction of missing values (van Buuren 2018). (ii) Each version of the complete imputed dataset is used to fit the statistical model of interest, resulting in multiple independent analyses and models, which differ according to variability among the imputed data. (iii) The models are pooled to estimate the parameter(s) of interest, including estimates of uncertainty arising from multiple versions of the imputed dataset. Uncertainty in the modelled parameters is a function of uncertainty regarding the missing values.
The motivation for completing multiple versions of the dataset and analysis is that the single imputation of a value that was missing underestimates uncertainty regarding what the missing value should be. Rubin (1978) noted that, "imputing one value for a missing datum cannot be correct in general, because we don't know what value to impute with certainty (if we did, it wouldn't be missing)". A number of methods exist to introduce proper amounts of variability in the imputed data.
An effective approach to generating multiple multivariate imputations (step i) above) is Fully Conditional Specification (FCS; van Buuren et al. 2006). FCS proceeds iteratively by specifying an imputation model independently for each variable in the dataset with missing values, using the other covariates as predictors. There are many potential imputation models, but generally, these should produce imputed values with an appropriate amount of variability given that which is observed for each variable in the dataset. Multiple Imputation by Chained Equations (MICE) is a Markov Chain Monte Carlo method for FCS-specifically, a Gibbs sampler (van Buuren and Groothuis-Oudshoorn 2011; van Buuren 2018). MICE works by first completing missing data for each of q predictors in the dataset (here, the 96 acoustic attributes at each acoustic segment) using random draws from the observed values as "placeholders" (Azur et al. 2011). Starting with the first of the q variables, 1 of q , the "placeholder" values are removed, and a model is fit between the remaining observed values for 1 and the other predictors in the dataset, 2 , … , q . The missing values are then predicted for 1 using the model, which also incorporates uncertainty in the prediction using one of several means (e.g., random draws from the data, bootstrapping, Bayesian regression). The variable 1 has now been updated and the algorithm proceeds with 2 , using the other synthetically complete variables in the dataset, including those resulting from prediction in the previous steps. The entire process of cycling through all q variables is repeated for a set number of iterations for the algorithm to converge (often ~ 10), and the final version of the dataset is retained as one of multiple imputations. Additional details on the specifics of these steps are provided by Azur et al. (2011), van Buuren and Groothuis-Oudshoorn (2011), and van Buuren (2018).
Here, several imputation models were trialed for the prediction step, yet the implementation of Random Forest multiple imputation in the R package 'mice' (van Buuren and Groothuis-Oudshoorn 2011) generally produced imputations that converged quicker, with lower error, than other methods such as predictive mean matching (PMM), Bayesian linear regression, and linear regression using bootstrapping. Random Forest was therefore selected for all imputations presented hereafter. The ability of Random Forest to automatically model variable interactions has been highlighted as one of its strengths for imputation (Doove et al. 2014). This is expected to be useful in the present case of highly dimensional multi-frequency MBES data. This form of imputation proceeds not by simply predicting missing values using Random Forest, but by using the trees that comprise a Random Forest model to identify "donor" values in a manner similar to PMM (van Buuren 2018). Treating the variable being imputed as the response and the other covariates as predictors, multiple decision trees are grown on bootstrap samples of the dataset according to established Random Forest methods (Breiman 2001). Imputed values are generated by determining the terminal leaf of each decision tree to which the missing values belong according to the tree splits, then randomly selecting from among the observed values at those leaves, which comprise the potential "donors" (Doove et al. 2014). This has the effect of injecting multiple elements of stochasticity into the imputations, while retaining the exclusive selection of "real" values that are drawn from the dataset. Conceptually, this is a non-parametric and non-linear method for generating candidate donor values rather than a predictive model, and relatively few trees are required for each imputation (e.g., 10). Doove et al. (2014) provide a detailed description of the algorithm.

Imputation simulation
To inform on the circumstances under which the methods presented here may be tenable, simulation was used to explore how accurately the angular frequency-dependent backscatter ( BS f ( ) ) and depth difference ( Δd f ,f � ( ) ) measurements can be imputed given variable amounts of missing data. First, all segments resulting from the mean shift algorithm with complete observations at all incidence angles using each frequency ( n = 564 ) were isolated to produce a complete dataset (i.e., with no missing data). Missing data were synthetically generated for this dataset by randomly dropping observations within the segments to achieve versions of the dataset with between 10 and 90% of data missing. To realistically simulate the conditions under which missing data may occur using a swath sonar system, the missing data generator algorithm proceeded as follows: (i) select an acoustic segment at random from the complete dataset; (ii) for that segment, randomly select one of t he available incidence angle bins ( = [10, 12], [13,15], … , [55, 57]); (iii) drop all observations of variables that were measured at the angle bin selected in (ii).
Given the two variables measured at three operating frequencies, the above produces six missing values per iteration. At the end of each iteration, the cumulative proportion of missing data is calculated and the algorithm proceeds until the desired amount of missing data is achieved. Because multiple variables are measured at a given angledependent sounding, all variables at a given angle are likely to be either present or missing for each observation, and this method of missing data generation is therefore expected to be more realistic than a random draw.
For each simulated incomplete dataset (10-90% missing data) the missing data were imputed and compared to the actual measured values that were dropped. Ten Random Forest imputations were conducted for each simulation using the default Random Forest parameters in the 'mice' package for ten iterations. The mean absolute error (MAE) and variance explained (VE) between multiple imputed values and the real values were calculated to determine the absolute and relative average accuracies of the imputations given increasing amounts of missing data.

Full dataset imputation
Following simulation, the missing angular measurements of all fine scale segments ( n = 1087 ) were imputed to produce ten plausible hypotheses of the full dataset. Imputation of the full dataset was performed using the same imputation design as the simulations in "Imputation simulation" section (ten imputations using Random Forest for ten iterations). Using the 'mice' package, it is possible to specify a unique set of predictors for each variable undergoing imputation. Random Forest is expected to be robust to uninformative predictors (i.e., they are not selected for tree splitting), yet predictors of a given variable were additionally omitted if the absolute Pearson correlation between the variable and predictor was below 0.1, or if the predictor was present in less than 10% of observations of the variable being imputed. All angular variables at all frequencies were included in the imputation procedure, along with the average water depth measured by all frequencies ( D ). The imputation algorithm was run for ten iterations, after which convergence was checked by observing the mean and standard deviation of imputed values plotted against the iteration number, following van Buuren and Groothuis-Oudshoorn (2011). The result was ten versions of the imputed dataset with complete observations of variables using each operating frequency at each incidence angle bin for both backscatter intensity ( BS f ( ) ), and depth difference ( Δd f ,f � ( )).

Substrate modelling
The utility of the imputed data for habitat mapping applications was evaluated using empirical modelling. Georeferenced ground truth photographs and video observations 7 Page 10 of 20 Fig. 4 a 400 kHz backscatter intensity for each segment measured at the 37° incidence angle bin ( BS 400 (37)) with observed substrate classes from seafloor images and video, and b depth difference between 100 and 400 kHz soundings for each segment measured at the 37° incidence angle bin ( Δd 100,400 (37)) with grab samples. Segments with missing data for the 37° angle bin are shown as hatched areas and were imputed to produce full coverage observations for c BS 400 (37) and d Δd 100,400 (37) collected in 2017 and 2018, described by Brown et al. (2019), were examined in the context of the angular frequency-dependent data (Fig. 4). Where a segment had multiple ground truth observations, the most common class was assigned (producing n = 171 ground truth segments). Random Forest models predicting the bottom class using the angular frequency-dependent predictors were trained using the 'randomForest' package (Liaw and Wiener 2002) in the Microsoft R Open version of R (R Core Team 2019). Models were built using 500 trees, default hyperparameters, and no variable reduction. For comparison, several models were generated using various configurations of the acoustic predictors and segmented backscatter layers: (i) fine scale segmentation with AVG compensated frequency-dependent backscatter values and depth; (ii) fine scale segmentation with angular frequency-dependent predictors and depth (with missing data); (iii) broad scale segmentation with angular frequency-dependent predictors and depth; and (iv) fine scale segmentation with imputed angular frequencydependent predictors and depth (Table 1).
Model predictive performance and uncertainty were evaluated to explore the effects of multiple imputation on classification results. Predictive performance was quantified using the classification accuracy and kappa values calculated from the out-of-bag samples. The votes from the Random Forest represent a discrete event X for each acoustic segment, with three possible outcomes, {x 1 , x 2 , x 3 } that represent the ground truth classes. The predictive uncertainty from the ten different models can therefore be represented succinctly for each acoustic segment using their entropy, where P(x i ) is the proportion of votes (i.e., the probability) for each class. Lower values of H represent increasingly unanimous agreement among models and high values indicates low agreement. Entropy values were mapped along with the predicted classes for each acoustic segment to convey predictive uncertainty spatially. The relationship between the proportion of missing data, which were imputed, and the entropy at each segment was also investigated. To control for heterogeneity among the substrate classes, entropy was modelled as a function of the proportion of missing data and also the predicted class using a generalized linear model (GLM). Entropy is a positive and continuous variable, but here contained a preponderance of zero values where model agreement was unanimous, which occurred exclusively for the "fine" sediment class. A hurdle approach was used to partition the model into two parts: (i) the probability that entropy is not zero, and ii) the entropy value, conditional on it not being zero (Cragg 1971;Potts and Elith 2006). The first model was a binomial GLM with a log link function to predict the probability of non-zero entropy as a function of the proportion of missing data and the predicted substrate class. The second was a GLM with a gamma error distribution and identity link to predict nonzero entropy values using missing data proportion and substrate class. Interaction was tested between all predictors.
The capacity for modelling continuous substrate properties was also evaluated. Van Veen grabs obtained from the site in 2018 ( n = 19 ; Brown et al. 2019) were analyzed for sediment grain size, providing particle size distributions. The arithmetic mean grain size ( x a ) and sorting ( a ) in μm were calculated for each sample using GRADISTAT (Blott and Pye 2001) and were assigned to their overlapping acoustic segments (Fig. 4). Initial trials suggested that neural networks outperformed Random Forest at modelling these continuous substrate properties. Neural networks predicting x a and a using each configuration of the acoustic data (Table 1) were trained using Keras TensorFlow in Python 3.7. Both models comprised two dense layers of 128 and 64 units using the rectified linear unit (ReLU) activation function, followed by a single-unit dense output layer with linear activation. Dropout was implemented between all dense layers at a rate of 0.3. The models were optimized using the Adam algorithm (Kingma and Ba 2017) with a mean squared error (MSE) loss function and were trained for 600 epochs using the full batch size. Leave-one-out cross validation (LOO CV) was used to obtain estimates of model performance given the small sample size. Pearson's correlation ( r ), variance explained (VE), and mean absolute percent error (MAPE) were calculated between the predicted and omitted test values, and were compared between all configurations of the acoustic data (Table 1).
The effects of missing data on the uncertainty of grain size parameter predictions were explored using GLMs. Prediction variability, as measured by the standard deviation, was modelled against missing data proportion for each parameter ( x a and a ). To control for apparent increases in uncertainty at higher predicted values, the 7 Page 12 of 20 predicted parameters at each segment were also included in the model. The standard deviation of predictions (the response) was continuous and greater than zero, yet nonlinear relationships were observed with the predictors. Initial models also suggested heteroskedasticity of residuals. GLMs were fit with a gamma error distribution and a log link function, and interaction was tested between all predictors.

Fig. 5
Mean absolute error a and b and variance explained c and d ± 1 SD between simulated missing data and imputed data for angular-and frequency-dependent depth differences (left) and backscatter intensity (right). Lines are fitted exponential curves

Imputation
Exploratory analysis suggested strong multicollinearity among angular frequency-dependent variables. In short, backscatter measurements were strongly correlated across incidence angles and frequencies. Backscatter measurements were correlated to the depth differences between frequencies at non-oblique angles (e.g., < 46°), but were generally uncorrelated with these measurements at increasingly oblique angles. Depth differences were also intercorrelated, with lower correlations at increasingly oblique angles. Detailed analysis of the correlation matrix is provided in Online Resource S1.

Imputation simulation
Given the observed correlations between angular frequencydependent variables, missing data were simulated to determine what proportions of missing data may be reasonable to impute for a multi-frequency MBES dataset. Results suggested that the error of depth difference ( Δd f ,f � ( ) ) imputation tends to accelerate past > 50% missing data (Fig. 5). The low MAE of Δd 200,400 ( ) imputations, even at high levels of missing data, contrasts with the drop in VE, suggesting the Δd 200,400 ( ) variables may have comparatively low amounts of variance. The error of angular backscatter ( BS f ( ) ) imputation accelerates at levels > 60% missing data, which appears as an elbow in plots of error against the proportion of missing data (Fig. 5). These are likely conservative estimates given the reduced simulation sample size using complete cases ( n = 564 ) compared to the full dataset ( n = 1087).

Full dataset imputation
The proportion of missing data resulting from the fine scale segmentation ("Backscatter data segmentation and aggregation" section) was 14.05%. Results from "Imputation simulation" section suggest this is a reasonable proportion of missing data to impute for both sets of angular frequencydependent variables ( Δd f ,f � ( ) ; BS f ( ) ). Multiple Imputation by Chained Equations was performed for ten iterations to produce ten versions of the complete dataset.

Substrate modelling
Random Forest classification of the bottom type observed in ground truth imagery produced different results depending on the configuration of the acoustic predictor data. The fine scale segmentations using angular data (missing and imputed) produced the most accurate classification results (Table 2)-the differences between these two models are that (i) imputation makes available more ground truth samples for model training ( n = 171 vs. n = 117 acoustic segments), and (ii) model predictions using imputation span the full extent of the study area, while those without imputation leave missing data areas unclassified. The segmentation that was sufficiently broad to yield no missing angular data was too coarse to capture the full heterogeneity of bottom types and produced the least accurate model. The compensated acoustic data with a fine segmentation (and no missing data) was less successful than the fine scale angular methods at predicting rarer classes, as indicated by the kappa score.
The hurdle model provided insight into the effects that the missing data, which were imputed, had on the predictive uncertainty (i.e., entropy). The fitted binomial model was: where p(H > 0) is the probability of non-zero entropy, and miss is a predictor representing the proportion of missing (4) ln p(H > 0) 1 − p(H > 0) = 0 + miss miss + , data. The model suggested that the proportion of missing data was a significant predictor of non-zero entropy ( P < 0.001 ; Fig. 6), but we note, again, that the zero values occurred exclusively in the "fine" class (precluding the inclusion of predicted class in the model). Back-transforming predicted values from the logistic model predicts a ~ 15% increase in the probability of non-zero entropy between 0 and 100% missing data. The fitted gamma model for non-zero entropy was: where mixed and coarse are levels of the predicted categorical substrate class (with the "fine" class acting as reference). Results from this model suggested that non-zero entropy was highly dependent on the predicted substrate class ( P < 0.001 ), but that the proportion of missing data had no significant effect ( P = 0.417 ). Interaction terms between the proportion of missing data and predicted substrate class were non-significant and were dropped. Additional model analysis is provided in Online Resource S2. Prediction of continuous substrate properties was generally most successful using the angular frequency-dependent predictors. These outperformed the compensated backscatter predictors in all cases, even when using broader acoustic segments (Table 3). The broader segmentation resulted in the aggregation of four grab samples at two of the acoustic segments, reducing the sample size to n = 15 . Predictions using the angular variables at a fine segment scale with missing data were highly accurate, particularly for the mean grain size predictions, yet the occurrence of missing predictors reduced the sample size by more than half ( n = 7 ), bringing into question the representativeness of these results. The angular imputed datasets performed well on average with the lowest error in all cases (according to the MAPE), and the highest correlation and VE scores amongst the sorting models.
GLMs provided insight into the effects of missing data on the uncertainty for the two grain size parameters. For (5) ln (H) = 0 + miss miss + mixed mixed + coarse coarse + ,  both predictions, increased uncertainty was apparent in the along-track direction of the survey (NW-SE), corresponding to acoustic segments with high amounts of missing angular data (Fig. 3), which can be conveyed spatially along with the pooled model predictions (Fig. 7). The gamma GLMs fitted to investigate the effects of missing data on the mean grain size and sorting prediction variability were: and The coefficient for missing data, miss , suggested an increase in the standard deviation of predictions by a factor of 3.50 per unit for mean grain size, which is a factor of ~ 1.13 per 10% missing data. Similarly, miss for sorting suggested an increase in the standard deviation of predictions by a factor of 6.19 per unit, or ~ 1.20 per 10% missing data. Areas of finer predicted mean grain size appeared to have lower uncertainty than coarse areas (e.g., x a > 120 μm). The fitted coefficient for the predicted mean grain size ( x a ) suggested that the prediction variability increased slightly but significantly with the predicted value of x a -by a factor of less than 1.01 per μm, or ~1.07 per 10 μm (Table 4). The coefficient for sorting ( ) predicted a very slight increase in variability at higher sorting values, by a factor of ~1.02 per 10 μm. All interaction terms were non-significant and were dropped from the models. Additional details on the GLMs are provided in Online Resource S2.

Discussion
Novel acoustic technologies obtain increasingly detailed information on the seabed, and complementary analytical approaches facilitate the use of these data for seabed mapping purposes. The recent introduction of multi-frequency (6) ln s x a = 0 + x a x a + miss miss + (7) ln s a = 0 + a a + miss miss + .
MBES provides opportunities to overcome dependencies imposed by the use of a single acoustic frequency, and additionally, to investigate new metrics related to the relationships between frequencies (e.g., differences in substrate penetration; Gaida et al. 2020). Empirical approaches on the use of multi-frequency angular response information for seabed mapping are still sparse, and general research on acoustic angular response and its application for seabed mapping is ongoing (e.g., Wendelboe 2018;Fakiris et al. 2019;Fezzani et al. 2021;Fonseca et al. 2021). Ways in which discrete ground truth samples can be characterized using angular data covering the full swath width, and thus the full study area, are of particular interest. The methods presented here propose to leverage the collinearity between angular measurements and multiple operating frequencies to impute well-informed estimates of spatially continuous angular data across the extent of the study area. This enables modelling and prediction of substrate parameters from discrete ground truth samples at a high spatial and acoustic resolution. The uncertainty associated with predictions using imputed data are conveyed spatially to aid in the interpretation of results. We note that this procedure also allows for estimation of the full angular response curve at all acoustic segments across the dataset (Online Resource S3). Imputation simulations in "Imputation simulation" section suggested that the proportion of missing data resulting from fine scale acoustic segmentation in this study (14.05%) did not approach the upper limits of what may be reasonable to impute. Even with 40% simulated missing data, imputed values explained, on average, > 90 and 75% of observed data variance for backscatter and depth difference variables, respectively (Fig. 5). These results are encouraging, particular for angular backscatter imputation, and are enabled by the consistent collinearity of collocated measurements (Online Resource S1). We suggest that the success of depth difference variable imputation will depend strongly on the substrate properties, as will the usefulness of such variables for habitat and substrate mapping (Gaida et al. 2020). These variables may have little value for predicting coarse or hard surficial sediments that preclude measurable differences in substrate penetration.
The performance of models used to predict seabed substrates varied here depending on the scale and format of acoustic predictors. Regardless of whether the acoustic data were angular or AVG-compensated, Random Forest classification using data segmented at a fine spatial scale performed favourably compared to a broader segmentation. Recall that the fine segmentation scale was selected to capture the apparent substrate heterogeneity observed from the backscatter data, while the broad scale was selected to produce acoustic segments with no missing data. All neural network grain size parameter models using angular data outperformed those using AVG data at a fine spatial scale. We note the apparent correspondence between the thematic resolution of response and predictors in these cases (i.e., the measurement detail). The sediment types identified from benthic images at a broad thematic resolution ("fine", "mixed", "coarse" classes), were predicted well enough using acoustic data of low angular resolution (i.e., AVGcompensated; Table 2). Grain size parameters from physical samples (e.g., x a and a ), however, are measured at a high resolution, and were modelled more effectively here using angular predictors (Table 3). The acoustic response of the seabed may be sensitive to these grain size parameters at varying incidence angles-for example, as acoustic scattering is increasingly influenced by interface roughness and less by hardness at oblique angles (Weber and Lurton 2015). This information is lost when performing AVG compensation. In all cases explored here, the fine scale segmentation with angular predictors outperformed the AVG models, and imputation enabled full coverage maps and increased sample size, without the loss of predictive accuracy. It is important to note that these methods are also applicable to single frequency MBES datasets. Although imputation had no negative impact on model performance, it affected the uncertainty of model predictions. Multiple imputation (i.e., the creation and modelling of multiple entire imputed datasets) necessarily implies that we cannot be certain of the imputed values. Retaining and quantifying this uncertainty is an important part of the imputation procedure (Rubin 1978;van Buuren 2018); predictions based on data that are partially missing should be uncertain. Recent emphasis on conveying spatial model uncertainty can also be found in the seabed mapping literature (e.g., Mitchell et al. 2018;Shields et al. 2020;Strong 2020;Diesing et al. 2021). The results of multiple imputation can be incorporated into broader uncertainty analyses and mapped spatially to facilitate interpretation of model results and confidence. The classification results here, for example, incorporate the uncertainty that can be gleaned from an individual Random Forest model with the uncertainty arising from imputation by tallying votes from the ten individual models (Fig. 6). The map of entropy ( H ; Fig. 6) portrays predictions of the "mixed" class as uncertain compared to the other classes. Statistical analyses support this observation, suggesting that the seabed class was a significant predictor of model uncertainty, while the proportion of missing data had comparatively little impact on uncertainty ( Fig. 6; Online Resource S2). The confidence of neural network grain size parameter predictions, on the other hand, appears to be strongly affected by the proportion of missing data according to the uncertainty map (Fig. 7), where increased prediction variability is apparent in the alongtrack direction. This observation was supported statistically, wherein the proportion of missing (and therefore imputed) data was significantly and positively predictive of the model variability (Online Resource S2).
Machine learning approaches are attractive for analysing angular and multi-frequency MBES datasets given their capacity for handling multidimensionality. Neural networks, for example, tune model weights to regularize or ignore uninformative predictors using backpropagation, which is computed based on a loss function that describes the error between the model and response data. This is highly advantageous for cases with many variables, some of which may be collinear and/or uninformative for the mapping purpose. At the Bedford Basin, previous research has identified subsurface dredge spoil covered by mud (visible in the northwest of Fig. 3a; Fader and Miller 2008;Brown et al. 2019). These subsurface features are clearly detected using the 100 kHz signal, yet show a lower homogenous return with the 400 kHz frequency (Brown et al. 2019). Here, although the subsurface dredge spoil was detected by the 100 kHz signal, and also the 100-400 kHz depth difference measurements (Fig. 4d), this information was not mapped to the prediction of surficial grain size or sediment class-it was effectively ignored by the models, which were trained exclusively using surficial samples (Figs. 6, 7). By generating predictors at a range of frequencies and incidence angles, the amount of potentially useful information is increased, and the model, rather than the analyst, determines the relevance of predictors based on the response variable.

Conclusions
Multiple imputation is a promising approach for achieving continuous estimates of angular response data at a high spatial resolution. Simulations in this study suggested that the proportion of missing angular data resulting from a fine object-based segmentation could be imputed with little error, which is attributable to the high collinearity of angular acoustic data. The implementation of Random Forest as an imputation model allows for automatic variable selection and interaction while retaining uncertainty of the imputed values. This is ideal for highly dimensional angular acoustic data.
The full imputed datasets of multi-frequency angular measurements were effective for producing continuous maps of seabed substrate properties at a high resolution. Machine learning approaches are well suited to modelling imputed multi-frequency angular MBES data, providing an alternative to variable selection or dimensionality reduction. We found no indication that the imputed data decreased the performance of seabed classification models of bottom type or regression of grain size parameters. The latter, though, were predicted with increasing uncertainty as the proportion of missing data increased for a given acoustic segment. Increased uncertainty at areas of sparse sounding density is