An Interpretable Multi-Model Machine Learning Approach for Spatial Mapping of Deep-Sea Polymetallic Nodule Occurrences

Gazis, Iason-Zois; Charlet, Francois; Greinert, Jens

doi:10.1007/s11053-024-10393-7

Original Paper
Open access
Published: 07 August 2024

Natural Resources Research (2024)
Abstract

High-resolution mapping of deep-sea polymetallic nodules is needed (a) to understand the reasons behind their patchy distribution, (b) to associate nodule coverage with benthic fauna occurrences, and (c) to enable accurate resource estimation and mining path planning. This study used an autonomous underwater vehicle to map 37 km² of a geomorphologically complex site in the Eastern Clarion–Clipperton Fracture Zone. A multibeam echosounder system (MBES) at 400 kHz and a side scan sonar at 230 kHz were used to investigate the nodule backscatter response. More than 30,000 seafloor images were analyzed to obtain the nodule coverage and to train five machine learning (ML) algorithms: generalized linear models, generalized additive models, support vector machines, random forests (RFs), and neural networks (NNs). All ML models yielded similar maps of nodule coverage, with differences occurring in the range of predicted values, particularly in parts with irregular topography. RFs had the best fit and NNs had the worst spatial transferability. Attention was given to the interpretability of model outputs using variable importance ranking across all models, partial dependence plots, and domain knowledge. The nodule coverage is higher on relatively flat seafloor (< 3°) with eastward-facing slopes. The most important predictor was the MBES backscatter, particularly from incident angles between 25 and 55°. Bathymetry, slope, and slope orientation were important geomorphological predictors. For the first time at a water depth of 4500 m, orthophoto-mosaics and image-derived digital elevation models with 2-mm and 5-mm spatial resolutions supported the geomorphological analysis, the interpretation of polymetallic nodule occurrences, and the interpretation of the backscatter response.


Introduction

Deep-sea polymetallic nodules (hereafter nodules) are mm- to cm-sized mineral concretions embedded in the sediment surface of the ocean floor, mainly between 4000 and 6000 m water depth. They form by precipitation of Fe- and Mn-oxyhydroxides from seawater and sediment porewater, creating repeated concentric encrustations around a core (e.g., nodule fragment, rock debris, or shark's tooth) over millions of years (Kuhn et al., 2017; Hein et al., 2020). Within the Clarion–Clipperton fracture zone (CCFZ) in the eastern equatorial Pacific Ocean, nodule abundances are typically 10–15 kg m⁻² (wet weight). Nodules are rich in Mn (28.4 wt%) and Fe (6.16 wt%) and also have high contents of Ni, Co, and Cu, usually > 2.5 wt% for these three metals combined (Kuhn et al., 2017; Hein et al., 2020). Due to their abundance, metal content, and the currently increasing demand for those metals (e.g., batteries for electric vehicles), nodules are of commercial interest, and 18 exploration licenses in the CCFZ have been issued and are supervised by the International Seabed Authority (ISA, 2023). Apart from being a potential metal resource, nodules are central to the deep-sea ecosystem, as they are one of the few hard substrates in vast regions otherwise covered by soft siliceous clay. Several studies have linked nodule abundance (kg m⁻²), seafloor area coverage (% m⁻²), and nodule size (cm) to the abundance, spatial distribution, species richness, and community composition of benthic fauna (Amon et al., 2016; Vanreusel et al., 2016; Simon-Lledó et al., 2019a, b, 2020; Pape et al., 2021; Mbani et al., 2023; Uhlenkott et al., 2023). Because nodules are both a valuable metal resource and a deep-sea habitat, the study of their spatial distribution has advanced.

At regional scales (hundreds to thousands of km), ship-based multibeam echosounder systems (MBESs) and deep-towed side scan sonar (SSS) systems have been used to discriminate areas of nodule presence and absence using the backscatter response, typically at 12 kHz (Scanlon & Masson, 1992; Lee & Kim, 2004; Chunhui et al., 2015; Machida et al., 2019; Kuhn & Rühlemann, 2021; Wang et al., 2021). These data are combined with sparse ground-truth samples (usually > 1.8 km apart), e.g., box corers (sampling area of 0.25 m²), and with geostatistical techniques (e.g., kriging) to provide nodule distribution and abundance (Kuhn & Rühlemann, 2021). Within contract areas, high-resolution seafloor mapping with autonomous underwater vehicles (AUVs) has revealed the fine-scale (meters to a few km) spatial distribution of nodules (Schoening et al., 2017; Gazis et al., 2018; Peukert et al., 2018; Gazis & Greinert, 2021; Alevizos et al., 2022) and associated fauna (Simon-Lledó et al., 2019a,b; De Smet et al., 2021). The large amounts of hydroacoustic and optical data collected during such surveys have driven the application of machine learning (ML) at local (Gazis et al., 2018; Gazis & Greinert, 2021; Wong et al., 2021; Mbani et al., 2022), regional (Hari et al., 2018; Kaikkonen et al., 2019; Kuhn & Rühlemann, 2021; Wasilewska-Błaszczyk & Mucha, 2021), and global scales (Dutkiewicz et al., 2020). ML provides reliable predictions for geographical areas where the response variable correlates well with the covariates/predictors and where the training data adequately represent the feature space (Gazis & Greinert, 2021).

Regional and local seafloor parameters directly influence the nodule spatial distribution, e.g., bathymetry (Craig, 1979; Kodagali, 1988; von Stackelberg, 2000), terrain ruggedness (Kodagali, 1988; Skornyakova & Murdmaa, 1992), outcropping bedrock (Dreisetl, 2016; Yoo et al., 2018; Alevizos et al., 2022), and volcanism and availability of nuclei (Glasby et al., 1973). Synergistically, the seafloor morphology alters the local hydrodynamics, e.g., when solitary knolls or seamount chains influence the bottom current magnitude and direction, causing erosive and depositional structures such as furrows and sediment drifts, respectively (Juan et al., 2018). The slope orientation (i.e., aspect) has been used as a proxy for the mean current direction in spatial models when continuous long-term information about bottom currents is unavailable (Gazis & Greinert, 2021; Uhlenkott et al., 2022). Alterations in local sediment depositional patterns that influence sediment thickness (e.g., seafloor pits with increased sedimentation and conical hills with sediment accumulation on the lee side) contribute to the patchy nodule distribution (Wiedicke & Weber, 1996; von Stackelberg, 2000; Juan et al., 2018; Gazis & Greinert, 2021). Such variations in sediment thickness are reflected in the MBES backscatter intensity, angular range analysis (ARA), and SSS backscatter mosaic (Yoo et al., 2018; Hillman et al., 2017; Gaida et al., 2019). They can also be inferred from topographic indices such as the bathymetric position index (BPI) (Gazis & Greinert, 2021; Alevizos et al., 2022) and measured in sub-bottom profiler (SBP) data, e.g., as the thickness of the semi-liquid bottom sediment layer (Cochonat et al., 1992; Yoo et al., 2018; Alevizos et al., 2022).
The synergistic effect of seabed topography, bottom currents, and heterogeneous sedimentation patterns influences the geochemical parameters that affect nodule formation, e.g., removal of clay-size particles by currents, variations in total organic carbon content, and shifts in Mn-redox depth in local depressions, highlighting the relationship between meter-scale seabed morphology and variation in geochemical parameters (Mewes et al., 2014; Volz et al., 2018; Paul et al., 2019).

Although ML can derive accurate predictions from data that capture the underlying relationships, different ML algorithms can yield divergent prediction performance even on the same dataset (Stephens & Diesing, 2014; Diesing & Stephens, 2015; Robert et al., 2016; Li et al., 2017). Multi-model approaches could help decrease the prediction error, identify models that generalize better over broad geographical areas or a more complex feature space, and increase confidence in locations with good spatial agreement (Wenger & Olden, 2012; Diesing & Stephens, 2015; Robert et al., 2016). To our knowledge, the multi-model approach has not previously been used for fine-scale spatial prediction of nodules. Here, we use five well-established supervised ML algorithms: generalized linear models (GLMs), generalized additive models (GAMs), support vector machines (SVMs), random forests (RFM), and neural network models (NNM). These algorithms represent five of the most common modeling architectures with different degrees of model complexity; they are available in all standard data analysis programming languages (e.g., R, Python) and in some GIS software (e.g., ArcGIS, QGIS).
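One simple way to express where several model maps agree is the per-location mean and spread of their predictions. The sketch below is an illustrative choice, not necessarily the agreement measure used in this study; the function name and the dictionary-of-lists input are assumptions.

```python
import statistics

def ensemble_summary(predictions):
    """Per-location mean and population SD across several models' predictions.

    predictions: dict mapping a model name to a list of predicted nodule
    coverage values (%), all aligned to the same locations.
    """
    models = list(predictions)
    n = len(predictions[models[0]])
    if any(len(predictions[m]) != n for m in models):
        raise ValueError("all models must predict the same locations")
    means, spreads = [], []
    for i in range(n):
        values = [predictions[m][i] for m in models]
        means.append(statistics.fmean(values))
        spreads.append(statistics.pstdev(values))  # small spread = good agreement
    return means, spreads

# Hypothetical predictions at two locations
means, spreads = ensemble_summary({"GLM": [10, 30], "RFM": [12, 20], "NNM": [14, 40]})
```

Here the first location (predictions 10, 12, 14) has a much smaller spread than the second (30, 20, 40), so it would be mapped with higher confidence.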

While prediction accuracy on an independent dataset is the standard way to assess model performance, it is equally important to understand how model predictions are derived. Studies have shown that models built with irrelevant and artificial predictors can achieve accurate predictions, even outperforming models with domain-relevant predictors (Fourcade et al., 2018; Behrens & Viscarra Rossel, 2020). A tradeoff between interpretability and accuracy is thus needed, particularly in the natural sciences, where models are used for causal explanation (Breiman, 2001b; Shmueli, 2010). However, interpreting ML results is difficult, especially when our knowledge or conceptual understanding is still developing and complex methods (e.g., neural networks) are used exclusively (Merow et al., 2014). In a multi-model approach, the contribution of each predictor can be evaluated across different models, providing better insights into its importance (Li et al., 2017). Alongside variable importance measures, partial dependence plots (PDPs) depict the type of relationship (linear, monotonic, or complex) between the response and a predictor variable that the model learned during training (Friedman, 2001). PDPs are among the most widely used model-agnostic methods in regression problems with large datasets owing to their simple interpretation and computational efficiency (Molnar, 2018).
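A minimal sketch of how a one-dimensional partial dependence curve is computed (Friedman, 2001): for each grid value, the predictor of interest is set to that value in every training row, the model predicts, and the predictions are averaged. The `predict` callable interface and the function name are illustrative assumptions, not the API of any specific R or Python package.

```python
def partial_dependence(predict, X, feature, grid):
    """Average model prediction as one feature is swept over grid values.

    predict: callable taking a list of rows (lists) and returning predictions.
    X: training rows; feature: column index of the predictor of interest.
    """
    pd_values = []
    for value in grid:
        # Overwrite the feature in every row; all other predictors keep their observed values
        X_mod = [row[:feature] + [value] + row[feature + 1:] for row in X]
        preds = predict(X_mod)
        pd_values.append(sum(preds) / len(preds))
    return pd_values

# Toy linear model: the PDP of feature 0 should be a straight line with slope 2
toy_model = lambda rows: [2 * r[0] + r[1] for r in rows]
curve = partial_dependence(toy_model, [[0, 1], [1, 3]], feature=0, grid=[0, 1, 2])
```

For a linear model the curve is linear; non-linear or non-monotonic shapes in the PDP reveal the more complex relationships mentioned above.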

In addition to the multi-model approach, an ARA was performed to better interpret the contribution of MBES backscatter as a spatial predictor. The backscatter angular response for a specific frequency and incident angle is an inherent property of the seafloor (Jackson & Briggs, 1992; Fonseca & Mayer, 2007). The ARA has a higher angular resolution than the processed backscatter mosaic (where several angles from different swaths have been blended and AVG filtering has been applied), but it has a lower along-track resolution because it is computed from a stack of 30 consecutive MBES pings for each half swath (port and starboard side) (Fonseca & Mayer, 2007; Fonseca et al., 2009). Despite the lower along-track resolution, ping stacking at different beam angles could help identify the beam angles that contribute most to the spatial predictions and reveal subtle differences between seafloor patches with different sediment properties. Beyond extracting the average angular response at certain incident angles, ARA attempts to characterize sediment properties by comparing the average angular response for each group of incident angles with mathematical models that link backscatter and sediment properties (Jackson et al., 1986; Fonseca & Mayer, 2007). Based on this model inversion, the acoustic impedance (hereafter impedance) and volume scattering (hereafter volume) were calculated. Impedance is the product of the sediment bulk density and the sound velocity ratio between the sediment compressional wave speed and the water sound speed; it expresses the acoustic contrast between the water and the sediment surface (Fonseca & Mayer, 2007). Sediments with lower water content and typically larger grain size provide higher impedance and a stronger backscatter signal (Jackson & Briggs, 1992).
Volume scattering provides information on sound attenuation within the sediment in relation to sediment physical properties (e.g., density) and inhomogeneities within the sediment column (e.g., buried nodules or their fragments, burrows, and gas bubbles). In fine-grained sediments (e.g., deep-sea clay and silt) with low density and roughness, volume scattering could be the dominant contributor at lower frequencies and intermediate oblique incident angles (> 15° and < 60°), and when a bioturbated or harder/rougher layered substrate exists below the thin, soft surficial layer, e.g., thickness variations in the semi-liquid bottom sediment layer (Jackson et al., 1986; Lurton & Lamarche, 2015).

To enhance the geomorphological analysis and better understand the spatial distribution of nodule coverage, we constructed high-resolution orthophoto-mosaics of the seafloor with a spatial resolution of 2 mm per pixel, a level of detail not previously achieved at such water depth. In addition, image-derived digital elevation models (DEMs) from seafloor parts with and without nodule coverage supplemented the analysis and interpretation of MBES and SSS backscatter.

Study Area

The study area was in the Eastern CCFZ, within the B4 domain of the Global Sea Mineral Resources (GSR) contract area (Fig. 1). This 37.34 km² area, at water depths between −4520 and −4392 m, is one of the study sites of the in situ mining trials of the Patania II pre-prototype seafloor nodule collector vehicle (Vink et al., 2022). It is part of a larger geomorphological system of parallel abyssal hills and valleys (horsts and grabens, respectively) aligned along the N–S/NW–SE axis (Supplementary Fig. S1), a typical structure in the eastern CCFZ (Macdonald et al., 1996; Parianos et al., 2022).

The central part is a wide N–S oriented U-shaped valley bounded to the south-west by a volcanic knoll with a crater on top. To the south-east, it is bounded by a horst, and to the south, there is a sill with slopes of up to 5° and two cone-shaped morphological features (Fig. 1, Supplementary Fig. S1). Several sub-circular and irregularly shaped depressions exist inside the central basin, creating an uneven seafloor. The largest sub-circular depression has a diameter of ~ 400 m and a water depth of − 4520 m, the deepest part of the study area (Fig. 1, Supplementary Fig. S2). Based on the box corer samples retrieved from the central part (Haeckel & Linke, 2021), the seafloor is covered by unconsolidated, very soft, yellowish to medium grayish-brown clayey silt with traces of fine sand and biogenic siliceous components, homogeneous in the upper 5 cm (Supplementary Fig. S2). The sediment had a high water content (55.63–72.28%) and a bulk density of 1.35 t m⁻³ (GSR, 2018).

The nodules were mainly of diagenetic origin with Mn/Fe ratio > 5 (Halbach et al., 1981; GSR, 2018). Their shape was ellipsoidal, with a linear relationship between the major and minor axes (Schoening & Gazis, 2019; Yu & Parianos, 2021). Their texture was smooth or botryoidal on top, separated by a visible equatorial belt from the rough buried bottom part (Yu & Parianos, 2021). The nodule density was 1.99 g cm⁻³ (GSR, 2018). In the central part of the study area, the median abundance, including surficial and buried nodules, was 22.4 kg m⁻² of wet weight (Schoening & Gazis, 2019). Buried nodules (first 10 cm of the sediment column) accounted for 14.3% of the total number of nodules within the box corer samples (Schoening & Gazis, 2019). The percentage of seafloor covered by nodules (hereafter nodule coverage), based on the few box corer samples, ranged from 4 to 43%, with a median of 38% (Schoening & Gazis, 2019). This range agrees with the nodule coverage derived from the seafloor images within the central part (Fig. 2). The nodules were large individuals, not interconnected with other nodules (Fig. 2). Lower nodule coverage and smaller nodules occurred around local depressions, and within the local depressions the nodule coverage was < 3% (Fig. 2). Mounds of sediment accumulation without nodules on top caused by bioturbation (i.e., tumuli) occurred between the nodules throughout the study area (Supplementary Fig. S3). Traces of mobile surface fauna and bioturbated sediments (e.g., sinuous burrows) were found in seafloor images, box corer, and gravity corer samples.

The volcanic knoll's northern and eastern/southeastern parts had gentle slopes (3°) and relatively smooth terrain. The northwestern part had steeper slopes (5–7°) and downslope channels, indicating sediment movement downslope (Fig. 1, Supplementary Fig. S2). Lower nodule coverage and smaller nodules were observed there (Supplementary Fig. S4). The main crater was 50–60 m deep (Supplementary Fig. S2). Based on seafloor images, there were nodule-free areas, consolidated sediments, and rocky outcrops around and inside the crater (Fig. 3).

The horst in the south-east part of the study area had a length of 2.3 km and a maximum width of ~ 1 km. Its western side was nodule-covered, had a maximum slope of 10°, and rose to a topographic high at 4392 m water depth, the shallowest part of the study area (Fig. 1, Supplementary Fig. S2). The opposite east side of the horst was bounded by inward-dipping normal faults with a sediment-free vertical cliff (60–80°). At the base of the cliff lay an enclosed U-shaped collapsed trough (> 80 m altitude difference), related to a combination of faulting and carbonate dissolution (Mayer, 1981; Fig. 1). Its width varied from ~ 100 m in the northern part to ~ 700 m in the southern part (Fig. 1). Different nodule facies existed here. In the northern part, the seafloor had smaller nodules with high spatial density. This part was characterized by exposed compacted and layered indurated sediments at successive steps. On these exposed parts, the loose sediment had been winnowed away, and small nodules lay on top (Fig. 3). Such structures within deep-sea troughs have been mapped before in the CCFZ (Mayer, 1981). In the southern part of the trough, the seafloor was covered by densely packed small nodules, debris, pillow basalts, basaltic fragments, and rocky outcrops (Fig. 3).

The highest nodule coverage (up to 50%) was observed at the southeastern edge of the study area (Fig. 2). Geomorphologically prominent sites have been linked to high nodule coverage because of abundant nucleus material, present as fragments or transported gravitationally downslope where steep slopes exist (Yamazaki & Sharma, 2000; Sharma, 2017; Li et al., 2021). The nodules here were smaller than in the central and flatter part of the study area (Fig. 2), which could be attributed to one or several factors: limited horizontal space relative to a high number of nuclei, younger age, primary hydrogenetic growth, and a limited supply of dissolved metals through diagenetic precipitation because the semi-liquid bottom sediment layer (i.e., geochemically active layer) is thinner or absent where the basement is frequently exposed (Cochonat et al., 1992; Lipton et al., 2016). Further to the southeast, three pits (or potholes) of varying depths (10–50 m) were present (Fig. 1, Supplementary Fig. S1). They had sharp walls (locally steeper than 60°) with layered consolidated sediments (Supplementary Fig. S5). Pits of varying depths and sizes existed along all the main horsts within the broader study area (Supplementary Fig. S1). Their presence has been linked to the dissolution of outcropping carbonate and Miocene limestone, because the study area is at the limit of the calcite compensation depth (CCD; Berger et al., 1976), augmented by turbulent erosion and hydrothermal discharge through conduits created by faulting and differential compaction between the asymmetrical flanks (Mayer, 1981; Bekins et al., 2007; Moore et al., 2007).

Methodology

MBES Data Acquisition and Post-Processing

The HUGIN 6000 AUV used here was equipped with a Kongsberg EM2040 MKII operating at 400 kHz, with beam opening angles of 0.7° × 0.7°. The swath opening angle was 120°, and the flying altitude was kept constant at 60 m above the seafloor, resulting in a uniform swath width and ~ 50% overlap between neighboring lines. A detailed description of the acquisition system settings (e.g., beam spacing pattern, pulse mode, and ping rate) is provided in the expedition report (Vink et al., 2022). The raw data were stored in the Kongsberg .all format. The navigation data were post-processed onboard using Kongsberg's proprietary NavLab software. Post-processing was done with the QPS Qimera v1.7 software and included sound velocity profile ray-tracing correction, roll bias correction, tide correction based on pressure changes recorded by two fixed deployed SBE16 CTDs (Vink et al., 2022), AUV turn removal, automated CUBE filtering (Calder & Mayer, 2003), and manual removal of erroneous soundings. The generated bathymetric grid was exported as GeoTIFF with 2 m × 2 m cell size in Universal Transverse Mercator coordinates zone 10N (hereafter UTM 10N). The absence of navigational drift was confirmed using the available ship-based MBES data (Gazis, 2020) as a reference layer.
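The acquisition geometry above implies a few useful numbers. Assuming a flat seafloor, a 120° swath at 60 m altitude is about 2 · 60 · tan(60°) ≈ 208 m wide, so ~50% overlap corresponds to a line spacing of roughly 104 m, and a 0.7° beam has a nadir footprint of about 0.73 m, finer than the 2 m grid cell. The line spacing and footprint are derived here for illustration; they are not reported values.

```python
import math

def swath_width(altitude_m, swath_angle_deg):
    """Across-track swath width over a flat seafloor."""
    return 2 * altitude_m * math.tan(math.radians(swath_angle_deg / 2))

def line_spacing(width_m, overlap):
    """Distance between neighbouring survey lines for a fractional swath overlap."""
    return width_m * (1 - overlap)

def nadir_footprint(altitude_m, beam_width_deg):
    """Approximate diameter of a single beam footprint at nadir."""
    return 2 * altitude_m * math.tan(math.radians(beam_width_deg / 2))

w = swath_width(60, 120)       # about 207.8 m
s = line_spacing(w, 0.5)       # about 103.9 m
f = nadir_footprint(60, 0.7)   # about 0.73 m
```

Off nadir the footprint grows with incidence angle, which is one reason the outer-range ARA grids are coarser in effective resolution.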

The processed lines were exported to MBES generic sensor format (.gsf) and imported to the QPS Fledermaus Geocoder Toolbox (hereafter FMGT) v7.8 for the backscatter post-processing. The FMGT built-in workflow was followed (i.e., radiometric correction, slope correction based on the exported bathymetric grid, angle varying gain (AVG) correction, nadir filtering, anti-aliasing, despeckling, and mosaicking) to create the backscatter mosaic (Fonseca & Calder, 2005). The backscatter mosaic was exported as a GeoTIFF grid with 2 m × 2 m cell size in UTM 10N. In addition to the backscatter mosaic, the ARA in the near range (incident angles within 0–25°), far range (incident angles within 25–55°), and outer range (incident angles within 55–85°) were exported as GeoTIFF grids in UTM 10N. The ARA beam details (i.e., ping number, beam number, easting, northing, depth, backscatter value, corrected backscatter value, and true angle) were exported in .txt format for selected MBES lines to compare patches with variations in backscatter intensity and nodule coverage. Based on the ARA model inversion, the acoustic impedance (hereafter impedance) and volume scattering (hereafter volume) were calculated and exported as GeoTIFF grids in UTM 10N.
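A sketch of the angle-range grouping behind the near-, far-, and outer-range ARA grids, assuming beam records like those exported to .txt (true angle, corrected backscatter in dB) from one half-swath ping stack. Averaging in linear intensity before converting back to dB is an assumption here; FMGT's internal averaging is not described in the text.

```python
import math

# Incidence-angle ranges (degrees) used for the exported ARA grids
RANGES = {"near": (0, 25), "far": (25, 55), "outer": (55, 85)}

def mean_backscatter_db(values_db):
    """Average dB values in the linear intensity domain, then convert back to dB."""
    linear = [10 ** (v / 10) for v in values_db]
    return 10 * math.log10(sum(linear) / len(linear))

def ara_range_means(beams):
    """beams: iterable of (true_angle_deg, corrected_backscatter_db) from one ping stack."""
    grouped = {name: [] for name in RANGES}
    for angle, bs in beams:
        a = abs(angle)  # guards against signed port/starboard angle conventions
        for name, (lo, hi) in RANGES.items():
            if lo <= a < hi:
                grouped[name].append(bs)
                break
    return {name: mean_backscatter_db(v) for name, v in grouped.items() if v}

# Hypothetical beams: two near-range, one far-range, one outer-range sample
result = ara_range_means([(10, -30.0), (20, -30.0), (40, -35.0), (60, -40.0)])
```

Averaging -20 dB and -30 dB in the linear domain gives about -22.6 dB rather than -25 dB, so the choice of domain matters when stacks mix strong and weak returns.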

SSS Data Acquisition and Post-Processing

An AUV-mounted EdgeTech 2205 SSS was used to acquire seafloor backscatter data at 230 kHz, parallel to MBES mapping (Vink et al., 2022). No interference was observed between the two systems. The raw data were stored in EdgeTech .jsf format (in which the post-processed navigation was integrated) and processed with the SonarWiz v7.09 software. The post-processing included bottom tracking based on the provided threshold detection method, slant range and bottom slope correction, gain correction using the empirical gain normalization method, nadir correction, and turn removal. The high overlap (~60%) between neighboring lines resulted in the absence of nadir gaps in the final mosaic. The final mosaicking was done using the average blending method. The SSS backscatter mosaic extent was clipped to the MBES grid extent and exported as a GeoTIFF grid with 2 m × 2 m cell size in UTM 10N.
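The slant-range correction listed above converts the time-based slant range of each SSS sample into horizontal distance from nadir. The flat-seafloor form below is a simplification (the bottom slope correction in the workflow goes further), and the numbers in the example are hypothetical.

```python
import math

def slant_to_ground_range(slant_range_m, altitude_m):
    """Flat-seafloor slant-range correction for one side scan sonar sample.

    Returns None for samples recorded before the first bottom return,
    i.e., the water column between the towfish and the seafloor.
    """
    if slant_range_m < altitude_m:
        return None
    return math.sqrt(slant_range_m ** 2 - altitude_m ** 2)

ground = slant_to_ground_range(100.0, 60.0)   # 80 m across-track
water_column = slant_to_ground_range(50.0, 60.0)
```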

MBES and SSS Derivatives

Following results from previous studies (e.g., Gazis & Greinert, 2021), 19 derivatives were produced (Table 1). Before calculating the derivatives, the mining site area and adjacent periphery (~ 450 × 430 m) were clipped out (in QGIS v3.24), as the anthropogenic nodule removal and seafloor disturbance altered the natural bathymetric and backscatter values. The three source grids (MBES bathymetry, MBES backscatter, SSS backscatter) were filtered using a Gaussian filter (in SAGA GIS v8) with a standard deviation (SD) of 1 and a kernel radius of 2 pixels to eliminate any remaining single-cell outliers that could negatively affect the derivative calculation and modeling performance (Lucieer et al., 2016; Lecours et al., 2017). The derivatives were then produced. The aspect was transformed into eastness and northness (Florinsky, 2017). None of the derivatives had zero variance (Supplementary Table S1) or was highly correlated (r > 0.85, p < 0.05) with another derivative (Supplementary Fig. S6). The highest correlations were among the ARA outputs. Because the ARA contribution to modeling performance was one of the study's interests, all ARA outputs were kept. The caret (Kuhn, 2022) and GGally (Schloerke et al., 2023) R packages were used for the zero variance and correlation analyses.
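A minimal sketch of two preprocessing steps described above: a normalised Gaussian kernel with SD = 1 on a square 5 × 5 window (radius 2), and the split of circular aspect into eastness (sine) and northness (cosine). SAGA GIS may differ in window shape and edge handling; the square window and the function names are assumptions.

```python
import math

def gaussian_kernel(sigma=1.0, radius=2):
    """Normalised 2-D Gaussian weights on a square (2r+1) x (2r+1) window."""
    weights = [[math.exp(-(dx * dx + dy * dy) / (2 * sigma * sigma))
                for dx in range(-radius, radius + 1)]
               for dy in range(-radius, radius + 1)]
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]

def smooth_cell(grid, i, j, kernel):
    """Gaussian-weighted mean around interior cell (i, j); edge handling omitted."""
    r = len(kernel) // 2
    return sum(kernel[dy + r][dx + r] * grid[i + dy][j + dx]
               for dy in range(-r, r + 1)
               for dx in range(-r, r + 1))

def eastness_northness(aspect_deg):
    """Turn circular aspect (0 deg = north, clockwise) into two linear predictors."""
    a = math.radians(aspect_deg)
    return math.sin(a), math.cos(a)

# A single-cell spike on a flat 5 x 5 grid is strongly damped
k = gaussian_kernel()
spike = [[0.0] * 5 for _ in range(5)]
spike[2][2] = 100.0
damped = smooth_cell(spike, 2, 2, k)   # about 16.2
```

The sine/cosine split avoids the artificial discontinuity between 359° and 1°, which would otherwise look like opposite ends of a linear predictor.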

Table 1 MBES and SSS derivatives used for the spatial modeling

Image Data Acquisition and Post-Processing

High-resolution seafloor images (4096 × 3072 pixels) were acquired by the AUV's HD CathX Colour Still Camera (M12 A1000) and stored in JPEG format (Vink et al., 2022). As for the hydroacoustic data, the AUV navigation from the photo-surveys was post-processed using Kongsberg's proprietary NavLab software to determine each image's location, the camera altitude, and the photographed seafloor area. The nodule coverage was calculated using the saltation GmbH & Co. KG proprietary H²SOM software (Supplementary Fig. S7), which automatically detects and segments polymetallic nodules in underwater images using a hierarchically growing hyperbolic self-organizing map approach (Schoening et al., 2016). Other researchers have also used the H²SOM software to quantify nodule coverage in seafloor images (Ellefmo & Kuhn, 2021; Kuhn & Rühlemann, 2021). The software parameters were fine-tuned for this dataset by saltation GmbH & Co. KG. During image acquisition, the AUV flying altitude was set at 5 m above the seafloor. This altitude could not always be kept constant, because altimeter bottom detection warnings forced the AUV to fly higher in some parts of the survey lines (up to 15 m in some cases). Image evaluations showed that photos obtained above 7 m altitude did not offer meaningful information, so they were excluded. Images along the AUV turns and those outside the MBES grid extent were also excluded. In total, 31,409 images were used to extract the nodule coverage for model training and testing.
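The H²SOM segmentation itself is proprietary and is not reproduced here. The sketch below only shows how a percent coverage follows from a binary nodule mask, together with the image selection rules described above. It assumes every pixel represents an equal seafloor area (nadir-looking camera, flat seafloor), and the dictionary keys are hypothetical.

```python
def nodule_coverage_percent(mask):
    """Share of pixels flagged as nodule (truthy) in a binary mask, in percent."""
    total = sum(len(row) for row in mask)
    nodule = sum(1 for row in mask for px in row if px)
    return 100.0 * nodule / total

def usable(image, max_altitude_m=7.0):
    """Keep images at or below the altitude cut-off, off AUV turns, inside the MBES grid.

    The keys 'altitude_m', 'in_turn' and 'inside_grid' are hypothetical metadata fields.
    """
    return (image["altitude_m"] <= max_altitude_m
            and not image["in_turn"]
            and image["inside_grid"])

coverage = nodule_coverage_percent([[1, 0], [0, 0]])
keep = usable({"altitude_m": 5.2, "in_turn": False, "inside_grid": True})
```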

Although not a direct topic of this study, converting nodule coverage from seafloor images (%) to nodule abundance (kg m⁻²) was not possible because the information needed to establish local equations between the seafloor images and the box corer data was unavailable. In addition to the image-derived nodule coverage, such equations would have required the image-derived size distribution and the box corer-derived nodule abundance, coverage, and size distribution at the seafloor image locations. Moreover, seafloor images tend to underestimate the actual nodule coverage because of semi-buried nodules or nodules covered by a faint sediment layer (Sharma, 1993; Ellefmo & Kuhn, 2021; Tsune, 2021). The degree of underestimation varied among nodule sizes, facies, and substrates, being higher in sub-areas with thicker sediment cover and lower in sub-areas with rocky substrate and basement exposure (Sharma, 1993, 2017; Parianos et al., 2021). Finally, seafloor images cannot provide information on buried nodules (Sharma et al., 2013; Lipton et al., 2016; Sharma, 2017), so correction factors based on box corer data must be applied. Nevertheless, studies have related higher nodule coverage to higher nodule abundance in areas where nodules lack a surficial sediment cover, as at our study site (Lipton et al., 2016; GSR, 2018; Wasilewska-Błaszczyk & Mucha, 2021). Thus, image-derived nodule coverage and model predictions (see Results section) could guide future box corer sampling at seafloor image locations and in sub-areas with different nodule coverage.

Orthophoto-Mosaics and Image-Derived Digital Elevation Models

The seafloor images were taken at 5 Hz, resulting in a high overlap (~75%) in the along-track direction, which allowed the computation of high-resolution orthophoto-mosaics and image-derived DEM. Before orthophoto-mosaicking, the original seafloor color was restored in underwater images using the in-house software Image Normalization (Köser et al., 2021), which removes lighting artifacts and illumination cones (Supplementary Fig. S7). Orthophoto-mosaics and the respective DEMs were calculated using the Agisoft Metashape Pro v8.3 software, following the built-in workflow (Camera calibration, Image alignment, Dense Point Cloud creation, Dense Point Cloud Confidence filtering, DEM calculation, and orthophoto mosaic reconstruction). The orthophoto-mosaics and DEMs were exported as GeoTIFF grids with 2 mm × 2 mm and 5 mm × 5 mm pixel/cell sizes, respectively.

Spatial Modeling

Data Pre-Processing

The predictor values of the MBES and SSS data sets were extracted at each image location using bilinear interpolation of the four adjacent cells to reduce the effect of AUV positioning uncertainty. The data were split into training (80%) and testing (20%) datasets using stratified sampling without replacement (i.e., no duplicate samples between training and testing datasets), ensuring good representativeness of the univariate density distribution and spatial coverage between the two datasets. The splitTools (Mayer, 2022) R package was used for this purpose. The testing dataset was withheld from all further steps, such as correlation analysis, data transformation, cross-validation (CV), and model training, to avoid data leakage.
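A sketch of the two steps above: bilinear interpolation from the four surrounding cell centres, and an 80/20 split stratified on quantile bins of the response, without replacement. This does not reproduce splitTools; the binning scheme, the number of bins, and the seed are assumptions, and the spatial-coverage check mentioned in the text is not included.

```python
import math
import random

def bilinear(grid, x0, y0, cell, x, y):
    """Bilinear interpolation from the four cell centres around (x, y).

    grid[r][c] holds the value of the cell centred at (x0 + c*cell, y0 + r*cell);
    only interior points are handled (no edge padding).
    """
    fx, fy = (x - x0) / cell, (y - y0) / cell
    c, r = math.floor(fx), math.floor(fy)
    tx, ty = fx - c, fy - r
    return (grid[r][c] * (1 - tx) * (1 - ty) + grid[r][c + 1] * tx * (1 - ty)
            + grid[r + 1][c] * (1 - tx) * ty + grid[r + 1][c + 1] * tx * ty)

def stratified_split(y, test_frac=0.2, n_bins=5, seed=42):
    """Split indices into train/test within quantile bins of y, without replacement."""
    rng = random.Random(seed)
    order = sorted(range(len(y)), key=lambda i: y[i])
    bin_size = math.ceil(len(y) / n_bins)
    train, test = [], []
    for start in range(0, len(order), bin_size):
        members = order[start:start + bin_size]   # one quantile bin of the response
        rng.shuffle(members)
        k = round(len(members) * test_frac)
        test.extend(members[:k])
        train.extend(members[k:])
    return sorted(train), sorted(test)

value = bilinear([[0, 10], [20, 30]], 0, 0, 2, 1, 1)   # midpoint of four cells
train_idx, test_idx = stratified_split(list(range(100)))
```

Each quantile bin contributes the same share to the test set, so the test distribution of nodule coverage mirrors the training distribution.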

Data Transformation

As the nodule coverage (response variable) was given in % relative to the observed seafloor, the arcsine transformation was used to avoid extrapolation below 0% or above 100%. In addition, it enhanced the model comparison and interpretability at the extreme ends because it stretched out the data in these two areas. The transformation was done in R using the formula \(y=\arcsin\left(\sqrt{x}\right)\), where x is the nodule coverage expressed as a proportion (a real number from 0 to 1). The model predictions were back-transformed using the formula \(x=\sin^{2}(y)\) and multiplied by 100 to return the predicted percentages of nodule coverage. The performance metrics on the testing dataset were calculated using the back-transformed data.
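The transform pair above can be sketched as follows. Because sin² is bounded, any real-valued model output y back-transforms into [0, 100]%, which is why predictions cannot leave the valid range. The function names are illustrative.

```python
import math

def to_arcsine(coverage_percent):
    """y = arcsin(sqrt(x)) with x = coverage / 100 in [0, 1]."""
    return math.asin(math.sqrt(coverage_percent / 100.0))

def from_arcsine(y):
    """Back-transform x = sin(y)^2, returned as a percentage."""
    return 100.0 * math.sin(y) ** 2

y_median = to_arcsine(38.0)     # median box-corer coverage reported for the site
back = from_arcsine(y_median)   # recovers 38 %
```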

ML Algorithms

The following regression algorithms were applied using the caret package in R (R Core Team, 2022):

GLM is an extension of the linear regression model that allows for incorporating non-normally distributed data (e.g., Poisson, binomial) using link functions that relate the predicted response variable to predictors (McCullagh & Nelder, 1989). Here, we used a GLM augmented with component-wise gradient boosting (Bühlmann & Hothorn, 2007) as implemented in the mboost (Hothorn et al., 2021) R package.
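A minimal sketch of component-wise L2 boosting with linear base learners (Bühlmann & Hothorn, 2007), assuming a squared-error loss (Gaussian family). It illustrates the principle behind a boosted GLM but is not the mboost implementation; the step length and iteration count are illustrative.

```python
import numpy as np

def componentwise_l2_boost(X, y, n_iter=200, nu=0.1):
    """Component-wise L2 boosting with one linear base learner per predictor.

    In every iteration each (centred) predictor is regressed on the current
    residuals; only the best-fitting one is updated, shrunk by step length nu.
    Returns (intercept, coefficients) on the original predictor scale.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean = X.mean(axis=0)
    Xc = X - x_mean
    ss = (Xc ** 2).sum(axis=0)
    offset = y.mean()
    coef = np.zeros(X.shape[1])
    resid = y - offset
    for _ in range(n_iter):
        betas = Xc.T @ resid / ss                       # per-predictor LS slopes
        rss = ((resid[:, None] - Xc * betas) ** 2).sum(axis=0)
        j = int(np.argmin(rss))                         # best component this round
        coef[j] += nu * betas[j]
        resid = resid - nu * betas[j] * Xc[:, j]
    return offset - x_mean @ coef, coef

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2.0 + 3.0 * X[:, 0] - 1.5 * X[:, 2]                # predictor 1 is irrelevant
b0, coef = componentwise_l2_boost(X, y, n_iter=500)
```

Because only one predictor is updated per iteration, stopping early leaves weak predictors at or near zero, which gives boosting a built-in form of variable selection.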

GAM is a data-driven approach that divides the range of each predictor into sections delimited by knots and fits smooth functions (e.g., splines) to them, assuming that the resulting functions are additive. It can handle non-linear and non-monotonic relationships between the response and predictors without assumptions regarding the distribution of the response variable and predictors (Hastie & Tibshirani, 1986). Like the GLM approach, a component-wise gradient-boosted GAM that uses penalized B-splines (a.k.a. P-splines; Eilers & Marx, 1996) as a base learner was applied using the mboost R package.

RFM is an ensemble of regression trees that are grown independently, each using a random subset of the training data and random subsets of the predictor variables (Breiman, 2001a); this reduces the correlation among trees. In each tree, a 'parent' node is split using the predictor, among a random subset of candidate predictors, that best minimizes the variance of the resulting 'child' nodes, until the maximum defined depth is reached. In regression problems, the final prediction is the average of the predictions of all trees. The ranger R package (Wright & Ziegler, 2017) was used.
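The mechanics described above (bootstrap rows, random predictor subsets, variance-minimizing splits, averaging) can be shown in a toy Python sketch. To stay short it deliberately uses depth-1 trees (stumps), so drawing the predictor subset once per tree is equivalent to drawing it at the single split; it is an illustration, not ranger's implementation. It also shows a consequence of averaging that reappears in the results: predictions stay strictly inside the observed range.

```python
import random


def sse(values):
    """Sum of squared deviations from the mean (node impurity for regression)."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def fit_stump(X, y, features):
    """Depth-1 regression tree: best variance-reducing split among `features`."""
    best = None
    for j in features:
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[j] <= thr]
            right = [yi for row, yi in zip(X, y) if row[j] > thr]
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, j, thr, sum(left) / len(left), sum(right) / len(right))
    if best is None:  # constant predictor in this sample: predict the node mean
        m = sum(y) / len(y)
        return lambda row: m
    _, j, thr, m_left, m_right = best
    return lambda row: m_left if row[j] <= thr else m_right


def fit_forest(X, y, n_trees=50, mtry=1, seed=1):
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]  # bootstrap sample of the rows
        features = rng.sample(range(len(X[0])), mtry)  # random predictor subset
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], features))
    return trees


def predict(trees, row):
    """Regression forest prediction = average of the individual tree predictions."""
    return sum(tree(row) for tree in trees) / len(trees)


X = [[float(i), float((7 * i) % 5)] for i in range(40)]
y = [float(i) for i in range(40)]
forest = fit_forest(X, y)
preds = [predict(forest, row) for row in X]
print(round(min(preds), 1), round(max(preds), 1))  # narrower than the observed 0..39
```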

SVM is a non-parametric technique that uses kernel functions to divide the feature space into subspaces separated by hyperplanes, aiming for a good separation of the data (Cortes & Vapnik, 1995). The kernel functions can be linear, polynomial, sigmoid, or radial basis, and they implicitly project data from a low-dimensional space into a higher-dimensional space, the so-called 'kernel trick' (Karatzoglou et al., 2004). The most suitable kernel function cannot be known a priori, so several kernel functions should be tested, as performance may vary depending on the data size and the relationships among the data. The kernlab R package (Karatzoglou et al., 2023) was used.
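The four kernel families and the kernel trick can be sketched in Python. The parameterization (scale, offset, degree, sigma) follows our understanding of how kernlab documents its polynomial, radial basis and hyperbolic tangent kernels, but treat it as an assumption. The final lines check the kernel trick for a degree-2 polynomial kernel in 2-D: evaluating the kernel directly gives the same number as an explicit dot product in a 3-D feature space that is never built during training.

```python
import math


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def linear_kernel(a, b):
    return dot(a, b)


def polynomial_kernel(a, b, degree=3, scale=1.0, offset=1.0):
    return (scale * dot(a, b) + offset) ** degree


def rbf_kernel(a, b, sigma=1.0):
    return math.exp(-sigma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def sigmoid_kernel(a, b, scale=1.0, offset=1.0):
    return math.tanh(scale * dot(a, b) + offset)


def quadratic_features(v):
    """Explicit feature map whose dot product equals (a . b)^2 for 2-D inputs."""
    x1, x2 = v
    return [x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2]


a, b = [1.0, 2.0], [3.0, -1.0]
implicit = polynomial_kernel(a, b, degree=2, offset=0.0)
explicit = dot(quadratic_features(a), quadratic_features(b))
print(implicit, round(explicit, 9))
```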

NNM is a data-driven approach that extracts patterns (i.e., relationships) from data using algorithms (e.g., forward propagation, backpropagation) and activation functions (e.g., logistic, tanh). It creates weighted connections between predictors and the response variable through hidden layers, whose nodes receive information from all nodes of the previous layer and pass it through the activation function to the next layer(s). Non-linear activation functions allow the network to handle non-linear relationships between predictors and the response variable (Lippmann, 1987). We used the neuralnet R package (Günther & Fritsch, 2010) to develop an NNM that uses resilient backpropagation with weight backtracking (a.k.a. rprop+; Riedmiller & Braun, 2003; Riedmiller, 1994).
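The weight update of resilient backpropagation with weight backtracking can be sketched for a single weight in Python. This is a generic textbook form of Rprop+, not the neuralnet source: the step-size factors (1.2, 0.5) and bounds are commonly quoted defaults that we assume here. Only the sign of the gradient is used; the step grows while the sign is stable, and when the sign flips the previous step is undone and the step size is halved.

```python
def sign(x):
    return (x > 0) - (x < 0)


def rprop_plus(grad, w, steps=100, delta0=0.1, eta_plus=1.2, eta_minus=0.5,
               delta_min=1e-6, delta_max=50.0):
    """Minimise a 1-D function with Rprop+ given its gradient `grad`."""
    delta, g_prev, dw_prev = delta0, 0.0, 0.0
    for _ in range(steps):
        g = grad(w)
        if g_prev * g > 0:  # same sign: accelerate
            delta = min(delta * eta_plus, delta_max)
            dw_prev = -sign(g) * delta
            w += dw_prev
        elif g_prev * g < 0:  # sign change: we jumped over a minimum
            delta = max(delta * eta_minus, delta_min)
            w -= dw_prev  # weight backtracking: undo the previous step
            g = 0.0  # forces the next iteration into the plain-step branch
        else:  # first step or right after backtracking
            dw_prev = -sign(g) * delta
            w += dw_prev
        g_prev = g
    return w


# Toy loss (w - 3)^2 with gradient 2 (w - 3); the minimum is at w = 3.
w_opt = rprop_plus(lambda w: 2.0 * (w - 3.0), w=0.0)
print(round(w_opt, 3))
```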

Hyperparameter optimization (a.k.a. tuning) was done for all models. GLM tuning included the number of boosting iterations (Supplementary Table S2). GAM tuning included the number of boosting iterations and the degrees of freedom of the P-splines (Supplementary Table S3). RFM tuning included the number of predictor variables randomly sampled at each node split, the split rule and the minimum node size (Supplementary Tables S4–S5). SVM tuning optimized the kernel type, polynomial degree, scale and cost hyperparameters (Supplementary Tables S6–S7). The number of hidden layers, the number of neurons and the activation functions were optimized in NNM (Supplementary Tables S8–S9). Exhaustive hyperparameter tuning was beyond the scope of this paper. The optimization took place during the spatial CV, which is the preferred method when data are spatially autocorrelated (Schratz et al., 2019; Gazis & Greinert, 2021).

The variable importance was extracted for all models, and each model estimated it differently. In GLM, the absolute values of the coefficients of the final model were used (Kuhn, 2022). In GAM, the in-bag risk reductions per boosting iteration (the error difference between the current and the previous step for each base learner contained in the model) were calculated (Hothorn et al., 2021). In RFM, the average permutation variable importance over all trees was used (Breiman, 2001a). SVM has no model-specific importance measure, so we used a permutation approach in which the prediction error was measured before and after shuffling the values of each predictor variable: the larger the prediction error after shuffling, the more important the predictor. The iml R package (Molnar, 2018) was used for this step. In NNM, we used the model weights assigned to each predictor to measure variable importance, computed with the Garson algorithm (Garson, 1991; see also Goh, 1995) as implemented in the NeuralNetTools R package (Beck, 2018). All variable importance outputs were scaled between 0 and 100 for better comparison across the different models (Kuhn, 2022). PDPs were calculated using the iml and pdp (Greenwell, 2017) R packages. All graphs were produced using the ggplot2 R package (Wickham, 2016).
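The permutation approach used for SVM and the 0–100 rescaling can be sketched in Python. Assumptions: the error is RMSE, importance is the mean error increase over 10 shuffles (a ratio of errors is another common choice), the rescaling is min–max, and `toy_model` is a hypothetical fitted model that ignores its second predictor. None of this reproduces the iml or caret code.

```python
import math
import random


def rmse(obs, pred):
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean increase in RMSE after shuffling each predictor column."""
    rng = random.Random(seed)
    baseline = rmse(y, [model(row) for row in X])
    scores = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between predictor j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(rmse(y, [model(row) for row in X_perm]) - baseline)
        scores.append(sum(increases) / n_repeats)
    return scores


def scale_0_100(scores):
    """Min-max scale importance scores to 0..100 for cross-model comparison."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [100.0 * ((s - lo) / (hi - lo)) for s in scores]


X = [[float(i), float((3 * i) % 7)] for i in range(20)]
y = [2.0 * row[0] for row in X]


def toy_model(row):
    return 2.0 * row[0]  # hypothetical fitted model that ignores predictor 2


print(scale_0_100(permutation_importance(toy_model, X, y)))  # [100.0, 0.0]
```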

CV Strategy and Extrapolation Assessment

ML models can yield over-optimistic results when a non-spatial, random k-fold CV is used for spatially autocorrelated data (Roberts et al., 2017; Meyer et al., 2018; Ploton et al., 2020; Karasiak et al., 2022). Here, a spatial k-fold CV strategy was applied. The size (and consequently, the number) of the folds is a tradeoff among spatial autocorrelation range, data representativeness, and model extrapolation (Roberts et al., 2017; Valavi et al., 2019). Folds that are too small cannot eliminate the influence of spatial autocorrelation among training locations (over-optimistic results), while folds that are too large remove a large portion of the training data, creating feature space gaps and causing unnecessary extrapolation that reduces model performance (over-pessimistic results). We used an incremental Moran's I analysis to identify the maximum distance of spatial autocorrelation. This analysis was done in ArcMap v10.6 according to the provided equations, with the null hypothesis that the examined attribute is randomly distributed, at a significance level of p < 0.05. The conceptualization of spatial relationships was based on inverse Euclidean distance, and the spatial weights were standardized, eliminating any bias that could be induced by the different number of spatial neighbors (Mitchell & Griffin, 2021). Three sizes of non-overlapping spatial blocks were created and compared: 1 × 1 km (28 blocks), 3 × 3 km (5 blocks) and 5 × 5 km (2 blocks; Supplementary Fig. S8).
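The autocorrelation statistic and the block assignment can be sketched in Python. This is not the ArcMap implementation: we assume inverse-distance weights restricted to a distance band and row-standardized, evaluate only the global Moran's I (no z-scores or p-values), and assign points to square CV blocks by integer division of their coordinates. Repeating `morans_i` for increasing bands gives the incremental profile used to choose the block size.

```python
import math


def morans_i(points, values, band):
    """Global Moran's I with inverse-distance weights inside a distance band.

    Weights w_ij = 1/d_ij for 0 < d_ij <= band, row-standardised so that each
    point's neighbour weights sum to 1.
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    num, s0 = 0.0, 0.0
    for i, (xi, yi) in enumerate(points):
        weights = {}
        for j, (xj, yj) in enumerate(points):
            d = math.hypot(xi - xj, yi - yj)
            if j != i and 0 < d <= band:
                weights[j] = 1.0 / d
        total = sum(weights.values())
        if total == 0:
            continue  # point without neighbours inside the band
        for j, w in weights.items():
            num += (w / total) * z[i] * z[j]
        s0 += 1.0  # each row of standardised weights sums to 1
    return (n / s0) * num / sum(zi * zi for zi in z)


def block_id(x, y, x_min, y_min, size):
    """Index of the square spatial CV block (size x size) containing (x, y)."""
    return (math.floor((x - x_min) / size), math.floor((y - y_min) / size))


line = [(float(i), 0.0) for i in range(10)]
trend = [float(i) for i in range(10)]
alternating = [float(i % 2) for i in range(10)]
print(round(morans_i(line, trend, 1.5), 3))        # positive: similar neighbours
print(round(morans_i(line, alternating, 1.5), 3))  # negative: dissimilar neighbours
print(block_id(2500.0, 3100.0, 0.0, 0.0, 3000.0))  # (0, 1)
```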

For each of the three spatial block configurations, the density distributions of geographical and feature space distances were compared among the training dataset, the CV folds, and the total geographical and feature space used for model predictions, to identify the best block size (a tradeoff between minimizing spatial autocorrelation and the need for extrapolation). This comparison was made using the CAST R package (Meyer et al., 2024), which implements the method proposed by Meyer and Pebesma (2022). How well a model extrapolates in space was assessed using the area of applicability (AOA) method (Meyer & Pebesma, 2021). This method outputs the geographical areas where the multivariate feature space conditions are similar to those captured by the model, so that the CV predictive performance still holds. Geographical parts with novel feature space conditions were returned as non-applicable areas (Meyer & Pebesma, 2021). The feature space dissimilarity analysis is based on normalized Euclidean distances, weighted by the model weights (a.k.a. variable importance), between the training data and the total feature space of the entire study area. The AOA was calculated using the CAST R package.
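The idea behind the AOA can be sketched in Python as we understand it from Meyer and Pebesma (2021); this is a simplified illustration, not the CAST implementation. Predictors are standardized with training means and SDs and multiplied by importance weights. The dissimilarity index (DI) of a location is its distance to the nearest training point divided by the mean pairwise training distance. The threshold comes from training DIs computed against points in other CV folds, using a boxplot-style outlier limit. The quartile method and the fold layout below are assumptions.

```python
import math
import statistics


def standardise(train, rows, weights):
    """Scale rows by the training mean/SD of each predictor, then by its weight."""
    cols = list(zip(*train))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]
    return [[w * (v - m) / s for v, m, s, w in zip(row, means, sds, weights)]
            for row in rows]


def aoa(train, folds, new, weights):
    """Dissimilarity index (DI) of new points and an AOA threshold from CV folds."""
    t = standardise(train, train, weights)
    q = standardise(train, new, weights)
    n = len(t)
    pair_d = [math.dist(t[i], t[j]) for i in range(n) for j in range(i + 1, n)]
    d_bar = sum(pair_d) / len(pair_d)  # mean pairwise training distance
    # training DI: distance to the nearest training point in a *different* fold
    train_di = [
        min(math.dist(t[i], t[j]) for j in range(n) if folds[j] != folds[i]) / d_bar
        for i in range(n)
    ]
    q1, _, q3 = statistics.quantiles(train_di, n=4, method="inclusive")
    limit = q3 + 1.5 * (q3 - q1)  # boxplot-style outlier limit
    threshold = max(d for d in train_di if d <= limit)
    new_di = [min(math.dist(p, r) for r in t) / d_bar for p in q]
    return new_di, threshold, [di <= threshold for di in new_di]


train = [[float(i), float(j)] for i in range(5) for j in range(5)]
folds = [i for i in range(5) for j in range(5)]  # hypothetical: one fold per column
di, thr, inside = aoa(train, folds, [[2.0, 2.0], [50.0, 50.0]], weights=[1.0, 1.0])
print(inside)  # [True, False]
```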

The schematic flowchart that illustrates the main methodological and modeling steps is presented in Supplementary Figure S9.

Results

CV Block Size

The incremental Moran's I analysis showed that nodule coverage was spatially autocorrelated, particularly at distances < 1 km (Moran's I > 0.5, p < 0.05). At 3 km, Moran's I was < 0.1 (p < 0.05) and followed a decreasing trend; it reached zero at distances over 4.5 km (Supplementary Fig. S8). The distance comparison showed that, with 1 km blocks, the geographical distances were still smaller than those needed for representative predictions, and spatial autocorrelation was still considerable, potentially leading to over-optimistic results (Fig. 4). With 5 km blocks, the geographical and feature space distances inside CV blocks were larger than those encountered during prediction, resulting in unnecessary extrapolation (Fig. 4). The 3 × 3 km CV approach showed the best representativeness between the training data and the model space, and it was selected for model training, tuning and evaluation (Fig. 4). All block configurations outperformed the random-CV strategy (Fig. 4).

Predictive Performance, Nodule Coverage and AOA

RFM had the best predictive performance on the testing data, followed by NNM (Table 2). SVM and GAM had the same coefficient of determination (hereafter R²) between model predictions and observations on the testing data. However, SVM had a lower mean absolute error (hereafter MAE) and root mean squared error (hereafter RMSE) than GAM. GLM had the worst fit on the testing data (R² = 0.72; Table 2).
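The three metrics reported in Table 2 can be computed on back-transformed predictions as in the Python sketch below. The predicted values are hypothetical, and we assume R² is the squared Pearson correlation between observations and predictions (to our understanding, caret's default); a 1 − SSE/SST definition would give different numbers.

```python
import math


def back_transform(y):
    """Arcsine-square-root scale -> percent coverage."""
    return 100.0 * math.sin(y) ** 2


def mae(obs, pred):
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def rmse(obs, pred):
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def r2_corr(obs, pred):
    """R^2 as the squared Pearson correlation between observations and predictions."""
    mo, mp = sum(obs) / len(obs), sum(pred) / len(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    var_o = sum((o - mo) * (o - mo) for o in obs)
    var_p = sum((p - mp) * (p - mp) for p in pred)
    return cov * cov / (var_o * var_p)


observed = [10.0, 20.0, 30.0]                                # test coverage (%)
predicted = [back_transform(v) for v in (0.35, 0.45, 0.60)]  # hypothetical model outputs
print(round(mae(observed, predicted), 2), round(rmse(observed, predicted), 2),
      round(r2_corr(observed, predicted), 3))
```

Because RMSE squares the errors, it penalizes a few large misses more than MAE does, which is why two models with the same R² (as SVM and GAM here) can still differ in MAE and RMSE.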

Table 2 Model performance metrics on test dataset for the back-transformed predictions


All models reproduced the testing data distribution (e.g., mean, median, interquartile range) well but not the minimum (min) and maximum (max) values of the response variable (Table 3). Despite its best overall performance, RFM yielded the narrowest range of predicted values, overestimating the lower values and underestimating the higher values (Table 3). NNM, SVM and GAM predictions were closer to the initial min–max range. GLM was the only model that predicted below the min value of the testing data, but it also underestimated the upper range (Table 3).

Table 3 Summary statistics of test dataset and model predictions on the test dataset


All models yielded similar nodule coverage maps and interquartile ranges (Fig. 5, Supplementary Table S10). Differences in the predicted min–max range were observed as GAM extrapolated only in the lower range of nodule coverage, NNM only in the upper range, and GLM and SVM extrapolated in both directions (Fig. 5). SVM predicted 100% nodule coverage in isolated pixels, which were not observed in the seafloor images or predicted by any other model (Fig. 5). The model predictions beyond the min–max range of training data had limited spatial extent (≤ 0.23% of the total area; Supplementary Fig. S10).

The weighted-average prediction surface from all models showed excellent spatial agreement with the seafloor images from the testing data and box corer samples (Fig. 6). Comparing the models' predictions pixel-by-pixel, we obtained the spatial differences (here expressed as SD of the predicted values). The largest differences were in geographically small sub-areas associated with sharp morphological transitions, bedrock exposure, consolidated sediments, and volcanic knolls. The highest SD was within the trough at the southeast part of the study area (Fig. 6). The combination of deeper parts with steep slopes (linked with lower coverage) and high backscatter reflectance (linked with higher nodule coverage), which is partly attributed here to nodule coverage and partly to hard substrate, led to contradictory predictions: RFM predicted low nodule coverage, while SVM yielded high nodule coverage (Fig. 6).
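The pixel-by-pixel ensemble summary can be sketched in Python. The text does not state the averaging weights or whether the sample or population SD was used, so both are assumptions here: the weights are hypothetical inputs (equal in the example) and the SD is the sample SD across the five models. The prediction values are invented for illustration.

```python
import statistics


def ensemble_summary(predictions, weights):
    """Weighted-mean map and between-model SD map.

    predictions: dict model name -> list of per-pixel predicted coverage (%).
    weights: dict model name -> non-negative weight (hypothetical choice).
    """
    names = list(predictions)
    n_pix = len(predictions[names[0]])
    total_w = sum(weights[m] for m in names)
    mean_map, sd_map = [], []
    for k in range(n_pix):
        vals = [predictions[m][k] for m in names]
        mean_map.append(sum(weights[m] * predictions[m][k] for m in names) / total_w)
        sd_map.append(statistics.stdev(vals))  # disagreement among models
    return mean_map, sd_map


preds = {  # two hypothetical pixels: models agree on the first, not the second
    "GLM": [40.0, 10.0], "GAM": [40.0, 20.0], "SVM": [40.0, 90.0],
    "RFM": [40.0, 15.0], "NNM": [40.0, 30.0],
}
means, sds = ensemble_summary(preds, {m: 1.0 for m in preds})
print(means, [round(s, 1) for s in sds])
```

A high SD flags pixels, such as those in the southeastern trough, where the models' predictions diverge even if their mean looks plausible.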

Regarding model transferability, the spatial-CV approach showed that all models performed well even when a large part of the geographical and feature space was held out, partly because the training sample represented the univariate density distribution of each predictor well (Supplementary Fig. S11). The CV residuals from all models had weak spatial autocorrelation and were normally distributed (Supplementary Fig. S12). RFM had the best average performance across CV folds, followed by GAM, which had the smallest interquartile range of all models (Fig. 7). However, model performance varied among the spatial blocks, with the median higher than the mean for all models (Fig. 7). This variation was due to the poor performance of all models in the same spatial block (Fig. 7), located in the southeast of the study area (Fig. 7), whose complex geomorphology differed from the rest of the study area. The presence of a horst, cliff, trough, exposed indurated sediments and different nodule facies (Figs. 1, 2, 3) resulted in extreme derivative values that forced the models to extrapolate beyond the training feature space, as shown by the feature space dissimilarity analysis and AOA (Fig. 6).

Severe model extrapolation was confined to small sub-areas, as each model was applicable in more than 91% of the total area (Supplementary Fig. S13). All models had similar AOAs, with only marginal local differences (Supplementary Fig. S13). In a pixel-by-pixel comparison, all models were simultaneously applicable in 88% of the study area; in 6%, only some models (at least one) were applicable, and in the remaining 6%, no model was applicable (Fig. 6).

Variable Importance and ML Interpretation

The variable importance calculation showed that BS, ARA_MF and SSS were the three most important predictors of the spatial distribution of nodule coverage (Fig. 8). BS and ARA_MF ranked in the top two positions in all models; their swapping between first and second place among the models was probably due to their correlation (r = 0.8, p < 0.05; Supplementary Fig. S6). SSS was the third most important predictor in two models (GAM, SVM) and scored high in GLM and RFM (Fig. 8). In NNM, SSS was replaced by SSS_LM as the third most important predictor. SSS and SSS_LM were also correlated predictors (r = 0.7, p < 0.05; Supplementary Fig. S6).

The MBES backscatter mosaic and ARA analyses depicted the differences in nodule coverage (Fig. 9). A distinct difference (> 5 dB) in backscatter intensity from seafloor parts without and with nodules existed (Fig. 9). Moreover, nodule coverage variations in the order of 10% were detectable, highlighting the sensitivity of the used 400 kHz frequency (Fig. 9). All models captured the monotonic relationship between MBES backscatter and nodule coverage similarly but with different degrees of complexity (Fig. 10). Although the backscatter differences were distinct for the entire range of incident angles, they were more prominent in the mean-far angle range from 25 to 55°, i.e., ARA_MF (Fig. 9). The inner and outer incident angles (ARA_MN and ARA_MO, respectively) ranked lower than the ARA_MF. ARA derivatives (ARA_IM and ARA_VO) also contributed to the model performance. The SSS backscatter mosaic also discriminated well between seafloor parts with different nodule coverage (variations in the order of 10%) and increased the model performance (Figs. 9, 10, Supplementary Table S15).

All models, particularly GLM, GAM and SVM, relied heavily on the hydroacoustic backscatter-related predictors (BS, ARA, SSS and their derivatives) for error minimization and predictions. The broad-scale seafloor geomorphology, such as bathymetry (BT), slope (S), broad-scale BPI (BPI100_300) and the eastward slope orientation (E), was also related to the spatial distribution of nodule coverage, which became more evident when the hydroacoustic backscatter-related predictors were excluded (Fig. 8).

The PDP of BT showed that nodule coverage in this study area was higher within a bathymetric range between −4425 and −4500 m. Depths outside that range were related to seafloor depressions or topographic highs where no nodules occurred (Fig. 11). Similarly, the PDP of BPI100_300 showed that nodule coverage was higher where BPI values were around 0, i.e., on flat or gently sloping seafloor (Fig. 11). E also had a monotonic relationship with nodule coverage: higher E values were associated with higher nodule coverage (Fig. 11).
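A PDP such as those in Fig. 11 is obtained by fixing one predictor at each grid value for every observation, keeping the other predictors at their observed values, and averaging the predictions. The Python sketch below uses a hypothetical `toy_model` with a coverage peak near −4460 m and a monotonic eastness effect, chosen only to mimic the shapes described above; it is not one of the fitted models.

```python
def partial_dependence(model, X, feature, grid):
    """Average prediction when `feature` is fixed at each grid value for every row."""
    curve = []
    for value in grid:
        total = 0.0
        for row in X:
            modified = list(row)
            modified[feature] = value  # other predictors keep their observed values
            total += model(modified)
        curve.append(total / len(X))
    return curve


def toy_model(row):
    # hypothetical fitted model: coverage peaks near -4460 m and rises with eastness
    depth, eastness = row
    return max(0.0, 40.0 - 0.02 * (depth + 4460.0) ** 2) + 10.0 * eastness


X = [[-4450.0, 0.2], [-4480.0, -0.5], [-4400.0, 0.9]]
depth_grid = [-4550.0, -4500.0, -4460.0, -4425.0, -4350.0]
print([round(v, 1) for v in partial_dependence(toy_model, X, 0, depth_grid)])
# [2.0, 10.0, 42.0, 17.5, 2.0]
```

The constant 2.0 at both ends is the averaged eastness contribution; it shows that a PDP reports the marginal effect of one predictor on top of the average effect of the others.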

At the 2-m pixel size, PrC, PlC, and VRM appeared to make only a small contribution to the spatial distribution of nodule coverage (Fig. 8). However, the image-derived DEMs of seafloor parts with and without nodules showed distinct differences in seafloor roughness, although the sediment characteristics remained similar (Fig. 12). These roughness differences were well captured by the MBES and SSS backscatter at 400 and 230 kHz, respectively.

Discussion

The acquisition of high-resolution data with an areal coverage of 37.34 km², the largest AUV-based dataset presented to date in deep-sea nodule research, provided a unique opportunity to study the spatial distribution of nodules in geomorphologically complex terrain at spatial resolutions ranging from 2 mm (orthophoto-mosaics) to 2 m (MBES grids).

The combined use of seafloor images, MBES, SSS data, and different ML algorithms showed that the nodule coverage varied significantly (locally up to 100%) even on a meter scale (Figs. 5, 6). Such variations have been found in other parts of eastern CCFZ and Peru Basin (Wiedicke & Weber, 1996; Gazis et al., 2018; Peukert et al., 2018; Gazis & Greinert, 2021; Alevizos et al., 2022) and are related to the interaction of the seafloor topography, sediment thickness, bottom currents, and the spatial variability of biogeochemical processes (Cochonat et al., 1992; Mewes et al., 2014; Sharma, 2017; Volz et al., 2018; Paul et al., 2019; Hein et al., 2020). This study examined only the geomorphological component of this interplay, providing insights into the role of local topography on nodule occurrences.

Higher nodule coverage and larger nodules occurred on relatively flat seafloor with gentle (< 3°), eastward-facing slopes (Figs. 5, 6, 11). Both variable importance measures (Fig. 8) and PDPs (Fig. 11) showed an increased importance of eastward-facing slopes, which points toward a predominance of westward-moving bottom currents. Long-term observations in the B4 domain of the GSR contract area show that the mean bottom current direction is northwestward (Hayes, 1979; Juan et al., 2018; GSR, 2018). This predominant NW bottom flow is steady and slow (0–3 cm/s) and is only periodically interrupted by tidal currents (NW and SE) and sporadically by short-lived benthic storms (6–15 cm/s) heading SE (Hayes, 1979; Juan et al., 2018; GSR, 2018). The steady, slow NW bottom currents contribute to favorable conditions for polymetallic nodule growth (Skornyakova & Murdmaa, 1992; von Stackelberg, 1997; Mewes et al., 2014; Kuhn et al., 2017; Juan et al., 2018; Volz et al., 2018) through (a) erosion and removal of fine sediments, leaving behind coarser particles that serve as nodule nuclei; and (b) continuous contact between the nodule surface and well-oxygenated bottom water, which allows the precipitation of hydroxides in polymetallic nodules. Geochemical studies within the eastern CCFZ have shown higher coverage of larger nodules at sites with fewer clay-size particles and deeper oxygen penetration, particularly for sampling sites within the GSR contract area (Mewes et al., 2014; GSR, 2018; Volz et al., 2018). Other studies in the CCFZ showed higher nodule coverage on flat seafloor with weak (~3 cm/s) but steady near-bottom currents (Skornyakova & Murdmaa, 1992; Yamazaki & Sharma, 2000; Kuhn et al., 2017; Juan et al., 2018; Yoo et al., 2018; Alevizos et al., 2022).
The importance of slope and slope orientation has also been shown in other ML spatial studies within the CCFZ (Kuhn & Rühlemann, 2021) and the Peru Basin at a nodule field with well-pronounced sediment furrows (Gazis & Greinert, 2021).

Lower nodule coverage was observed mainly in four sub-environments (Figs. 5, 6): (a) seafloor parts with typically higher bottom current velocities, such as gullies, furrows or the foot of slopes; (b) topographic highs with steep slopes, such as horsts and volcanic knolls; (c) the deeper parts of basins around and inside sub-circular depressions; and (d) areas with basement exposure, rocky outcrops, and carbonate-cemented sediment layers.

The PDP of bathymetry showed a bathymetric range of ~100 m that favored nodule presence (Fig. 11). Broader-scale studies in the CCFZ and Peru Basin showed maximum coverage and abundance within a narrow bathymetric range (50–100 m) at water depths just below the local CCD, resulting in response curves similar to ours (Weber et al. 2000; von Stackelberg, 2000; Sharma, 2017). The PDP of BPI100_300 (Fig. 11) showed low predicted nodule coverage (i.e., few to no polymetallic nodules) in local depressions. Studies have shown that local depressions exhibit geochemical conditions different from their surroundings: (a) a higher amount of fine-grained sediment (clay), which is trapped in local depressions during sediment transport (Wiedicke & Weber, 1996); such fine sediments cannot act as nuclei for polymetallic nodule formation because of their size, and their typically low content of mobilized manganese hinders nodule formation and growth (Mewes et al., 2014); and (b) increased organic carbon content, which shifts the Mn-redox boundary closer to the seafloor, where the limited (see point a) diagenetically mobilized manganese is released into the bottom water without supporting polymetallic nodule formation and growth (Wiedicke & Weber, 1996; Marchig et al., 2001; Paul et al., 2019).

Similar to local depressions, topographic highs also have limited nodule coverage related to increased slopes, reduced sediment thickness (e.g., shallower or outcropping basement), and local variations in sediment geochemical characteristics (e.g., POC fluxes, TOC contents, and Mn-redox depth) as a result of the fine-sediment winnowing and downslope transportation from the locally intensified or deflected bottom currents (Skornyakova & Murdmaa, 1992; von Stackelberg, 1997; Volz et al., 2018). The studies mentioned above showed that even small height differences of < 10–25 m are associated with such changes, which agrees with our findings.

The construction of seafloor orthophoto-mosaics further enhanced the geomorphological analysis and interpretation of nodule occurrences (Fig. 3; Supplementary Fig. S5). Deep-sea orthophoto-mosaicking is an emerging field with several challenges, particularly for AUV-based camera systems (Song et al., 2022; She et al., 2023; Song et al., 2024), and only a few studies have used AUV-based orthophoto-mosaics from such depths (although at lower resolution) within nodule fields (Simon-Lledó et al., 2019c; Gausepohl et al., 2020). Our results show great potential for deep-sea orthophoto-mosaicking, even when it is based on long AUV transects of the kind usually acquired for fauna or nodule coverage mapping.

Although the geomorphological predictors describe the spatial distribution of nodule coverage in an interpretable way linked to natural processes, the hydroacoustic backscatter contributed the most in all ML algorithms. MBES BS and ARA_MF were the most important predictors in all ML models (Fig. 8). Earlier studies have described the effect of the seafloor roughness created by nodules on sound waves and on the backscatter intensity measured by MBES and SSS (Weydert, 1991; Scanlon & Masson, 1992; Chunhui et al., 2015; Machida et al., 2019). Here, we quantified for the first time the mm-scale seafloor roughness of parts with and without nodule coverage, using the image-derived DEM generated during orthophoto-mosaic processing (Fig. 12). MBES and SSS backscatter captured the nodule coverage variations, discriminating not only between sub-environments with and without nodules but also between sub-environments with varying degrees of nodule coverage (Fig. 9). MBES and SSS backscatter can be used complementarily, as excluding either results in poorer predictive performance (Fig. 9, Supplementary Table S15). This finding agrees with a previous study in which the combined use of both systems improved interpretation and modeling performance (Janowski et al., 2021).

MBES ARA analyses showed that MBES backscatter acquisition is most meaningful for incident angles within 25–55°, consistent with existing guidelines for backscatter-optimized surveys (Lurton & Lamarche, 2015). The discriminative ability of the MBES backscatter, even over transitional areas, is attributed to the combination of high frequency (400 kHz), low survey altitude (60 m) and small beam opening angles (0.7° × 0.7°). Together, these resulted in a small footprint (0.73 m × 0.73 m at nadir on a flat seafloor), while the high frequency implies a short acoustic wavelength (~4 mm) that is on the order of, or smaller than, the nodule dimensions (mm to cm scale). The MBES ARA derivatives impedance (primary) and volume (secondary) contributed to the performance of all models except GLM (Fig. 8). This finding agrees with other ML spatial studies in which ARA and its derivatives increased ML prediction performance (Hasan et al., 2014; Alevizos & Greinert, 2018; Menandro et al., 2022; Misiuk & Brown, 2022; Porskamp et al., 2022). The lower contribution of volume backscattering is attributed to the limited penetration depth of sound waves at 400 kHz, which is typically < 0.5 m for silt/clay sediments (ϕ > 4.5) and ≪ 0.25 m for coarser grain sizes (Huff, 2008; Gaida et al., 2018). The main contributors to volume scattering are expected to be buried nodules located at a maximum depth of 5–10 cm.
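The footprint and wavelength figures quoted above follow from simple geometry and \(\lambda = c/f\). The Python sketch below reproduces them; the sound speed of 1500 m/s is our assumption (near-bottom values at 4500 m depth are somewhat higher, which still gives a wavelength of roughly 4 mm).

```python
import math


def footprint(altitude_m, beam_width_deg):
    """Across-beam footprint at nadir on a flat seafloor: 2 h tan(theta / 2)."""
    return 2.0 * altitude_m * math.tan(math.radians(beam_width_deg) / 2.0)


def wavelength(freq_hz, sound_speed=1500.0):
    """Acoustic wavelength lambda = c / f (sound speed is an assumed value)."""
    return sound_speed / freq_hz


print(round(footprint(60.0, 0.7), 2))          # 0.73 (m)
print(round(1000.0 * wavelength(400e3), 2))    # 3.75 (mm)
print(round(1000.0 * wavelength(400e3, 1540.0), 2))  # slightly longer at a higher c
```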

In future studies, it would be interesting to use backscatter mosaics and ARA from multifrequency MBES because the use of higher and lower frequencies (e.g., 100–600 kHz) and pulse lengths (e.g., μs to ms range) has shown to improve ML predictive performance and interpretation (Menandro et al., 2022; Misiuk & Brown, 2022). A holistic modeling approach should also include geophysical-derived data obtained by SBP, such as the thickness and composition of the semi-liquid bottom sediment layer, which is related to nodule coverage (Cochonat et al., 1992; Yoo et al., 2018; Dreisetl, 2016; Lipton et al., 2016; Alevizos et al., 2022). Similarly, the use of geochemical information would contribute to spatial modeling. This approach is used in spatial studies in shallower depths, where hundreds of well-distributed samples exist and can be interpolated using ML algorithms (e.g., Diesing et al., 2017; 2020a; Spiegel et al., 2024). However, in deep-sea environments, geochemical data rely on sparse local observations of cm diameter (e.g., gravity corers), which are typically clustered around areas of particular interest, hindering their use in spatial models where continuous surfaces are needed as input data to generate spatially continuous output data. Studies have shown increased fine-scale spatial variability of geochemical properties related to the spatial distribution and size of deep-sea polymetallic nodules, highlighting the need for intensifying efforts on deep-sea seafloor sampling (Mewes et al., 2014; Volz et al., 2018; Paul et al., 2019).

Regarding the modeling performance, RFM had the best fit, which agrees with other studies showing its ability to outperform other ML algorithms in prediction error (Stephens & Diesing, 2014; Diesing & Stephens, 2015; Herkül et al., 2017; Li et al., 2017; Fernández-Delgado et al., 2019). RFM is currently the "workhorse" in deep-sea predictive mapping (Lawson et al., 2017; Gazis et al., 2018; Diesing, 2020b; Gazis & Greinert, 2021; Uhlenkott et al., 2022; Josso et al., 2023). However, here, it yielded the smallest range in predicted values due to the averaging step between the predicted values from each tree at the last step of the RFM algorithm (Breiman, 2001a). The non-tree-based core of the other four models (i.e., linear, penalized B-splines, third-degree polynomial and sigmoid activation function) allowed for a more substantial value range extrapolation (Fig. 5). NNM had the second-best predictive performance on the testing data but the worst in spatial block CV. Less complex models such as GLM and GAM had better resampling performance in spatial block CV than SVM and NNM, indicating a better generalization ability when a big fraction of the training data is kept out (Fig. 7). This fact is attributed to the smoother functions that are more transferable (and interpretable) over larger areas (Wenger & Olden, 2012; Lauria et al., 2015; Stienessen et al., 2021). In addition, NNM is more sensitive to hyperparameter optimization than GLM, GAM, and RFM (Supplementary Tables S2–S9). The same applies to SVM, where the kernel type and other parameters (e.g., polynomial degree and cost function) largely influenced the model performance (Supplementary Tables S6–S7), a finding which agrees with previous comparative studies (e.g., Probst et al., 2019).
To our knowledge, this is the first time that NNM and SVM were used for a regression task in seafloor habitat mapping, as literature research only provided applications for seafloor classification (Hasan et al., 2012; Diesing & Stephens, 2015; Alevizos & Greinert, 2018; Trzcinska et al., 2020; Janowski et al., 2021; Breyer et al., 2023).

Independent of the model used, the spatial block CV approach could result in a large CV error (pessimistic approach) due to the gaps created in feature space (Roberts et al., 2017; Hao et al., 2020; Wadoux et al., 2021). The block optimization minimized the geographical and feature space distances between training data and the prediction area to a degree but not entirely (Fig. 4). Nevertheless, this approach was preferred over other spatial CV methods, such as spatial leave-one-location-out (Meyer et al., 2018), Euclidean buffered distances (Hengl et al., 2018), nearest neighbor distance matching (Milà et al., 2022), and spatial covariance weighting (Misiuk & Brown, 2023), because of the dataset size and survey layout (Figs. 1, 7). These methods have been designed and tested with fewer samples (≪ 1000) distributed throughout an entire area, not along linear transects as here. In contrast, spatial blocking is appropriate for large datasets and large regions, being computationally efficient (fewer training points per model iteration) and providing flexibility in partitioning the geographical space (Roberts et al., 2017; Valavi et al., 2019). The model evaluation based on spatial block CV was done using the held-out data from each spatial block separately (Fig. 7). An alternative approach would be to use all held-out data from the spatial blocks at once (global R²), which better represents a model's ability to predict over larger areas (Meyer & Pebesma, 2022). However, we preferred the local R² of each spatial block to obtain (a) the strictest evaluation of each model (Supplementary Table S14) and (b) a comparison between spatial blocks, identifying the blocks in which model predictions converged or diverged (Fig. 7).
Partitioning the geographical and feature space better represents the error related to extrapolation and transferability, although it increases the AOA (Wenger & Olden, 2012; Roberts et al., 2017; Fourcade et al., 2018; Hao et al., 2020; Gazis & Greinert, 2021; Meyer & Pebesma, 2021).
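The difference between the local (per-block) and global (pooled) R² discussed above can be shown with a small Python example on invented data, assuming R² is the squared Pearson correlation. Within each hypothetical block the predictions carry no information about the ranking of observations (local R² = 0), yet pooling the blocks yields a high global R² simply because the model gets the large between-block difference right. This is why the local R² is the stricter measure.

```python
def r2_corr(obs, pred):
    """R^2 as the squared Pearson correlation between observations and predictions."""
    mo, mp = sum(obs) / len(obs), sum(pred) / len(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    var_o = sum((o - mo) * (o - mo) for o in obs)
    var_p = sum((p - mp) * (p - mp) for p in pred)
    return cov * cov / (var_o * var_p)


def local_and_global_r2(blocks):
    """blocks: list of (observed, predicted) lists, one pair per held-out block."""
    local = [r2_corr(o, p) for o, p in blocks]
    all_obs = [v for o, _ in blocks for v in o]
    all_pred = [v for _, p in blocks for v in p]
    return local, r2_corr(all_obs, all_pred)


blocks = [  # hypothetical held-out blocks with low and high coverage
    ([10.0, 12.0, 11.0], [11.0, 11.0, 12.0]),
    ([60.0, 62.0, 61.0], [61.0, 61.0, 62.0]),
]
local, global_r2 = local_and_global_r2(blocks)
print([round(v, 3) for v in local], round(global_r2, 3))
```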

Here, the AOA was large and similar for all models (> 91% of the study area) and clarified the reasons behind the poor performance of all models in a specific spatial resampling block (Figs. 6, 7). The AOA and the SD among model predictions provide valuable information about model trustworthiness and should be reported alongside the predictions. Together, they could guide future sampling toward sub-areas with high SD and low model applicability. Despite the importance of spatial CV and extrapolation assessment when using ML algorithms, our literature search found only a few studies that report such information alongside ML-based seafloor spatial predictions (Misiuk et al., 2019; Diesing, 2020a; Dolan et al., 2021; Gazis & Greinert, 2021; Spiegel et al., 2024).

It is worth noting that a detailed comparison of performance metrics was not the primary goal of this study. Each spatial problem is unique, and each algorithm involves trade-offs among model accuracy, extrapolation, interpretability, and computational speed. Here, no ML method outperformed the others in all respects. The multi-model approach provides confidence in the derived nodule coverage distribution because all models predicted similar distributions and variable rankings. Its main benefit is increased confidence in areas with good spatial agreement (i.e., low SD) and a better understanding of the contribution of specific predictors. Complex and simple models perform similarly well when the predictors are meaningful and the training data capture the feature space and its variability well (Rudin, 2019). We therefore see ML models as tools for deriving predictions from data that capture the underlying relationships and interactions between the response variable and the predictors, but for which our conceptual understanding is still developing (Breiman, 2001b; Shmueli, 2010). Thus, although computational algorithms are used for prediction, the initial phase of such studies is, and needs to be, strongly data-centric, emphasizing the acquisition of large amounts of high-quality data at the right locations. Knowing the right (and technically feasible) sampling locations a priori, given each study area's unique characteristics, is difficult, particularly in the absence of legacy data (a typical problem in deep-sea research). Recently proposed AUV-based survey-planning approaches (Shields et al., 2023), combined with domain knowledge, could help achieve balanced sample coverage in both geographical and feature space.
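One simple way to check whether several models agree on variable ranking is to convert each model's importance scores to ranks, average them, and compute pairwise Spearman rank correlations. The sketch below is illustrative only: the importance values are invented, the predictor names merely echo those discussed in the paper (backscatter, bathymetry, slope, slope orientation as "aspect"), and the closed-form Spearman formula assumes there are no tied ranks.

```python
import numpy as np
from itertools import combinations


def ranks(values):
    """Rank 1 = most important; assumes no ties."""
    order = np.argsort(-np.asarray(values, dtype=float))
    r = np.empty(len(order), dtype=int)
    r[order] = np.arange(1, len(order) + 1)
    return r


def spearman(r1, r2):
    """Spearman's rho for two tie-free rankings."""
    n = len(r1)
    d = np.asarray(r1) - np.asarray(r2)
    return 1 - 6 * np.sum(d ** 2) / (n * (n ** 2 - 1))


def compare_rankings(importance):
    """Mean rank per predictor and pairwise rho between models."""
    names = list(importance)
    R = {m: ranks(v) for m, v in importance.items()}
    mean_rank = np.mean([R[m] for m in names], axis=0)
    rho = {(a, b): spearman(R[a], R[b]) for a, b in combinations(names, 2)}
    return mean_rank, rho


# Hypothetical importance scores, same predictor order for every model
predictors = ["backscatter", "bathymetry", "slope", "aspect"]
importance = {
    "RF": [40, 25, 20, 15],
    "GAM": [35, 30, 20, 15],
    "NN": [30, 35, 20, 15],
}
mean_rank, rho = compare_rankings(importance)
consensus = [predictors[i] for i in np.argsort(mean_rank)]
```

High pairwise rho values indicate that the models, despite different internals, point to the same dominant predictors; a low value would flag a model whose interpretation deserves closer scrutiny.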

Conclusions

Deep-sea mining research is characterized by data scarcity, knowledge gaps, and a lack of high-resolution studies (Amon et al., 2022). This work provided clear insights into the spatial distribution of polymetallic nodules within a potential mining field in the Clarion–Clipperton Fracture Zone, forming a basis for resource estimation, mining path planning, and benthic habitat mapping there. Combining a multi-model, interpretable analysis with angular range analysis and orthophoto-mosaicking helps shed light on the 'black box' character of machine learning models, gives valuable feedback on each predictor variable's statistical and natural contribution, and advances our knowledge of deep-sea polymetallic nodule occurrences.

Data Availability

All data associated with the results are presented in the paper and as supplementary material.

References

Alevizos, E., & Greinert, J. (2018). The hyper-angular cube concept for improving the spatial and acoustic resolution of MBES backscatter angular response analysis. Geosciences, 8(12), 446.
Alevizos, E., Huvenne, V. A. I., Schoening, T., Simon-Lledó, E., Robert, K., & Jones, D. O. B. (2022). Linkages between sediment thickness, geomorphology and Mn nodule occurrence: New evidence from AUV geophysical mapping in the Clarion-Clipperton Zone. Deep-Sea Research Part I: Oceanographic Research Papers, 179, 103645.
Amon, D. J., Gollner, S., Morato, T., Smith, C. R., Chen, C., Christiansen, S., Currie, B., Drazen, J. C., Fukushima, T., Gianni, M., Gjerde, K. M., Gooday, A. J., Grillo, G. G., Haeckel, M., Joyini, T., Ju, S.-J., Levin, L. A., Metaxas, A., Mianowicz, K., & Molodtsova, T. N. (2022). Assessment of scientific gaps related to the effective environmental management of deep-seabed mining. Marine Policy, 138, 105006.
Amon, D. J., Ziegler, A. F., Dahlgren, T. G., Glover, A. G., Goineau, A., Gooday, A. J., Wiklund, H., & Smith, C. R. (2016). Insights into the abundance and diversity of abyssal megafauna in a polymetallic-nodule region in the Eastern Clarion-Clipperton Zone. Scientific Reports, 6(1), 30492.
Anselin, L. (1995). Local indicators of spatial association—LISA. Geographical Analysis, 27(2), 93–115.
Beck, M. W. (2018). NeuralNetTools: Visualization and analysis tools for neural networks. Journal of Statistical Software, 85(11), 1–20.
Behrens, T., & Viscarra Rossel, R. A. (2020). On the interpretability of predictors in spatial data science: The information horizon. Scientific Reports, 10(1), 16737.
Bekins, B. A., Spivack, A. J., Davis, E. E., & Mayer, L. A. (2007). Dissolution of biogenic ooze over basement edifices in the equatorial Pacific with implications for hydrothermal ventilation of the oceanic crust. Geology, 35(8), 679.
Berger, W. H., Adelseck, C. G., Jr., & Mayer, L. A. (1976). Distribution of carbonate in surface sediments of the Pacific Ocean. Journal of Geophysical Research, 81(15), 2617–2627.
Breiman, L. (2001a). Random forests. Machine Learning, 45(1), 5–32.
Breiman, L. (2001b). Statistical modeling: The two cultures (with comments and a rejoinder by the author). Statistical Science, 16(3), 199–231.
Breyer, G., Bartholomä, A., & Pesch, R. (2023). The suitability of machine-learning algorithms for the automatic acoustic seafloor classification of hard substrate habitats in the German Bight. Remote Sensing, 15(16), 4113.
Bühlmann, P., & Hothorn, T. (2007). Boosting algorithms: Regularization, prediction and model fitting. Statistical Science, 22(4), 477–505.
Calder, B. R., & Mayer, L. A. (2003). Automatic processing of high-rate, high-density multibeam echosounder data. Geochemistry, Geophysics, Geosystems. https://doi.org/10.1029/2002gc000486
Chunhui, T., Xiaobing, J., Aifei, B., Hongxing, L., Xianming, D., Jianping, Z., Chunhua, G., Tao, W., & Wilkens, R. (2015). Estimation of manganese nodule coverage using multi-beam amplitude data. Marine Georesources & Geotechnology, 33(4), 283–288.
Cochonat, P., Le Suavé, R., Charles, C., Greger, B., Hoffert, M., Lenoble, J. P., Meunier, J., & Pautot, G. (1992). First in situ studies of nodule distribution and geotechnical measurements of associated deep-sea clay (Northeastern Pacific Ocean). Marine Geology, 103(1–3), 373–380.
Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20, 273–297.
Craig, J. D. (1979). The relationship between bathymetry and ferromanganese deposits in the north equatorial Pacific. Marine Geology, 29(1–4), 165–186.
De Smet, B., Simon-Lledó, E., Mevenkamp, L., Pape, E., Pasotti, F., Jones, D. O. B., & Vanreusel, A. (2021). The megafauna community from an abyssal area of interest for mining of polymetallic nodules. Deep-Sea Research Part I: Oceanographic Research Papers, 172, 103530.
Diesing, M. (2020b). Deep-sea sediments of the global ocean. Earth System Science Data, 12(4), 3367–3381.
Diesing, M., Kröger, S., Parker, R., Jenkins, C., Mason, C., & Weston, K. (2017). Predicting the standing stock of organic carbon in surface sediments of the North-West European continental shelf. Biogeochemistry, 135(1–2), 183–200.
Diesing, M., & Stephens, D. (2015). A multi-model ensemble approach to seabed mapping. Journal of Sea Research, 100, 62–69.
Diesing, M., Thorsnes, T., & Bjarnadóttir, L. R. (2020a). Organic carbon in surface sediments of the North Sea and Skagerrak. Biogeosciences Discussions, 2020, 1–30.
Dolan, M. F. J., Ross, R. E., Albretsen, J., Skarðhamar, J., Gonzalez-Mirelis, G., Bellec, V. K., Buhl-Mortensen, P., & Bjarnadóttir, L. R. (2021). Using spatial validity and uncertainty metrics to determine the relative suitability of alternative suites of oceanographic data for seabed biotope prediction. A case study from the Barents Sea, Norway. Geosciences, 11(2), 48.
Dreisetl, I. (2016). Deep sea exploration for metal reserves—objectives, methods and look into the future. In T. Abramowski (Ed.), Deep sea mining value chain: Organization, technology and development (pp. 105–118). IOM.
Dutkiewicz, A., Judge, A., & Müller, R. D. (2020). Environmental predictors of deep-sea polymetallic nodule occurrence in the global ocean. Geology, 48(3), 293–297.
Eilers, P. H. C., & Marx, B. D. (1996). Flexible smoothing with B-splines and penalties. Statistical Science, 11(2), 89–121.
Ellefmo, S. L., & Kuhn, T. (2021). Application of soft data in nodule resource estimation. Natural Resources Research, 30(2), 1069–1091.
Fernández-Delgado, M., Sirsat, M. S., Cernadas, E., Alawadi, S., Barro, S., & Febrero-Bande, M. (2019). An extensive experimental survey of regression methods. Neural Networks, 111, 11–34.
Florinsky, I. V. (2017). An illustrated introduction to general geomorphometry. Progress in Physical Geography, 41(6), 723–752.
FMI/Flanders Marine Institute. (2019). Maritime Boundaries Geodatabase, version 11, Retrieved April 24, 2023, from https://www.marineregions.org/
Fonseca, L., & Calder, B. (2005). Geocoder: An efficient backscatter map constructor. U.S. Hydrographic Conference. Retrieved April 24, 2023, from https://scholars.unh.edu/ccom/339/
Fonseca, L., Brown, C., Calder, B., Mayer, L., & Rzhanov, Y. (2009). Angular range analysis of acoustic themes from Stanton Banks Ireland: A link between visual interpretation and multibeam echosounder angular signatures. Applied Acoustics, 70(10), 1298–1304.
Fonseca, L., & Mayer, L. (2007). Remote estimation of surficial seafloor properties through the application Angular Range Analysis to multibeam sonar data. Marine Geophysical Research, 28(2), 119–126.
Fourcade, Y., Besnard, A. G., & Secondi, J. (2018). Paintings predict the distribution of species, or the challenge of selecting environmental predictors and evaluation statistics. Global Ecology and Biogeography, 27(2), 245–256.
Friedman, J. H. (2001). Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29(5), 1189–1232. https://doi.org/10.1214/aos/1013203451
Gaida, T. C., Mohammadloo, T. H., Snellen, M., & Simons, D. G. (2019). Mapping the seabed and shallow subsurface with multi-frequency multibeam echosounders. Remote Sensing, 12(1), 52.
Gaida, T., Tengku Ali, T., Snellen, M., Amiri-Simkooei, A., van Dijk, T., & Simons, D. (2018). A multispectral Bayesian classification method for increased acoustic discrimination of seabed sediments using multi-frequency multibeam backscatter data. Geosciences, 8(12), 455.
Garson, D. G. (1991). Interpreting neural-network connection weights. Artificial Intelligence Expert, 6(4), 46–51.
Gausepohl, F., Hennke, A., Schoening, T., Köser, K., & Greinert, J. (2020). Scars in the abyss: Reconstructing sequence, location and temporal change of the 78 plough tracks of the 1989 DISCOL deep-sea disturbance experiment in the Peru Basin. Biogeosciences, 17(6), 1463–1493.
Gazis, I.-Z. (2020). Processed EM122 multibeam swath bathymetry collected during SONNE cruise SO268/1 inside the Belgian License Area in Clarion Clipperton Zone, Pacific. GEOMAR Helmholtz Centre for Ocean Research Kiel, PANGAEA. Retrieved April 24, 2023, from https://doi.org/10.1594/PANGAEA.915767
Gazis, I.-Z., & Greinert, J. (2021). Importance of spatial autocorrelation in machine learning modeling of polymetallic nodules, model uncertainty and transferability at local scale. Minerals, 11(11), 1172.
Gazis, I.-Z., Schoening, T., Alevizos, E., & Greinert, J. (2018). Quantitative mapping and predictive modeling of Mn nodules’ distribution from hydroacoustic and optical AUV data linked by random forests machine learning. Biogeosciences, 15(23), 7347–7377.
Glasby, G. P. (1973). Distribution of manganese nodules and lebensspuren in underwater photographs from the Carlsberg Ridge, Indian Ocean. New Zealand Journal of Geology and Geophysics, 16(1), 1–17.
Global Sea Mineral Resources NV (GSR). (2018). Environmental Impact Statement. Small-Scale Testing of Nodule Collector Component on the Seafloor of the Clarion-Clipperton Fracture Zone and its Environmental Impact, p. 337. Retrieved April 24, 2023, from https://www.isa.org.jm/minerals/environmental-impact-assessments
Goh, A. T. C. (1995). Back-propagation neural networks for modeling complex systems. Artificial Intelligence in Engineering, 9(3), 143–151.
Greenwell, B. (2017). pdp: An R package for constructing partial dependence plots. The R Journal, 9(1), 421.
Günther, F., & Fritsch, S. (2010). Neuralnet: Training of neural networks. The R Journal, 2(1), 30.
Haeckel, M., & Linke, P. (2021). RV SONNE Cruise Report SO268: Assessing the Impacts of Nodule Mining on the Deep-Sea Environment: Nodule Monitoring, Manzanillo (Mexico)–Vancouver (Canada), 17.02.–27.05.2019. GEOMAR Report, N. Ser. 059. GEOMAR Helmholtz-Zentrum für Ozeanforschung Kiel, Kiel, Germany, 359 + Appendix (in all 802) pp. Retrieved April 24, 2023, from https://oceanrep.geomar.de/id/eprint/54402/
Halbach, P., Scherhag, C., Hebisch, U., & Marchig, V. (1981). Geochemical and mineralogical control of different genetic types of deep-sea nodules from the Pacific Ocean. Mineralium Deposita, 16(1), 59–84.
Hao, T., Elith, J., Lahoz-Monfort, J. J., & Guillera-Arroita, G. (2020). Testing whether ensemble modelling is advantageous for maximising predictive performance of species distribution models. Ecography, 43(4), 549–558.
Hari, V. N., Kalyan, B., Chitre, M., & Ganesan, V. (2018). Spatial modeling of deep-sea ferromanganese nodules with limited data using neural networks. IEEE Journal of Oceanic Engineering, 43(4), 997–1014.
Hasan, R. C., Ierodiaconou, D., Laurenson, L., & Schimel, A. (2014). Integrating multibeam backscatter angular response, mosaic and bathymetry data for benthic habitat mapping. PLoS ONE, 9, e97339.
Hasan, R., Ierodiaconou, D., & Monk, J. (2012). Evaluation of four supervised learning methods for benthic habitat mapping using backscatter from multi-beam sonar. Remote Sensing, 4(11), 3427–3443.
Hastie, T., & Tibshirani, R. (1986). Generalized additive models. Statistical Science, 1(3), 297–310.
Hayes, S. P. (1979). Benthic current observations at DOMES sites A, B, and C in the tropical North Pacific Ocean. In J. L. Bischoff & D. Z. Piper (Eds.), Marine geology and oceanography of the Pacific manganese nodule province. Marine science (Vol. 9). Springer.
Hein, J. R., Koschinsky, A., & Kuhn, T. (2020). Deep-ocean polymetallic nodules as a resource for critical materials. Nature Reviews Earth & Environment, 1(3), 158–169.
Hengl, T., Nussbaum, M., Wright, M. N., Heuvelink, G. B. M., & Gräler, B. (2018). Random forest as a generic framework for predictive modeling of spatial and spatio-temporal variables. PeerJ, 6, e5518.
Herkül, K., Peterson, A., & Paekivi, S. (2017). Applying multibeam sonar and mathematical modeling for mapping seabed substrate and biota of offshore shallows. Estuarine, Coastal and Shelf Science, 192, 57–71.
Hijmans, R. (2022). raster: Geographic data analysis and modeling, R package version 3.5-21. Retrieved June 24, 2023, from https://CRAN.R-project.org/package=raster
Hillman, J. I. (2017). Validation of automated supervised segmentation of Multibeam Backscatter data from the Chatham Rise, New Zealand. Marine Geophysical Research, 39, 205–227.
Hothorn, T., Buehlmann, P., Kneib, T., Schmid, M., & Hofner, B. (2021). mboost: Model-based boosting, R package version 2.9-5. Retrieved April 24, 2023, from https://CRAN.R-project.org/package=mboost
Huff, L. C. (2008). Acoustic remote sensing as a tool for habitat mapping in Alaska waters. Marine habitat mapping technology for Alaska (pp. 29–46). Alaska Sea Grant, University of Alaska Fairbanks. https://doi.org/10.4027/mhmta.2008.03
International Seabed Authority (ISA). (2023). PMN Exploration Areas, PMN Reserved Areas, APEI Areas. Retrieved April 24, 2023, from https://www.isa.org.jm/exploration-contracts/maps/
Jackson, D. R., & Briggs, K. B. (1992). High-frequency bottom backscattering: Roughness versus sediment volume scattering. The Journal of the Acoustical Society of America, 92(2), 962–977.
Jackson, D. R., Winebrenner, D. P., & Ishimaru, A. (1986). Application of the composite roughness model to high-frequency bottom backscattering. The Journal of the Acoustical Society of America, 79(5), 1410–1422.
Janowski, L., Wroblewski, R., Dworniczak, J., Kolakowski, M., Rogowska, K., Wojcik, M., & Gajewski, J. (2021). Offshore benthic habitat mapping based on object-based image analysis and geomorphometric approach. A case study from the Slupsk Bank, Southern Baltic Sea. Science of the Total Environment, 801, 149712.
Josso, P., Hall, A., Williams, C., Le Bas, T., Lusty, P., & Murton, B. (2023). Application of random-forest machine learning algorithm for mineral predictive mapping of Fe-Mn crusts in the World Ocean. Ore Geology Reviews, 162, 105671.
Juan, C., Van Rooij, D., & De Bruycker, W. (2018). An assessment of bottom current controlled sedimentation in Pacific Ocean abyssal environments. Marine Geology, 403, 20–33.
Kaikkonen, L., Virtanen, E. A., Kostamo, K., Lappalainen, J., & Kotilainen, A. T. (2019). Extensive coverage of marine mineral concretions revealed in shallow shelf sea areas. Frontiers in Marine Science, 6, 541.
Karasiak, N., Dejoux, J.-F., Monteil, C., & Sheeren, D. (2022). Spatial dependence between training and test sets: Another pitfall of classification accuracy assessment in remote sensing. Machine Learning, 111(7), 2715–2740.
Karatzoglou, A., Smola, A., & Hornik, K. (2023). kernlab: Kernel-Based Machine Learning Lab. R package version 0.9-32. Retrieved April 24, 2023, from https://CRAN.R-project.org/package=kernlab
Karatzoglou, A., Smola, A., Hornik, K., & Zeileis, A. (2004). kernlab—An S4 package for Kernel Methods in R. Journal of Statistical Software, 11(9), 1–20.
Kodagali, V. (1988). Influence of regional and local topography on the distribution of polymetallic nodules in Central Indian Ocean Basin. Geo-Marine Letters, 8(3), 173–178. https://doi.org/10.1007/bf02326094
Köser, K., Song, Y., Petersen, L., Wenzlaff, E., & Woelk, F. (2021). Robustly removing deep sea lighting effects for visual mapping of abyssal plains. arXiv. Retrieved June 24, 2023, from https://doi.org/10.48550/arXiv.2110.00480
Kuhn, M. (2022). caret: Classification and regression training. R package version 6.0-92. Retrieved April 24, 2023, from https://CRAN.R-project.org/package=caret
Kuhn, T., & Rühlemann, C. (2021). Exploration of polymetallic nodules and resource assessment: A case study from the German contract area in the Clarion-Clipperton Zone of the tropical Northeast Pacific. Minerals, 11(6), 618.
Kuhn, T., Wegorzewski, A., Rühlemann, C., & Vink, A. (2017). Composition, Formation, and Occurrence of Polymetallic Nodules. In R. Sharma (Ed.), Deep-sea mining. Springer. https://doi.org/10.1007/978-3-319-52557-0_2
Lauria, V., Power, A. M., Lordan, C., Weetman, A., & Johnson, M. P. (2015). Spatial transferability of habitat suitability models of Nephrops norvegicus among fished areas in the northeast Atlantic: Sufficiently stable for marine resource conservation? PLoS ONE, 10(2), e0117006.
Lawson, E., Smith, D., Sofge, D., Elmore, P., & Petry, F. (2017). Decision forests for machine learning classification of large, noisy seafloor feature sets. Computers & Geosciences, 99, 116–124.
Lecours, V., Devillers, R., Edinger, E. N., Brown, C. J., & Lucieer, V. L. (2017). Influence of artefacts in marine digital terrain models on habitat maps and species distribution models: A multiscale assessment. Remote Sensing in Ecology and Conservation, 3(4), 232–246.
Lee, S. H., & Kim, K.-H. (2004). Side-scan sonar characteristics and manganese nodule abundance in the Clarion–Clipperton Fracture Zones, NE equatorial Pacific. Marine Georesources & Geotechnology, 22(1–2), 103–114.
Li, J., Alvarez, B., Siwabessy, J., Tran, M., Huang, Z., Przeslawski, R., Radke, L., Howard, F., & Nichol, S. (2017). Application of random forest, generalised linear model and their hybrid methods with geostatistical techniques to count data: Predicting sponge species richness. Environmental Modelling & Software, 97, 112–129.
Li, Z., Li, H., Hein, J. R., Dong, Y., Wang, M., Ren, X., Wu, Z., Li, X., & Chu, F. (2021). A possible link between seamount sector collapse and manganese nodule occurrence in the abyssal plains, NW Pacific Ocean. Ore Geology Reviews, 138, 104378.
Lippmann, R. (1987). An introduction to computing with neural nets. IEEE ASSP Magazine, 4(2), 4–22.
Lipton, I., Nimmo, M., & Parianos, J. (2016). NI 43-101 Technical report. TOML Clarion Clipperton Zone Project, Pacific Ocean. Retrieved April 24, 2023, from https://doi.org/10.13140/RG.2.2.23742.08000
Lucieer, V., Huang, Z., & Siwabessy, J. (2016). Analyzing uncertainty in multibeam bathymetric data and the impact on derived seafloor attributes. Marine Geodesy, 39(1), 32–52.
Lurton, X., & Lamarche, G. (Eds.). (2015). Backscatter measurements by seafloor-mapping sonars: Guidelines and recommendations. 200 pp. Retrieved January 24, 2024, from https://doi.org/10.5281/zenodo.10089261
Macdonald, K. C., Fox, P. J., Alexander, R. T., Pockalny, R., & Gente, P. (1996). Volcanic growth faults and the origin of Pacific abyssal hills. Nature, 380(6570), 125–129.
Machida, S., Sato, T., Yasukawa, K., Nakamura, K., Iijima, K., Nozaki, T., & Kato, Y. (2019). Visualisation method for the broad distribution of seafloor Ferromanganese deposits. Marine Georesources & Geotechnology, 39(3), 267–279.
Marchig, V., von Stackelberg, U., Hufnagel, H., & Durn, G. (2001). Compositional changes of surface sediments and variability of manganese nodules in the Peru Basin. Deep-Sea Research Part II: Topical Studies in Oceanography, 48(17–18), 3523–3547.
Mayer, L. (1981). Erosional troughs in deep-sea carbonates and their relationship to basement structure. Marine Geology, 39(1–2), 59–80.
Mayer, M. (2022). splitTools: Tools for data splitting. R package version 0.3.2. Retrieved June 24, 2023, from https://CRAN.R-project.org/package=splitTools
Mbani, B., Buck, V., & Greinert, J. (2023). An automated image-based workflow for detecting megabenthic fauna in optical images with examples from the Clarion-Clipperton Zone. Scientific Reports, 13(1), 8350.
Mbani, B., Schoening, T., Gazis, I.-Z., Koch, R., & Greinert, J. (2022). Implementation of an automated workflow for image-based seafloor classification with examples from manganese-nodule covered seabed areas in the Central Pacific Ocean. Scientific Reports, 12(1), 15338.
McCullagh, P. J., & Nelder, J. A. (1989). Generalized linear models. Chapman & Hall.
Menandro, P. S., Bastos, A. C., Misiuk, B., & Brown, C. J. (2022). Applying a multi-method framework to analyze the multispectral acoustic response of the seafloor. Frontiers in Remote Sensing, 3, 860282.
Merow, C., Smith, M. J., Edwards, T. C., Guisan, A., McMahon, S. M., Normand, S., Thuiller, W., Wüest, R. O., Zimmermann, N. E., & Elith, J. (2014). What do we gain from simplicity versus complexity in species distribution models? Ecography, 37(12), 1267–1281.
Mewes, K., Mogollón, J. M., Picard, A., Rühlemann, C., Kuhn, T., Nöthen, K., & Kasten, S. (2014). Impact of depositional and biogeochemical processes on small scale variations in nodule abundance in the Clarion-Clipperton Fracture Zone. Deep-Sea Research Part I: Oceanographic Research Papers, 91, 125–141.
Meyer, H., Milà, C., Ludwig, M., & Linnenbrink, J. (2024). CAST: 'caret' Applications for Spatial-Temporal Models. R package version 0.9.0. Retrieved January 24, 2024, from https://hannameyer.github.io/CAST/
Meyer, H., & Pebesma, E. (2021). Predicting into unknown space? Estimating the area of applicability of spatial prediction models. Methods in Ecology and Evolution, 12(9), 1620–1633.
Meyer, H., & Pebesma, E. (2022). Machine learning-based global maps of ecological variables and the challenge of assessing them. Nature Communications, 13(1), 2208. https://doi.org/10.1038/s41467-022-29838-9
Meyer, H., Reudenbach, C., Hengl, T., Katurji, M., & Nauss, T. (2018). Improving performance of spatio-temporal machine learning models using forward feature selection and target-oriented validation. Environmental Modelling & Software, 101, 1–9.
Milà, C., Mateu, J., Pebesma, E., & Meyer, H. (2022). Nearest neighbour distance matching Leave-One-Out Cross-Validation for map validation. Methods in Ecology and Evolution, 13(6), 1304–1316.
Misiuk, B., & Brown, C. J. (2022). Multiple imputation of multibeam angular response data for high resolution full coverage seabed mapping. Marine Geophysical Research, 43(1), 7.
Misiuk, B., & Brown, C. J. (2023). Improved environmental mapping and validation using bagging models with spatially clustered data. Ecological Informatics, 77, 102181.
Misiuk, B., Diesing, M., Aitken, A., Brown, C. J., Edinger, E. N., & Bell, T. (2019). A spatially explicit comparison of quantitative and categorical modelling approaches for mapping seabed sediments using Random Forest. Geosciences, 9(6), 254.
Mitchell, A., & Griffin, L. S. (2021). Spatial measurements and statistics. Esri Press.
Molnar, C. (2018). iml: An R package for interpretable machine learning. Journal of Open Source Software, 3(26), 786.
Moore, T. C., Jr., Mitchell, N. C., Lyle, M., Backman, J., & Pälike, H. (2007). Hydrothermal pits in the biogenic sediments of the equatorial Pacific Ocean. Geochemistry, Geophysics, Geosystems. https://doi.org/10.1029/2006gc001501
Pape, E., Bezerra, T. N., Gheerardyn, H., Buydens, M., Kieswetter, A., & Vanreusel, A. (2021). Potential impacts of polymetallic nodule removal on deep-sea meiofauna. Scientific Reports, 11(1), 19996.
Parianos, J., Lipton, I., & Nimmo, M. (2021). Aspects of estimation and reporting of mineral resources of seabed polymetallic nodules: A contemporaneous case study. Minerals, 11(2), 200.
Parianos, J., O’Sullivan, A., & Madureira, P. (2022). Geology of parts of the central and eastern Clarion Clipperton Zone. Journal of Maps, 18(2), 232–245.
Paul, S. A. L., Haeckel, M., Bau, M., Bajracharya, R., & Koschinsky, A. (2019). Small-scale heterogeneity of trace metals including rare earth elements and yttrium in deep-sea sediments and porewaters of the Peru Basin, southeastern equatorial Pacific. Biogeosciences, 16(24), 4829–4849.
Peukert, A., Schoening, T., Alevizos, E., Köser, K., Kwasnitschka, T., & Greinert, J. (2018). Understanding Mn-nodule distribution and evaluation of related deep-sea mining impacts using AUV-based hydroacoustic and optical data. Biogeosciences, 15(8), 2525–2549.
Ploton, P., Mortier, F., Réjou-Méchain, M., Barbier, N., Picard, N., Rossi, V., Dormann, C., Cornu, G., Viennois, G., Bayol, N., Lyapustin, A., Gourlet-Fleury, S., & Pélissier, R. (2020). Spatial validation reveals poor predictive performance of large-scale ecological mapping models. Nature Communications, 11(1), 4540.
Porskamp, P., Schimel, A. C. G., Young, M., Rattray, A., Ladroit, Y., & Ierodiaconou, D. (2022). Integrating multibeam echosounder water-column data into benthic habitat mapping. Limnology and Oceanography, 67(8), 1701–1713.
Probst, P., Bischl, B., & Boulesteix, A.-L. (2019). Tunability: Importance of hyperparameters of machine learning algorithms. Journal of Machine Learning Research, 20, 1–32.
R Core Team. (2022). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/
Riedmiller, M., & Braun, H. (2003). A direct adaptive method for faster backpropagation learning: The RPROP algorithm. In IEEE International Conference on Neural Networks.
Riedmiller, M. (1994). Advanced supervised learning in multi-layer perceptrons—From backpropagation to adaptive learning algorithms. Computer Standards & Interfaces, 16(3), 265–278.
Robert, K., Jones, D. O. B., Roberts, J. M., & Huvenne, V. A. I. (2016). Improving predictive mapping of deep-water habitats: Considering multiple model outputs and ensemble techniques. Deep-Sea Research Part I: Oceanographic Research Papers, 113, 80–89.
Roberts, D. R., Bahn, V., Ciuti, S., Boyce, M. S., Elith, J., Guillera-Arroita, G., Hauenstein, S., Lahoz-Monfort, J. J., Schröder, B., Thuiller, W., Warton, D. I., Wintle, B. A., Hartig, F., & Dormann, C. F. (2017). Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure. Ecography, 40(8), 913–929.
Rudin, C. (2019). Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5), 206–215.
Sappington, J. M., Longshore, K. M., & Thompson, D. B. (2007). Quantifying landscape ruggedness for animal habitat analysis: A case study using bighorn sheep in the Mojave Desert. The Journal of Wildlife Management, 71(5), 1419–1426.
Scanlon, K. M., & Masson, D. G. (1992). Fe–Mn nodule field indicated by GLORIA, north of the Puerto Rico Trench. Geo-Marine Letters, 12, 208–213. https://doi.org/10.1007/BF02091840
Schloerke, B., Cook, D., Larmarange, J., Briatte, F., Marbach, M., Thoen, E., Elberg, A., & Crowley, J. (2023). GGally: Extension to 'ggplot2'. R package version 2.2.0. Retrieved April 24, 2023, from https://github.com/ggobi/ggally
Schoening, T., & Gazis, I.-Z. (2019). Summary of sizes, weights, counts and coverage of poly-metallic nodules from box cores taken during SONNE cruises SO268/1 and SO268/2. GEOMAR Helmholtz Centre for Ocean Research Kiel, PANGAEA. Retrieved April 24, 2023, from https://doi.org/10.1594/PANGAEA.904967
Schoening, T., Jones, D. O., & Greinert, J. (2017). Compact-morphology-based poly-metallic nodule delineation. Scientific Reports, 7(1), 13338.
Schoening, T., Kuhn, T., Jones, D. O. B., Simon-Lledo, E., & Nattkemper, T. W. (2016). Fully automated image segmentation for benthic resource assessment of poly-metallic nodules. Methods in Oceanography, 15–16, 78–89.
Schratz, P., Muenchow, J., Iturritxa, E., Richter, J., & Brenning, A. (2019). Hyperparameter tuning and performance assessment of statistical and machine-learning algorithms using Spatial Data. Ecological Modelling, 406, 109–120. https://doi.org/10.1016/j.ecolmodel.2019.06.002
Sharma, R. (1993). Quantitative estimation of seafloor features from photographs and their application to nodule mining. Marine Georesources & Geotechnology, 11(4), 311–331.
Sharma, R. (2017). Assessment of distribution characteristics of polymetallic nodules and their implications on deep-sea mining. In R. Sharma (Ed.), Deep-Sea mining. Springer. https://doi.org/10.1007/978-3-319-52557-0_8
Sharma, R., Khadge, N. H., & Jai Sankar, S. (2013). Assessing the distribution and abundance of seabed minerals from seafloor photographic data in the Central Indian Ocean Basin. International Journal of Remote Sensing, 34(5), 1691–1706.
She, M., Song, Y., Nakath, D., & Köser, K. (2023). Efficient large-scale AUV-based visual seafloor mapping. arXiv preprint arXiv:2308.06147. https://doi.org/10.48550/arXiv.2308.06147
Shields, J., Pizarro, O., & Williams, S. (2023). Feature space exploration for planning initial benthic AUV surveys. Field Robotics, 3(1), 652–686.
Shmueli, G. (2010). To explain or to predict? Statistical Science, 25(3), 289–310.
Simon-Lledó, E., Amon, D. J., Bribiesca-Contreras, G., Cuvelier, D., Durden, J. M., Ramalho, S. P., Uhlenkott, K., Arbizu, P. M., Benoist, N., Copley, J., Dahlgren, T. G., Glover, A. G., Fleming, B., Horton, T., Ju, S.-J., Mejía-Saenz, A., McQuaid, K., Pape, E., Park, C., & Jones, D. O. B. (2023). Carbonate compensation depth drives abyssal biogeography in the northeast Pacific. Nature Ecology & Evolution, 7(9), 1388–1397.
Simon-Lledó, E., Bett, B. J., Huvenne, V. A. I., Köser, K., Schoening, T., Greinert, J., & Jones, D. O. B. (2019a). Biological effects 26 years after simulated deep-sea mining. Scientific Reports, 9(1), 8040.
Simon-Lledó, E., Bett, B. J., Huvenne, V. A. I., Schoening, T., Benoist, N. M. A., Jeffreys, R. M., Durden, J. M., & Jones, D. O. B. (2019b). Megafaunal variation in the abyssal landscape of the Clarion Clipperton Zone. Progress in Oceanography, 170, 119–133.
Simon-Lledó, E., Bett, B. J., Huvenne, V. A., Schoening, T., Benoist, N. M., & Jones, D. O. (2019c). Ecology of a polymetallic nodule occurrence gradient: Implications for deep-sea mining. Limnology and Oceanography, 64(5), 1883–1894.
Simon-Lledó, E., Pomee, C., Ahokava, A., Drazen, J. C., Leitner, A. B., Flynn, A., Parianos, J., & Jones, D. O. B. (2020). Multi-scale variations in invertebrate and fish megafauna in the mid-eastern Clarion Clipperton Zone. Progress in Oceanography, 187, 102405.
Skornyakova, N. S., & Murdmaa, I. O. (1992). Local variations in distribution and composition of ferromanganese nodules in the Clarion-Clipperton Nodule Province. Marine Geology, 103(1–3), 381–405.
Song, Y., Nakath, D., She, M., & Köser, K. (2022). Optical imaging and image restoration techniques for deep ocean mapping: A comprehensive survey. PFG – Journal of Photogrammetry, Remote Sensing and Geoinformation Science, 90(3), 243–267. https://doi.org/10.1007/s41064-022-00206-y
Song, Y., She, M., & Köser, K. (2024). Advanced underwater image restoration in complex illumination conditions. ISPRS Journal of Photogrammetry and Remote Sensing, 209, 197–212. https://doi.org/10.1016/j.isprsjprs.2024.02.004
Spiegel, T., Diesing, M., Dale, A. W., Lenz, N., Schmidt, M., Sommer, S., Böttner, C., Fuhr, M., Kalapurakkal, H. T., Schulze, C.-S., & Wallmann, K. (2024). Modelling mass accumulation rates and ²¹⁰Pb rain rates in the Skagerrak: Lateral sediment transport dominates the sediment input. Frontiers in Marine Science, 11, 1331102.
von Stackelberg, U. (2000). Manganese nodules of the Peru Basin (Chapter 8). In D. S. Cronan (Ed.), Handbook of marine mineral deposits (CRC Marine Science Series, Vol. 17, pp. 197–238). CRC Press, Boca Raton, Florida, USA. ISBN 084938429X.
Stephens, D., & Diesing, M. (2014). A comparison of supervised classification methods for the prediction of substrate type using multibeam acoustic and legacy grain-size data. PLoS ONE, 9(4), e93950.
Stienessen, S. C., Rooper, C. N., Weber, T. C., Jones, D. T., Pirtle, J. L., & Wilson, C. D. (2021). Comparison of model types for prediction of seafloor trawlability in the Gulf of Alaska by using multibeam sonar data. Fishery Bulletin, 119(2–3), 184–196.
Trzcinska, K., Janowski, L., Nowak, J., Rucinska-Zjadacz, M., Kruss, A., von Deimling, J. S., Pocwiardowski, P., & Tegowski, J. (2020). Spectral features of dual-frequency multibeam echosounder data for benthic habitat mapping. Marine Geology, 427, 106239.
Tsune, A. (2021). Quantitative expression of the burial phenomenon of deep seafloor manganese nodules. Minerals, 11(2), 227.
Uhlenkott, K., Meyn, K., Vink, A., & Martínez Arbizu, P. (2023). A review of megafauna diversity and abundance in an exploration area for polymetallic nodules in the eastern part of the Clarion Clipperton Fracture Zone (North East Pacific), and implications for potential future deep-sea mining in this area. Marine Biodiversity, 53(2), 22.
Uhlenkott, K., Simon-Lledó, E., Vink, A., & Martínez Arbizu, P. (2022). Investigating the benthic megafauna in the eastern Clarion Clipperton Fracture Zone (north-east Pacific) based on distribution models predicted with random forest. Scientific Reports, 12(1), 8229.
Valavi, R., Elith, J., Lahoz-Monfort, J. J., & Guillera-Arroita, G. (2019). blockCV: An r package for generating spatially or environmentally separated folds for k-fold cross-validation of species distribution models. Methods in Ecology and Evolution, 10(2), 225–232.
Vanreusel, A., Hilario, A., Ribeiro, P. A., Menot, L., & Arbizu, P. M. (2016). Threatened by mining, polymetallic nodules are required to preserve abyssal epifauna. Scientific Reports, 6(1), 26808.
Vink, A. et al. (2022). MANGAN 2021 Cruise Report: Independent scientific monitoring of two collector tests in the BGR and GSR contract areas for the exploration of polymetallic nodules in the equatorial NE Pacific. BGR Report, 363 pp. Retrieved April 24, 2023, from https://doi.org/10.25928/hw7d-fs42
Volz, J. B., Mogollón, J. M., Geibert, W., Arbizu, P. M., Koschinsky, A., & Kasten, S. (2018). Natural spatial variability of depositional conditions, biogeochemical processes and element fluxes in sediments of the eastern Clarion-Clipperton Zone, Pacific Ocean. Deep-Sea Research Part I: Oceanographic Research Papers, 140, 159–172.
von Stackelberg, U. (1997). Growth history of manganese nodules and crusts of the Peru Basin. Geological Society, London Special Publications, 119(1), 153–176.
Wadoux, A. M. J.-C., Heuvelink, G. B. M., de Bruin, S., & Brus, D. J. (2021). Spatial cross-validation is not the right way to evaluate map accuracy. Ecological Modelling, 457, 109692.
Walbridge, S., Slocum, N., Pobuda, M., & Wright, D. (2018). Unified geomorphological analysis workflows with benthic Terrain Modeler. Geosciences, 8(3), 94.
Wang, M., Wu, Z., Best, J., Yang, F., Li, X., Zhao, D., & Zhou, J. (2021). Using multibeam backscatter strength to analyze the distribution of manganese nodules: A case study of seamounts in the Western Pacific Ocean. Applied Acoustics, 173, 107729.
Wasilewska-Błaszczyk, M., & Mucha, J. (2021). Application of general linear models (GLM) to assess nodule abundance based on a photographic survey (Case Study from IOM area), Pacific Ocean. Minerals, 11, 427.
Weber, M. E., von Stackelberg, U., Marchig, V., Wiedicke, M., & Grupe, B. (2000). Variability of surface sediments in the Peru Basin: Dependence on water depth, productivity, bottom water flow, and seafloor topography. Marine Geology, 163(1–4), 169–184. https://doi.org/10.1016/s0025-3227(99)00103-6
Wenger, S. J., & Olden, J. D. (2012). Assessing transferability of ecological models: An underappreciated aspect of statistical validation. Methods in Ecology and Evolution, 3(2), 260–267.
Weydert, M. (1991). Design of a system to assess manganese nodule resources acoustically. Ultrasonics, 29(2), 150–158.
Wickham, H. (2016). ggplot2: Elegant graphics for data analysis (2nd ed.). Springer International Publishing.
Wiedicke, M. H., & Weber, M. E. (1996). Small-scale variability of seafloor features in the northern Peru Basin: Results from acoustic survey methods. Marine Geophysical Research, 18(5), 507–526.
Wilson, M. F. J., O’Connell, B., Brown, C., Guinan, J. C., & Grehan, A. J. (2007). Multiscale terrain analysis of multibeam bathymetry data for habitat mapping on the continental slope. Marine Geodesy, 30(1–2), 3–35.
Wong, L. J., Kalyan, B., Chitre, M., & Vishnu, H. (2021). Acoustic assessment of polymetallic nodule abundance using sidescan sonar and altimeter. IEEE Journal of Oceanic Engineering, 46(1), 132–142.
Wright, M. N., & Ziegler, A. (2017). ranger: A fast implementation of random forests for high dimensional data in C++ and R. Journal of Statistical Software, 77(1), 1–17.
Yamazaki, T., & Sharma, R. (2000). Morphological features of Co-rich manganese deposits and their relation to seabed slopes. Marine Georesources & Geotechnology, 18(1), 43–76.
Yoo, C. M., Joo, J., Lee, S. H., Ko, Y., Chi, S.-B., Kim, H. J., Seo, I., & Hyeong, K. (2018). Resource assessment of polymetallic nodules using acoustic backscatter intensity data from the Korean exploration area, northeastern equatorial Pacific. Ocean Science Journal, 53(2), 381–394.
Yu, G., & Parianos, J. (2021). Empirical application of generalized Rayleigh distribution for mineral resource estimation of seabed polymetallic nodules. Minerals, 11(5), 449.
Zevenbergen, L. W., & Thorne, C. R. (1987). Quantitative analysis of land surface topography. Earth Surface Processes and Landforms, 12(1), 47–56.


Acknowledgements

We gratefully acknowledge the captains and crews of RV SONNE and MV Island Pride for their essential assistance during the SO268 and MANGAN 2021 expeditions. We thank the Ocean Infinity AUV team for completing the extensive AUV dives and acquiring the presented data. We thank GSR (DEME group) for chartering the AUV used in this study. We thank saltation GmbH & Co. KG for fine-tuning the H²SOM software for this dataset. We are grateful to Karl Heger (Data Science Unit, GEOMAR) for his valuable help with image color normalization, to Anastasios Poulos-Sidiropoulos for his assistance with the MBES and SSS data post-processing, and to Mario Veloso-Alarcon for his contribution to the drawing of the AUV in Supplementary Figure S9. Finally, we thank the GEOMAR Library team for its support in gathering the necessary literature. This research was initiated by the European collaborative project MiningImpact 2 and received national funding through the German Ministry of Education and Research (grant no. 03F0812A). This is publication #65 of the Deep-Sea Monitoring group at GEOMAR.

Funding

Open Access funding enabled and organized by Projekt DEAL.

Author information

Authors and Affiliations

GEOMAR Helmholtz Centre for Ocean Research Kiel, Kiel, Germany
Iason-Zois Gazis & Jens Greinert
DEME Group, GSR Global Sea Minerals Resources NV, Antwerp, Belgium
Francois Charlet
Christian-Albrechts University Kiel, Kiel, Germany
Jens Greinert

Corresponding author

Correspondence to Iason-Zois Gazis.

Ethics declarations

Conflict of Interest

The authors declare no competing interests.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (PDF 16971 kb)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Gazis, IZ., Charlet, F. & Greinert, J. An Interpretable Multi-Model Machine Learning Approach for Spatial Mapping of Deep-Sea Polymetallic Nodule Occurrences. Nat Resour Res (2024). https://doi.org/10.1007/s11053-024-10393-7

Received: 27 March 2024
Accepted: 19 July 2024
Published: 07 August 2024
DOI: https://doi.org/10.1007/s11053-024-10393-7