Introduction

Deep-sea polymetallic nodules (hereafter nodules) are millimeter- to centimeter-sized mineral concretions embedded in the sediment surface of the ocean floor, mainly between 4000 and 6000 m water depth. They form by precipitation of Fe- and Mn-oxyhydroxides from seawater and sediment porewater, which creates repeated concentric layers around a core (e.g., a nodule fragment, rock debris, or a shark's tooth) over millions of years (Kuhn et al., 2017; Hein et al., 2020). Within the Clarion–Clipperton fracture zone (CCFZ) in the eastern equatorial Pacific Ocean, nodule abundances are typically 10–15 kg m−2 (wet weight). Nodules are rich in Mn (28.4 wt%) and Fe (6.16 wt%) and also have high contents of Ni, Co, and Cu, which usually exceed 2.5 wt% for these three metals combined (Kuhn et al., 2017; Hein et al., 2020). Because of their abundance, their metal content, and the growing demand for these metals (e.g., for electric vehicle batteries), nodules are of commercial interest, and 18 exploration licenses in the CCFZ have been issued by, and are supervised by, the International Seabed Authority (ISA, 2023). Apart from being a potential metal resource, nodules are central to the deep-sea ecosystem, as they are one of the few hard substrates in vast regions otherwise covered by soft siliceous clay. Several studies have linked nodule abundance (kg m−2), seafloor area coverage (%), and nodule size (cm) to the abundance, spatial distribution, species richness, and community composition of benthic fauna (Amon et al., 2016; Vanreusel et al., 2016; Simon-Lledó et al., 2019a, b, 2020; Pape et al., 2021; Mbani et al., 2023; Uhlenkott et al., 2023). Because nodules are both a valuable metal resource and a deep-sea habitat, research on their spatial distribution has intensified.

At regional scales (hundreds to thousands of km), ship-based multibeam echosounder systems (MBESs) and deep-towed side scan sonar (SSS) systems have been used to discriminate areas of nodule presence and absence from the backscatter response, typically at 12 kHz (Scanlon & Masson, 1992; Lee & Kim, 2004; Chunhui et al., 2015; Machida et al., 2019; Kuhn & Rühlemann, 2021; Wang et al., 2021). These data are combined with sparse ground truth samples (usually > 1.8 km apart), e.g., from box corers (sampling area of 0.25 m2), and with geostatistical techniques (e.g., kriging) to map nodule distribution and abundance (Kuhn & Rühlemann, 2021). Within contract areas, high-resolution seafloor mapping with autonomous underwater vehicles (AUVs) has revealed the fine-scale (meters to a few km) spatial distribution of nodules (Schoening et al., 2017; Gazis et al., 2018; Peukert et al., 2018; Gazis & Greinert, 2021; Alevizos et al., 2022) and associated fauna (Simon-Lledó et al., 2019a,b; De Smet et al., 2021). The large amounts of hydroacoustic and optical data collected during such surveys have driven the application of machine learning (ML) at local (Gazis et al., 2018; Gazis & Greinert, 2021; Wong et al., 2021; Mbani et al., 2022), regional (Hari et al., 2018; Kaikkonen et al., 2019; Kuhn & Rühlemann, 2021; Wasilewska-Błaszczyk & Mucha, 2021), and global scales (Dutkiewicz et al., 2020). ML provides reliable predictions in geographical areas where the response variable correlates well with the covariates (predictors) and where the training data adequately represent the feature space (Gazis & Greinert, 2021).

Regional and local seafloor parameters directly influence the spatial distribution of nodules, e.g., bathymetry (Craig, 1979; Kodagali, 1988; von Stackelberg, 2000), terrain ruggedness (Kodagali, 1988; Skornyakova & Murdmaa, 1992), outcropping bedrock (Dreisetl, 2016; Yoo et al., 2018; Alevizos et al., 2022), and volcanism and the availability of nuclei (Glasby et al., 1973). In addition, seafloor morphology alters local hydrodynamics: solitary knolls or seamount chains, for example, change the magnitude and direction of bottom currents, producing erosional structures such as furrows and depositional structures such as sediment drifts (Juan et al., 2018). Slope orientation (i.e., aspect) has been used as a proxy for the mean current direction in spatial models when continuous long-term bottom current measurements are unavailable (Gazis & Greinert, 2021; Uhlenkott et al., 2022). Local variations in sediment deposition that affect sediment thickness (e.g., seafloor pits with increased sedimentation and conical hills with sediment accumulation on the lee side) contribute to the patchy nodule distribution (Wiedicke & Weber, 1996; von Stackelberg, 2000; Juan et al., 2018; Gazis & Greinert, 2021). Such variations in sediment thickness are reflected in MBES backscatter intensity, angular range analysis (ARA), and SSS backscatter mosaics (Yoo et al., 2018; Hillman et al., 2017; Gaida et al., 2019); they can be inferred from topographic indices such as the bathymetric position index (BPI) (Gazis & Greinert, 2021; Alevizos et al., 2022) and measured directly in sub-bottom profiler (SBP) data, e.g., as the thickness of the semi-liquid bottom sediment layer (Cochonat et al., 1992; Yoo et al., 2018; Alevizos et al., 2022).
The combined effect of seabed topography, bottom currents, and heterogeneous sedimentation influences the geochemical parameters that affect nodule formation, e.g., through the removal of clay-size particles by currents, variations in total organic carbon content, and shifts in the Mn-redox depth in local depressions. These links highlight the relationship between meter-scale seabed morphology and geochemical variability (Mewes et al., 2014; Volz et al., 2018; Paul et al., 2019).
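
The BPI referred to above compares the depth of a cell with the mean depth of its surroundings: negative values mark depressions and positive values mark crests or mounds. A minimal sketch, assuming an annular neighborhood with radii given in grid cells (the radii and the grid below are illustrative, not the settings of this study):

```python
import math


def bpi(grid, row, col, inner, outer):
    """Bathymetric position index of one cell: its depth minus the mean depth of
    the cells whose centers lie in the annulus inner <= distance <= outer (in cells)."""
    ring = []
    for r in range(row - outer, row + outer + 1):
        for c in range(col - outer, col + outer + 1):
            inside = 0 <= r < len(grid) and 0 <= c < len(grid[0])
            if inside and inner <= math.hypot(r - row, c - col) <= outer:
                ring.append(grid[r][c])
    return grid[row][col] - sum(ring) / len(ring)


# Hypothetical 5 x 5 depth grid (m) with a 10 m deep pit in the center
pit = [[-4500.0] * 5 for _ in range(5)]
pit[2][2] = -4510.0
print(bpi(pit, 2, 2, inner=1, outer=2))   # negative BPI marks a depression
```

On a 2 m grid, a radius of n cells corresponds to 2n m, so the choice of radii sets the spatial scale of the features the index responds to.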

Even when the data capture the underlying relationships, different ML algorithms trained on the same dataset can differ in prediction performance (Stephens & Diesing, 2014; Diesing & Stephens, 2015; Robert et al., 2016; Li et al., 2017). Multi-model approaches can help reduce prediction error, identify models that generalize better over broad geographical areas or a more complex feature space, and increase confidence in locations where the models agree spatially (Wenger & Olden, 2012; Diesing & Stephens, 2015; Robert et al., 2016). To our knowledge, a multi-model approach has not yet been used for fine-scale spatial prediction of nodules. Here, we use five well-established supervised ML algorithms: generalized linear models (GLMs), generalized additive models (GAMs), support vector machines (SVMs), random forests (RFM), and neural networks (NNM). These algorithms represent five of the most common modeling architectures, spanning different degrees of model complexity, and are available in all standard data analysis programming languages (e.g., R, Python) and in some GIS software (e.g., ArcGIS, QGIS).
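
One simple way to express spatial agreement in a multi-model approach is to compute, at each location, the mean of the model predictions and their spread. The sketch below assumes an unweighted mean and the population standard deviation across models; the values are hypothetical, and this is not necessarily how the model outputs are combined in this study.

```python
from statistics import mean, pstdev


def ensemble_summary(predictions):
    """predictions: dict of model name -> predicted nodule coverage (%) at the same
    locations. Returns one (mean, between-model SD) pair per location."""
    per_location = zip(*predictions.values())
    return [(mean(values), pstdev(values)) for values in per_location]


# Hypothetical predictions of the five models at two locations
preds = {
    "GLM": [20.0, 5.0], "GAM": [22.0, 5.0], "SVM": [24.0, 5.0],
    "RFM": [21.0, 5.0], "NNM": [23.0, 5.0],
}
for i, (m, sd) in enumerate(ensemble_summary(preds)):
    print(f"location {i}: mean {m:.1f} %, spread {sd:.2f}")
```

Locations with a small spread are those where the models agree, which is where a combined prediction can be trusted most.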

While prediction accuracy on an independent dataset is the standard way to assess model performance, it is equally important to understand how the predictions arise. Studies have shown that models can achieve accurate predictions even with irrelevant or artificial predictors, outperforming models built on domain-relevant predictors (Fourcade et al., 2018; Behrens & Viscarra Rossel, 2020). A balance between interpretability and accuracy is thus needed, particularly in the natural sciences, where models are used for causal explanation (Breiman, 2001b; Shmueli, 2010). However, interpreting ML results is difficult, especially when conceptual understanding of the system is still developing and complex methods (e.g., neural networks) are used exclusively (Merow et al., 2014). In a multi-model approach, the contribution of each predictor can be evaluated across different models, providing better insight into its importance (Li et al., 2017). Besides variable importance measures, partial dependence plots (PDPs) depict the type of relationship (linear, monotonic, or complex) between the response and a predictor that a model learned during training (Friedman, 2001). PDPs are among the most widely used model-agnostic methods in regression problems with large datasets because they are simple to interpret and efficient to compute (Molnar, 2018).
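
The PDP computation (Friedman, 2001) can be sketched in a few lines: for each value on a grid, the predictor of interest is set to that value in every training row while the other predictors keep their observed values, and the model predictions are averaged. The model and predictor names below are hypothetical stand-ins; any fitted model exposing a prediction function could be used.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average model prediction when `feature` is fixed at each grid value in every row."""
    curve = []
    for value in grid:
        modified = [{**row, feature: value} for row in rows]  # other predictors unchanged
        preds = predict(modified)
        curve.append((value, sum(preds) / len(preds)))
    return curve


# Hypothetical stand-in model and training rows
def toy_model(rows):
    return [2.0 * r["slope"] + r["backscatter"] for r in rows]


rows = [{"slope": 1.0, "backscatter": -30.0}, {"slope": 3.0, "backscatter": -20.0}]
print(partial_dependence(toy_model, rows, "slope", [0.0, 1.0, 2.0]))
```

Because the toy model is linear in `slope`, its PDP is a straight line with slope 2; a non-linear model would show the curvature it learned.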

In addition to the multi-model approach, an ARA was performed to better interpret the contribution of MBES backscatter as a spatial predictor. The backscatter angular response at a given frequency and incident angle is an inherent property of the seafloor (Jackson & Briggs, 1992; Fonseca & Mayer, 2007). The ARA has a higher angular resolution than the processed backscatter mosaic (in which several angles from different swaths are blended and an AVG correction has been applied), but a lower along-track resolution, because it stacks 30 consecutive MBES pings over the half swath width on the port and starboard sides (Fonseca & Mayer, 2007; Fonseca et al., 2009). Despite the lower along-track resolution, stacking pings at different beam angles could help identify the beam angles that contribute most to the spatial predictions and reveal subtle differences between seafloor patches with different sediment properties. Beyond extracting the average angular response at certain incident angles, ARA attempts to characterize sediment properties by comparing the average angular response for each group of incident angles with mathematical models that link backscatter to sediment properties (Jackson et al., 1986; Fonseca & Mayer, 2007). From this model inversion, the backscatter impedance (hereafter impedance) and backscatter volume (hereafter volume) were calculated. Impedance is the product of sediment bulk density and the ratio of the sediment compressional wave speed to the water sound speed, and expresses the acoustic contrast between the water and the sediment surface (Fonseca & Mayer, 2007). Sediments with lower water content and, typically, larger grain size have higher impedance and return a stronger backscatter signal (Jackson & Briggs, 1992).
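
As a rough illustration of why higher impedance yields a stronger return, the textbook impedance contrast of a fluid sediment against seawater, and the resulting normal-incidence (Rayleigh) reflection coefficient, can be computed directly. This is a simplification and not the Jackson-model inversion performed by the processing software: only the sediment bulk density (1.35 t m−3) is taken from the study area, while the seawater density (1.03 t m−3), the sound velocity ratio (0.99), and the "dense" comparison sediment are assumed values.

```python
def impedance_ratio(rho_sediment, rho_water, c_sediment, c_water):
    """Sediment-to-water acoustic impedance contrast: (rho_s / rho_w) * (c_s / c_w)."""
    return (rho_sediment / rho_water) * (c_sediment / c_water)


def normal_incidence_reflection(z_ratio):
    """Rayleigh reflection coefficient at normal incidence for a fluid-fluid interface."""
    return (z_ratio - 1.0) / (z_ratio + 1.0)


# Soft clayey silt: study-area bulk density, assumed water density and velocity ratio
soft = impedance_ratio(1.35, 1.03, 0.99, 1.0)
# Hypothetical denser, faster sediment for comparison
dense = impedance_ratio(1.8, 1.03, 1.1, 1.0)

print(round(soft, 3), round(normal_incidence_reflection(soft), 3))
print(round(dense, 3), round(normal_incidence_reflection(dense), 3))
```

The denser sediment has a larger impedance contrast and therefore a larger reflection coefficient, consistent with the stronger backscatter described above.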
Volume scattering provides information on sound attenuation within the sediment related to its physical properties (e.g., density) and to inhomogeneities within the sediment column (e.g., buried nodules or nodule fragments, burrows, and gas bubbles). In fine-grained sediments (e.g., deep-sea clay and silt) with low density and roughness, volume scattering can be the dominant contributor at lower frequencies and intermediate oblique incident angles (> 15° and < 60°), and when a bioturbated or harder/rougher layered substrate lies below a thin, soft surficial layer, e.g., where the thickness of the semi-liquid bottom sediment layer varies (Jackson et al., 1986; Lurton & Lamarche, 2015).

To enhance the geomorphological analysis and better understand the spatial distribution of nodule coverage, we constructed orthophoto-mosaics of the seafloor at 2 mm pixel resolution, a level of detail that, to our knowledge, has not previously been achieved at such water depths. In addition, image-derived digital elevation models (DEMs) of seafloor patches with and without nodule coverage supported the analysis and interpretation of the MBES and SSS backscatter.

Study Area

The study area was in the eastern CCFZ, within the B4 domain of the Global Sea Mineral Resources (GSR) contract area (Fig. 1). This 37.34 km2 section of the deep seafloor, at water depths between −4520 and −4392 m, is one of the study sites of the in situ mining trials of the Patania II pre-prototype seafloor nodule collector vehicle (Vink et al., 2022). It is part of a larger geomorphological system of parallel abyssal hills and valleys (horsts and grabens, respectively) aligned along an N–S/NW–SE axis (Supplementary Fig. S1), a typical structure in the eastern CCFZ (Macdonald et al., 1996; Parianos et al., 2022).

Figure 1

(a) Map showing the CCFZ between North America and Hawaii. The darker blue regions are the exclusive economic zones (FMI, 2019). (b) Map showing the nodule exploration contract areas (black polygons), reserved areas (light blue polygons around the exploration contract areas), and the areas of particular environmental interest (hatched blue squares) in the CCFZ (ISA, 2023). The GSR exploration contract area is divided into three blocks (marked in yellow). The black dot indicates the location of the study area. (c) AUV-based bathymetry of the study area (grid cell size 2 m × 2 m; contour interval 1 m). The background bathymetry is from ship-based MBES data, with a grid cell size of 50 m × 50 m and a contour interval of 25 m (Gazis, 2020). The black lines show the locations of the AUV seafloor images used for spatial modeling. Seafloor images exist along the entire AUV transect, but only images obtained at an altitude ≤ 7 m were used for spatial modeling and are shown here. The locations of the inset images in Figures 2 and 3 are indicated.

The central part is a wide, N–S oriented, U-shaped valley bounded to the southwest by a volcanic knoll with a crater on top, to the southeast by a horst, and to the south by a sill with slopes of up to 5° and two cone-shaped morphological features (Fig. 1, Supplementary Fig. S1). Several sub-circular and irregularly shaped depressions exist inside the central basin, creating an uneven seafloor. The largest sub-circular depression has a diameter of ~ 400 m and a water depth of − 4520 m, the deepest point of the study area (Fig. 1, Supplementary Fig. S2). Based on box corer samples retrieved from the central part (Haeckel & Linke, 2021), the seafloor is covered by unconsolidated, very soft, yellowish to medium grayish-brown clayey silt with traces of fine sand and biogenic siliceous components, which is homogeneous in the upper 5 cm (Supplementary Fig. S2). The sediment had a high water content (55.63–72.28%) and a bulk density of 1.35 t m−3 (GSR, 2018).

The nodules were mainly of diagenetic origin, with Mn/Fe ratios > 5 (Halbach et al., 1981; GSR, 2018). They were ellipsoidal, with a linear relationship between the major and minor axes (Schoening & Gazis, 2019; Yu & Parianos, 2021). Their texture was smooth or botryoidal on top, separated by a visible equatorial belt from the rough buried bottom part (Yu & Parianos, 2021). The nodule density was 1.99 g cm−3 (GSR, 2018). In the central part of the study area, the median abundance, including surficial and buried nodules, was 22.4 kg m−2 wet weight (Schoening & Gazis, 2019). Buried nodules (upper 10 cm of the sediment column) accounted for 14.3% of the total number of nodules in the box corer samples (Schoening & Gazis, 2019). The percentage of seafloor covered by nodules (hereafter nodule coverage) in the few box corer samples ranged from 4 to 43%, with a median of 38% (Schoening & Gazis, 2019). This range agrees with the nodule coverage derived from the seafloor images of the central part (Fig. 2). The nodules were large individuals, not interconnected with other nodules (Fig. 2). Lower nodule coverage and smaller nodules occurred around local depressions, and within the depressions the nodule coverage was < 3% (Fig. 2). Mounds of accumulated sediment without nodules on top, caused by bioturbation (i.e., tumuli), occurred among the nodules throughout the study area (Supplementary Fig. S3). Traces of mobile surface fauna and bioturbated sediments (e.g., sinuous burrows) were found in seafloor images and in box corer and gravity corer samples.

Figure 2

Color-normalized AUV-generated seafloor images from different locations within the study area. Each image shows a different nodule coverage. The location of each image is given in Fig. 1

The northern and eastern/southeastern parts of the volcanic knoll had gentle slopes (3°) and relatively smooth terrain. The northwestern part had steeper slopes (5–7°) and downslope channels, indicating downslope sediment movement (Fig. 1, Supplementary Fig. S2). Lower nodule coverage and smaller nodules were observed here (Supplementary Fig. S4). The main crater was 50–60 m deep (Supplementary Fig. S2). Seafloor images showed nodule-free areas, consolidated sediments, and rocky outcrops around and inside the crater (Fig. 3).

Figure 3

Color-normalized ortho-photomosaics from different seafloor locations within the study area (the location of each section is given in Fig. 1). The grid cell size is 2 mm × 2 mm. (a) Exposed carbonate cemented sediment layers appear as successive steps along the eastern slope of the trough. Densely packed small nodules are observed at the deepest and flattest part, while the carbonate steps exhibit fewer nodules. (b) Four solitary pillow basalt blocks in the southern part of the trough with associated fauna (Actiniaria). (c) and (d) Basement exposure, basaltic debris, small nodules, and nodule fragments generate a hard substrate inside the trough. The sharp-edged steps with detached, undissolved rocky fragments indicate a cobalt-rich ferromanganese crust. Sessile (Porifera, top left) and motile fauna (Arthropoda, bottom right) are shown within the black inset in (d). The fauna annotation was based on the standardized taxonomic field guide presented by Simon-Lledó et al. (2023). (e) The outer crater shows low-to-no nodule coverage and angular rocks as seafloor pavement

The horst in the southeastern part of the study area was 2.3 km long and up to ~ 1 km wide. Its western side was nodule-covered, with slopes of up to 10°, and rose to a topographic high at 4392 m water depth, the shallowest point of the study area (Fig. 1, Supplementary Fig. S2). The opposite, eastern side of the horst was bounded by inward-dipping normal faults with a steep (60–80°), sediment-free cliff. At the base of the cliff lay an enclosed, U-shaped collapsed trough (> 80 m of relief), related to a combination of faulting and carbonate dissolution (Mayer, 1981; Fig. 1). Its width varied from ~ 100 m in the northern part to ~ 700 m in the southern part (Fig. 1). Different nodule facies occurred here. In the northern part, the seafloor had smaller nodules at high spatial density and was characterized by exposed, compacted, layered indurated sediments forming successive steps. On top of the exposed steps, the loose sediment had been winnowed away, and small nodules lay on the surface (Fig. 3). Such structures within deep-sea troughs have previously been mapped in the CCFZ (Mayer, 1981). In the southern part of the trough, the seafloor was covered by densely packed small nodules, debris, pillow basalts, basaltic fragments, and rocky outcrops (Fig. 3).

The highest nodule coverage (up to 50%) was observed at the southeastern edge of the study area (Fig. 2). Geomorphologically prominent sites have been linked to high nodule coverage because of abundant nucleus material that exists as fragments or is transported gravitationally downslope where steep slopes exist (Yamazaki & Sharma, 2000; Sharma, 2017; Li et al., 2021). The nodules here were smaller than in the central, flatter part of the study area (Fig. 2), which could be attributed to one or more factors: limited horizontal space relative to the high number of nuclei, younger age, primary hydrogenetic growth, and a limited supply of dissolved metals for diagenetic precipitation because the semi-liquid bottom sediment layer (i.e., the geochemically active layer) is thinner or absent where the basement is frequently exposed (Cochonat et al., 1992; Lipton et al., 2016). Further to the southeast, there were three pits (or potholes) of varying depth (10–50 m) (Fig. 1, Supplementary Fig. S1). They had sharp walls (locally steeper than 60°) with layered consolidated sediments (Supplementary Fig. S5). Pits of varying depth and size occur along all the main horsts within the broader study area (Supplementary Fig. S1). Their presence has been linked to the dissolution of outcropping carbonate and Miocene limestone, as the study area lies at the limit of the calcite compensation depth (CCD; Berger et al., 1976), augmented by turbulent erosion and by hydrothermal discharge through conduits created by faulting and by differential compaction between the asymmetrical flanks (Mayer, 1981; Bekins et al., 2007; Moore et al., 2007).

Methodology

MBES Data Acquisition and Post-Processing

The HUGIN 6000 AUV used was equipped with a Kongsberg EM2040 MKII operating at 400 kHz, with beam opening angles of 0.7° × 0.7°. The swath opening angle was 120°, and the flying altitude was kept constant at 60 m above the seafloor, resulting in a uniform swath width and ~ 50% overlap between neighboring lines. A detailed description of the acquisition system settings (e.g., beam spacing pattern, pulse mode, and ping rate) is provided in the expedition report (Vink et al., 2022). The raw data were stored in the Kongsberg .all format. The navigation data were post-processed onboard with Kongsberg's proprietary NavLab software. Post-processing of the MBES data was done in QPS Qimera v1.7 and included sound velocity profile ray tracing correction, roll bias correction, tide correction based on pressure changes recorded by two stationary SBE16 CTDs (Vink et al., 2022), AUV turn removal, automated CUBE filtering (Calder & Mayer, 2003), and manual removal of erroneous soundings. The resulting bathymetric grid was exported as a GeoTIFF with 2 m × 2 m cell size in Universal Transverse Mercator coordinates, zone 10N (hereafter UTM 10N). The absence of navigational drift was confirmed using the available ship-based MBES data (Gazis, 2020) as a reference layer.
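
The acquisition geometry can be checked with simple flat-seafloor trigonometry: at 60 m altitude, a 120° swath covers about 208 m across track, a 0.7° beam illuminates a footprint of about 0.73 m at nadir (finer than the 2 m grid), and ~50% side overlap implies a line spacing of roughly 104 m. This sketch ignores refraction and seafloor slope, and the line spacing is derived here rather than taken from the survey plan.

```python
import math


def swath_width(altitude_m, swath_angle_deg):
    """Across-track coverage on a flat seafloor: 2 * h * tan(swath angle / 2)."""
    return 2.0 * altitude_m * math.tan(math.radians(swath_angle_deg / 2.0))


def nadir_footprint(altitude_m, beam_width_deg):
    """Approximate beam footprint at nadir: 2 * h * tan(beam width / 2)."""
    return 2.0 * altitude_m * math.tan(math.radians(beam_width_deg / 2.0))


def line_spacing(swath_m, overlap):
    """Line spacing giving the requested fractional side overlap between swaths."""
    return swath_m * (1.0 - overlap)


width = swath_width(60.0, 120.0)
print(f"swath {width:.1f} m, nadir footprint {nadir_footprint(60.0, 0.7):.2f} m, "
      f"line spacing {line_spacing(width, 0.5):.1f} m")
```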

The processed lines were exported to the generic sensor format (.gsf) and imported into the QPS Fledermaus Geocoder Toolbox (hereafter FMGT) v7.8 for backscatter post-processing. The built-in FMGT workflow (i.e., radiometric correction, slope correction based on the exported bathymetric grid, angle varying gain (AVG) correction, nadir filtering, anti-aliasing, despeckling, and mosaicking) was followed to create the backscatter mosaic (Fonseca & Calder, 2005). The backscatter mosaic was exported as a GeoTIFF grid with 2 m × 2 m cell size in UTM 10N. In addition to the backscatter mosaic, the ARA results for the near range (incident angles 0–25°), far range (25–55°), and outer range (55–85°) were exported as GeoTIFF grids in UTM 10N. The ARA beam details (i.e., ping number, beam number, easting, northing, depth, backscatter value, corrected backscatter value, and true angle) were exported in .txt format for selected MBES lines to compare patches with different backscatter intensity and nodule coverage. From the ARA model inversion, the acoustic impedance (hereafter impedance) and volume scattering (hereafter volume) were calculated and exported as GeoTIFF grids in UTM 10N.
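
The angular ranges above can be illustrated with a sketch that groups beam records (true incidence angle, backscatter in dB) from a stack of pings on one side of the swath and averages each range. It assumes averaging in linear intensity before converting back to dB, a common convention; the exact averaging and weighting inside FMGT are not documented here, so this is not a reproduction of the software, and the beam values are hypothetical.

```python
import math

# Angular ranges (degrees) of the exported ARA grids
RANGES = {"near": (0.0, 25.0), "far": (25.0, 55.0), "outer": (55.0, 85.0)}


def mean_db(values_db):
    """Average backscatter in linear intensity and convert the mean back to dB."""
    linear = [10.0 ** (v / 10.0) for v in values_db]
    return 10.0 * math.log10(sum(linear) / len(linear))


def ara_means(beams):
    """beams: (true incidence angle in degrees, backscatter in dB) pairs from a
    stack of consecutive pings on one side of the swath (e.g., 30 pings)."""
    binned = {name: [] for name in RANGES}
    for angle, bs_db in beams:
        for name, (lo, hi) in RANGES.items():
            if lo <= abs(angle) < hi:
                binned[name].append(bs_db)
                break
    return {name: mean_db(vals) for name, vals in binned.items() if vals}


# Hypothetical beam records
stack = [(5.0, -20.0), (12.0, -22.0), (30.0, -31.0), (48.0, -33.0), (62.0, -38.0)]
print(ara_means(stack))
```

Averaging in the linear domain gives more weight to strong returns than averaging dB values directly, which is why the choice matters when comparing patches.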

SSS Data Acquisition and Post-Processing

An AUV-mounted EdgeTech 2205 SSS was used to acquire seafloor backscatter data at 230 kHz simultaneously with the MBES mapping (Vink et al., 2022); no interference was observed between the two systems. The raw data were stored in EdgeTech .jsf format (into which the post-processed navigation was integrated) and processed with SonarWiz v7.09. Post-processing included bottom tracking with the built-in threshold detection method, slant range and bottom slope correction, gain correction with the empirical gain normalization method, nadir correction, and turn removal. The high overlap (~ 60%) between neighboring lines meant that the final mosaic had no nadir gaps. The final mosaic was created with the average blending method, clipped to the MBES grid extent, and exported as a GeoTIFF grid with 2 m × 2 m cell size in UTM 10N.
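
Slant range correction maps each SSS sample from its slant distance to its horizontal distance from the nadir track. Under a flat-seafloor assumption (SonarWiz additionally corrects for bottom slope, which is omitted here), this reduces to Pythagoras' theorem with the sonar altitude; samples closer than the altitude lie in the water column and have no seafloor position.

```python
import math


def ground_range(slant_range_m, altitude_m):
    """Horizontal distance from nadir for one sample, assuming a flat seafloor.
    Returns None for water-column samples (slant range shorter than altitude)."""
    if slant_range_m < altitude_m:
        return None
    return math.sqrt(slant_range_m ** 2 - altitude_m ** 2)


# Hypothetical sample at 100 m slant range from a sonar 60 m above the seafloor
print(ground_range(100.0, 60.0))
```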

MBES and SSS Derivatives

Following previous studies (e.g., Gazis & Greinert, 2021), 19 derivatives were produced (Table 1). Before the derivatives were calculated, the mining trial site and its periphery (~ 450 × 430 m) were clipped out (in QGIS v3.24), as anthropogenic nodule removal and seafloor disturbance had altered the natural bathymetric and backscatter values. The three source grids (MBES bathymetry, MBES backscatter, SSS backscatter) were smoothed with a Gaussian filter (in SAGA GIS v8) with a standard deviation (SD) of 1 and a kernel radius of 2 pixels to eliminate remaining single-cell outliers that could negatively affect the derivative calculation and modeling performance (Lucieer et al., 2016; Lecours et al., 2017). The derivatives were then produced. The aspect was transformed into eastness and northness (Florinsky, 2017). None of the derivatives had zero variance (Supplementary Table S1) or was highly correlated (r > 0.85, p < 0.05) with another derivative (Supplementary Fig. S6). The highest correlations were among the ARA outputs; because the ARA contribution to modeling performance was one of the study's interests, all ARA outputs were kept. The caret (Kuhn, 2022) and GGally (Schloerke et al., 2023) R packages were used for the zero-variance and correlation analyses.
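
Two steps of this paragraph can be sketched compactly. Aspect is circular (0° and 360° denote the same direction), so it is decomposed into eastness (sine) and northness (cosine), which are linear and bounded between −1 and 1. The smoothing step uses normalized Gaussian weights over a square window of radius 2 cells with SD 1; SAGA GIS's implementation details (e.g., window shape, edge handling) may differ, so the kernel below is an approximation, and the spike grid is hypothetical.

```python
import math


def eastness_northness(aspect_deg):
    """Decompose aspect (degrees clockwise from north) into eastness and northness."""
    a = math.radians(aspect_deg)
    return math.sin(a), math.cos(a)


def gaussian_kernel(radius=2, sd=1.0):
    """Normalized (2 * radius + 1) x (2 * radius + 1) Gaussian weights."""
    weights = [[math.exp(-(dr * dr + dc * dc) / (2.0 * sd * sd))
                for dc in range(-radius, radius + 1)]
               for dr in range(-radius, radius + 1)]
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]


def smooth_cell(grid, row, col, kernel):
    """Weighted mean around an interior cell (edge handling omitted for brevity)."""
    radius = len(kernel) // 2
    return sum(kernel[dr + radius][dc + radius] * grid[row + dr][col + dc]
               for dr in range(-radius, radius + 1)
               for dc in range(-radius, radius + 1))


# Hypothetical single-cell backscatter spike (dB) in a uniform 5 x 5 patch
spike = [[-30.0] * 5 for _ in range(5)]
spike[2][2] = -10.0
print(eastness_northness(90.0), smooth_cell(spike, 2, 2, gaussian_kernel()))
```

The spike is strongly damped while a uniform patch would be left unchanged, which is the intended effect on single-cell outliers.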

Table 1 MBES and SSS derivatives used for the spatial modeling

Image Data Acquisition and Post-Processing

High-resolution seafloor images (4096 × 3072 pixels) were acquired with the AUV's HD CathX Colour Still Camera (M12 A1000) and stored in JPEG format (Vink et al., 2022). As for the hydroacoustic data, the AUV navigation from the photo surveys was post-processed with Kongsberg's proprietary NavLab software to determine each image's location, the camera altitude, and the photographed seafloor area. Nodule coverage was calculated with the proprietary H2SOM software of saltation GmbH & Co. KG (Supplementary Fig. S7), which automatically detects and segments polymetallic nodules in underwater images using a hierarchically growing hyperbolic self-organizing map (Schoening et al., 2016). Other researchers have also used H2SOM to quantify nodule coverage in seafloor images (Ellefmo & Kuhn, 2021; Kuhn & Rühlemann, 2021). The software parameters were fine-tuned for this dataset by saltation GmbH & Co. KG. During image acquisition, the AUV flying altitude was set to 5 m above the seafloor; however, altimeter bottom-detection warnings forced the AUV to fly higher along parts of the survey lines (up to 15 m in some cases), so the altitude could not always be kept constant. Image evaluation showed that photos acquired from above 7 m did not provide meaningful information, so they were excluded. Images along the AUV turns and those outside the MBES grid extent were also excluded. In total, 31,409 images were used to extract nodule coverage for model training and testing.
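
H2SOM itself is proprietary and is not reproduced here. Once an image has been segmented into nodule and background pixels, however, the coverage is simply the nodule pixel fraction. The sketch below assumes a binary mask (1 = nodule) and hypothetical image records with an `altitude` field, and applies the 7 m altitude cutoff described above.

```python
def nodule_coverage(mask):
    """Percentage of pixels labeled as nodule (1) in a binary segmentation mask."""
    total = sum(len(row) for row in mask)
    nodules = sum(sum(row) for row in mask)
    return 100.0 * nodules / total


def usable_images(records, max_altitude_m=7.0):
    """Keep image records acquired at or below the altitude cutoff."""
    return [rec for rec in records if rec["altitude"] <= max_altitude_m]


# Hypothetical records: only the first and third pass the 7 m cutoff
records = [{"id": "img_0001", "altitude": 5.1},
           {"id": "img_0002", "altitude": 9.8},
           {"id": "img_0003", "altitude": 7.0}]
kept = usable_images(records)
print([rec["id"] for rec in kept])
print(nodule_coverage([[0, 1, 1, 0], [0, 1, 0, 0]]))
```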

Although not a direct topic of this study, converting image-derived nodule coverage (%) to nodule abundance (kg m−2) was not possible because the information needed to establish local equations between the seafloor images and box corer data was lacking. In addition to the available image-derived nodule coverage, this would have required the image-derived size distribution and the box corer-derived nodule abundance, coverage, and size distribution at the seafloor image locations. Moreover, seafloor images tend to underestimate the actual nodule coverage because of semi-buried nodules or nodules covered by a thin sediment veneer (Sharma, 1993; Ellefmo & Kuhn, 2021; Tsune, 2021). The degree of underestimation varies with nodule size, facies, and substrate, being higher in sub-areas with thicker sediment cover and lower in sub-areas with rocky substrate and basement exposure (Sharma, 1993, 2017; Parianos et al., 2021). Finally, seafloor images cannot provide information on buried nodules (Sharma et al., 2013; Lipton et al., 2016; Sharma, 2017), for which correction factors based on box corer data must be applied. Nevertheless, studies have related higher nodule coverage to higher nodule abundance in areas without surficial sediment on the nodules, such as our study site (Lipton et al., 2016; GSR, 2018; Wasilewska-Błaszczyk & Mucha, 2021). Thus, image-derived nodule coverage and model predictions (see Results section) could guide future box corer sampling at seafloor image locations and in sub-areas with different nodule coverage.

Orthophoto-Mosaics and Image-Derived Digital Elevation Models

The seafloor images were taken at 5 Hz, resulting in a high overlap (~75%) in the along-track direction, which allowed the computation of high-resolution orthophoto-mosaics and image-derived DEM. Before orthophoto-mosaicking, the original seafloor color was restored in underwater images using the in-house software Image Normalization (Köser et al., 2021), which removes lighting artifacts and illumination cones (Supplementary Fig. S7). Orthophoto-mosaics and the respective DEMs were calculated using the Agisoft Metashape Pro v8.3 software, following the built-in workflow (Camera calibration, Image alignment, Dense Point Cloud creation, Dense Point Cloud Confidence filtering, DEM calculation, and orthophoto mosaic reconstruction). The orthophoto-mosaics and DEMs were exported as GeoTIFF grids with 2 mm × 2 mm and 5 mm × 5 mm pixel/cell sizes, respectively.
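
The link between frame rate and along-track overlap is straightforward: between consecutive frames the AUV advances speed / frame rate, and the overlap is the fraction of the along-track image footprint not covered by that advance. The speed and footprint in the example are hypothetical, chosen only to reproduce a 75% overlap at 5 Hz; they are not survey values.

```python
def along_track_overlap(speed_m_s, frame_rate_hz, footprint_along_m):
    """Fraction of the along-track footprint shared by consecutive frames."""
    advance = speed_m_s / frame_rate_hz          # distance traveled between exposures
    return max(0.0, 1.0 - advance / footprint_along_m)


# Hypothetical: 2 m/s at 5 Hz with a 1.6 m along-track footprint
print(along_track_overlap(2.0, 5.0, 1.6))
```

For a fixed frame rate, flying faster or lower (a smaller footprint) reduces overlap, which in turn limits how well images can be aligned for photogrammetry.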

Spatial Modeling

Data Pre-Processing

The predictor values of the MBES and SSS datasets were extracted at each image location by bilinear interpolation of the four adjacent cells to reduce the effect of AUV positioning uncertainty. The data were split into training (80%) and testing (20%) datasets by stratified sampling without replacement (i.e., no sample appears in both datasets), so that the univariate distribution and the spatial coverage of the data were well represented in both datasets. The splitTools (Mayer, 2022) R package was used for this purpose. The testing dataset was excluded from all subsequent steps, such as correlation analysis, data transformation, cross-validation (CV), and model training, to avoid data leakage.
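
Both pre-processing steps can be sketched in plain Python. The interpolation weights the four cells surrounding a point by proximity, using non-negative fractional row/column coordinates measured between cell centers (an assumed convention). The split stratifies on the response only, drawing 20% of the samples at random from each quantile bin; splitTools' exact binning rules may differ, and the spatial-coverage check mentioned above is not included.

```python
import random


def bilinear(grid, r, c):
    """Interpolate grid at non-negative fractional (row, col) positions between cell centers."""
    r0, c0 = int(r), int(c)
    r1, c1 = min(r0 + 1, len(grid) - 1), min(c0 + 1, len(grid[0]) - 1)
    dr, dc = r - r0, c - c0
    top = grid[r0][c0] * (1 - dc) + grid[r0][c1] * dc
    bottom = grid[r1][c0] * (1 - dc) + grid[r1][c1] * dc
    return top * (1 - dr) + bottom * dr


def stratified_split(y, test_fraction=0.2, n_bins=10, seed=42):
    """Return (train, test) index lists, drawing test_fraction of each quantile bin of y
    at random without replacement, so no index appears in both lists."""
    rng = random.Random(seed)
    order = sorted(range(len(y)), key=lambda i: y[i])
    n = len(y)
    train, test = [], []
    for k in range(n_bins):
        members = order[k * n // n_bins:(k + 1) * n // n_bins]
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return sorted(train), sorted(test)


print(bilinear([[0.0, 10.0], [20.0, 30.0]], 0.5, 0.5))
```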

Data Transformation

As the nodule coverage (response variable) was expressed in % of the observed seafloor, the arcsine transformation was used to prevent predictions below 0% or above 100%. It also improved model comparison and interpretability at the extremes because it stretches out the data near 0% and 100%. The transformation was done in R using \(y=\arcsin\left(\sqrt{x}\right)\), where x is the nodule coverage expressed as a proportion between 0 and 1. The model predictions were back-transformed using \(x=\sin^{2}(y)\) and multiplied by 100 to return the predicted nodule coverage in percent. The performance metrics on the testing dataset were calculated from the back-transformed data.
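
The forward and back transformations are easy to verify numerically. A caveat visible in the sketch is that sin² is periodic: a back-transformed prediction always lies between 0 and 100%, but a raw model output above π/2 folds back to lower coverage instead of being clipped at 100%.

```python
import math


def to_arcsine(coverage_pct):
    """Forward transform y = arcsin(sqrt(x)), with x = coverage / 100."""
    return math.asin(math.sqrt(coverage_pct / 100.0))


def from_arcsine(y):
    """Back-transform x = sin(y)^2, returned in percent; always within 0-100."""
    return 100.0 * math.sin(y) ** 2


y = to_arcsine(38.0)          # hypothetical 38 % coverage
print(round(y, 4), round(from_arcsine(y), 6))
```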

ML Algorithms

The following regression algorithms were applied using the caret package in R (R Core Team, 2022):

GLM is an extension of the linear regression model that accommodates non-normally distributed responses (e.g., Poisson, binomial) through link functions that relate the expected value of the response variable to a linear combination of the predictors (McCullagh & Nelder, 1989). Here, we used a GLM fitted by component-wise gradient boosting (Bühlmann & Hothorn, 2007), as implemented in the mboost (Hothorn et al., 2021) R package.
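
Component-wise gradient boosting can be sketched for the squared-error loss: starting from the mean, each iteration fits every predictor separately to the current residuals (the negative gradient of (y − f)²/2) and updates only the best-fitting one, shrunk by a step length ν. This is a minimal illustration in the spirit of Bühlmann & Hothorn (2007), not mboost's `glmboost` code; the data are hypothetical. The number of iterations, the hyperparameter tuned here, controls how far coefficients move from zero and thus acts as regularization.

```python
def componentwise_boost(X, y, n_iter=100, nu=0.1):
    """Component-wise L2 boosting with simple linear base learners.
    X: list of predictor columns, y: response. Returns (offset, coefficients on
    centered predictors, predictor means)."""
    n = len(y)
    means = [sum(col) / n for col in X]
    Xc = [[v - m for v in col] for col, m in zip(X, means)]
    offset = sum(y) / n
    fitted = [offset] * n
    coef = [0.0] * len(X)
    for _ in range(n_iter):
        resid = [yi - fi for yi, fi in zip(y, fitted)]   # negative gradient
        best = None
        for j, col in enumerate(Xc):
            ss = sum(v * v for v in col)
            if ss == 0:
                continue
            beta = sum(v * r for v, r in zip(col, resid)) / ss   # least-squares fit
            rss = sum((r - beta * v) ** 2 for v, r in zip(col, resid))
            if best is None or rss < best[0]:
                best = (rss, j, beta)
        _, j, beta = best
        coef[j] += nu * beta                               # update one component only
        fitted = [f + nu * beta * v for f, v in zip(fitted, Xc[j])]
    return offset, coef, means


def boost_predict(model, x_row):
    offset, coef, means = model
    return offset + sum(b * (x - m) for b, x, m in zip(coef, x_row, means))


# Hypothetical data: y depends on the first predictor only
X = [[0, 1, 2, 3, 4, 5], [1, 0, 1, 0, 1, 0]]
y = [3, 5, 7, 9, 11, 13]
model = componentwise_boost(X, y, n_iter=200)
print([round(b, 3) for b in model[1]])
```

Because only the best predictor is updated per iteration, uninformative predictors can keep a coefficient of exactly zero, giving built-in variable selection.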

GAM is a data-driven approach that divides the range of each predictor into intervals delimited by knots and fits smooth functions (e.g., splines) over them, assuming that the effects of the individual predictors are additive. It can handle non-linear and non-monotonic relationships between the response and predictors without the functional form of these relationships having to be specified in advance (Hastie & Tibshirani, 1986). As with the GLM, a component-wise gradient-boosted GAM with penalized B-splines (a.k.a. P-splines; Eilers & Marx, 1996) as base learners was applied using the mboost R package.
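
The two ingredients of a P-spline base learner can be sketched as follows: a B-spline basis on equidistant knots, evaluated with the Cox–de Boor recursion, and a difference matrix D whose penalty λ‖Da‖² discourages differences between adjacent coefficients a. This is not mboost's `bbs()` code; in the boosted GAM, the degrees of freedom tuned for each base learner determine λ, which is not shown here. The knot settings are illustrative.

```python
def equidistant_knots(xl, xr, n_segments, degree=3):
    """Equally spaced knots extending `degree` segments beyond [xl, xr]."""
    dx = (xr - xl) / n_segments
    return [xl + (k - degree) * dx for k in range(n_segments + 2 * degree + 1)]


def bspline_basis(x, knots, degree):
    """All B-spline basis values at x (Cox-de Boor); len(knots) - degree - 1 values."""
    b = [1.0 if knots[i] <= x < knots[i + 1] else 0.0 for i in range(len(knots) - 1)]
    for d in range(1, degree + 1):
        nxt = []
        for i in range(len(knots) - d - 1):
            left_den = knots[i + d] - knots[i]
            right_den = knots[i + d + 1] - knots[i + 1]
            left = (x - knots[i]) / left_den * b[i] if left_den > 0 else 0.0
            right = (knots[i + d + 1] - x) / right_den * b[i + 1] if right_den > 0 else 0.0
            nxt.append(left + right)
        b = nxt
    return b


def difference_matrix(n, order=2):
    """Rows of the order-th difference operator D acting on n coefficients."""
    rows = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(order):
        rows = [[r1 - r0 for r0, r1 in zip(rows[i], rows[i + 1])] for i in range(len(rows) - 1)]
    return rows


def penalty(coefs, order=2):
    """||D a||^2: zero for coefficients on a straight line when order = 2."""
    D = difference_matrix(len(coefs), order)
    return sum(sum(d * a for d, a in zip(row, coefs)) ** 2 for row in D)


knots = equidistant_knots(0.0, 1.0, 4, 3)
print([round(v, 4) for v in bspline_basis(0.3, knots, 3)])
print(penalty([0, 1, 2, 3, 4]), penalty([0, 1, 0, 1, 0]))
```

With a second-order penalty, a large λ pushes the fit toward a straight line, while a small λ lets the spline follow the data closely.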

RFM is an ensemble of regression trees, each grown independently on a random (bootstrap) sample of the training data, with a random subset of predictor variables considered at each split (Breiman, 2001a). This reduces the correlation among trees. In each tree, a 'parent' node is split using the predictor and split point, among the randomly selected predictors, that minimize the variance of the 'child' nodes, and splitting continues until a stopping criterion (e.g., the minimum node size) is reached. In regression problems, the final prediction is the average of the predictions of all trees. The ranger R package (Wright & Ziegler, 2017) was used.
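
The split search at a single node can be sketched as follows: for each of `mtry` randomly drawn predictors, every threshold is tried and the one minimizing the size-weighted variance of the two child nodes is kept. The minimum node size is interpreted here as the minimum child size, an assumption that may differ from ranger's exact semantics; predictor names and values are hypothetical, and bootstrapping and tree growth are omitted.

```python
import random


def variance(values):
    """Population variance of a list of numbers."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def best_split(x, y, min_node_size=5):
    """Best threshold on one predictor: minimizes the size-weighted variance of the
    two child nodes. Returns (score, threshold) or None if no valid split exists."""
    pairs = sorted(zip(x, y))
    best = None
    for k in range(min_node_size, len(pairs) - min_node_size + 1):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # identical predictor values cannot be separated
        left = [v for _, v in pairs[:k]]
        right = [v for _, v in pairs[k:]]
        score = len(left) * variance(left) + len(right) * variance(right)
        if best is None or score < best[0]:
            best = (score, (pairs[k - 1][0] + pairs[k][0]) / 2.0)
    return best


def node_split(features, y, mtry, rng, min_node_size=5):
    """Split search at one node over mtry randomly drawn predictors.
    features: dict predictor name -> values. Returns (name, threshold) or None."""
    results = []
    for name in rng.sample(sorted(features), mtry):
        split = best_split(features[name], y, min_node_size)
        if split is not None:
            results.append((split[0], name, split[1]))
    if not results:
        return None
    _, name, threshold = min(results)
    return name, threshold


# Hypothetical node: "depth" separates the response perfectly, "noise" does not
features = {"depth": list(range(1, 11)), "noise": [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]}
y = [0.0] * 5 + [10.0] * 5
print(node_split(features, y, mtry=2, rng=random.Random(0), min_node_size=2))
```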

SVM is a non-parametric technique that uses kernel functions to map the predictor space into a higher-dimensional feature space, in which a separating hyperplane (or, in regression, a function fitted within an error margin) is sought (Cortes & Vapnik, 1995). The kernel functions can be linear, polynomial, sigmoidal or radial basis; they compute inner products in the higher-dimensional space without explicitly projecting the data into it, the so-called 'kernel trick' (Karatzoglou et al., 2004). There is no a priori best kernel function, so several kernel functions should be tested, as performance may vary depending on the data size and the relationships among the data. The kernlab (Karatzoglou et al., 2023) R package was used.
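The kernel trick can be checked numerically with a short Python/NumPy sketch (not kernlab): for 2-D inputs, the degree-2 polynomial kernel \((\langle x, z\rangle + 1)^2\) equals the inner product of an explicit 6-D feature map, so the higher-dimensional space never has to be constructed. The function names and parameter values are illustrative assumptions. The sketch also verifies that a radial basis kernel yields a positive semi-definite Gram matrix, as a valid kernel must.

```python
import numpy as np

def poly_kernel(x, z, degree=2, scale=1.0, offset=1.0):
    """Polynomial kernel k(x, z) = (scale * <x, z> + offset) ** degree."""
    return (scale * np.dot(x, z) + offset) ** degree

def rbf_kernel(x, z, sigma=0.5):
    """Radial basis kernel k(x, z) = exp(-sigma * ||x - z||^2)."""
    diff = np.asarray(x, float) - np.asarray(z, float)
    return np.exp(-sigma * np.dot(diff, diff))

def phi_poly2(v):
    """Explicit feature map matching poly_kernel(degree=2, scale=1, offset=1) for 2-D input."""
    v1, v2 = v
    r2 = np.sqrt(2.0)
    return np.array([v1 ** 2, v2 ** 2, r2 * v1 * v2, r2 * v1, r2 * v2, 1.0])

x, z = np.array([0.3, -1.2]), np.array([2.0, 1.5])
k_implicit = poly_kernel(x, z)                 # computed in the original 2-D space
k_explicit = phi_poly2(x) @ phi_poly2(z)       # computed in the 6-D feature space

pts = np.random.default_rng(0).normal(size=(30, 2))
gram = np.array([[rbf_kernel(a, b) for b in pts] for a in pts])
min_eig = np.linalg.eigvalsh(gram).min()       # valid kernels give a PSD Gram matrix
```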

NNM is a data-driven approach that extracts patterns (i.e., relationships) from data using training algorithms (e.g., forward propagation, backpropagation) and activation functions (e.g., logistic, tanh). It creates weighted associations between predictors and the response variable in the hidden layers, whose nodes receive information from all nodes of the previous layer and pass it through the activation function to the next layer(s). Non-linear activation functions allow the network to capture non-linear relationships between predictors and response variable (Lippmann, 1987). We used the neuralnet (Günther & Fritsch, 2010) R package to develop an NNM that uses resilient backpropagation with weight backtracking (a.k.a. rprop+; Riedmiller & Braun, 2003; Riedmiller, 1994).
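The rprop+ update rule can be sketched in Python/NumPy on a simple quadratic loss instead of a full network. This is an illustration of the rule as commonly described (per-weight step sizes that grow by `eta_plus` while the gradient sign is stable, shrink by `eta_minus` after a sign change, with the previous step undone on a sign change); it is not the neuralnet code, and the default values below are assumptions.

```python
import numpy as np

def rprop_plus(grad_fn, w0, n_iter=100, eta_plus=1.2, eta_minus=0.5,
               delta0=0.1, delta_min=1e-6, delta_max=50.0):
    """Resilient backpropagation with weight backtracking (rprop+); uses only gradient signs."""
    w = np.asarray(w0, dtype=float).copy()
    delta = np.full_like(w, delta0)          # individual step size per weight
    g_prev = np.zeros_like(w)
    dw_prev = np.zeros_like(w)
    for _ in range(n_iter):
        g = grad_fn(w)
        s = g * g_prev
        inc, dec, same = s > 0, s < 0, s == 0
        delta[inc] = np.minimum(delta[inc] * eta_plus, delta_max)    # stable sign: accelerate
        delta[dec] = np.maximum(delta[dec] * eta_minus, delta_min)   # sign change: slow down
        dw = np.zeros_like(w)
        step = inc | same
        dw[step] = -np.sign(g[step]) * delta[step]
        dw[dec] = -dw_prev[dec]              # weight backtracking: undo the overshooting step
        g = np.where(dec, 0.0, g)            # skip adaptation in the next iteration
        w += dw
        g_prev, dw_prev = g, dw
    return w

a = np.array([1.0, 10.0])
c = np.array([1.0, 2.0])
grad = lambda w: 2 * a * (w - c)             # gradient of sum(a * (w - c) ** 2)
w_opt = rprop_plus(grad, w0=[5.0, -3.0], n_iter=200)
```

Because only the sign of the gradient is used, the very different curvatures of the two coordinates (factor 10) do not require a common learning rate to be tuned.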

Hyperparameter optimization (a.k.a. tuning) was done for all models. GLM tuning included the number of boosting iterations (Supplementary Table S2). GAM tuning included the number of boosting iterations and the degrees of freedom of the P-splines (Supplementary Table S3). RFM tuning included the number of predictor variables randomly sampled at each node split, the split rule and the minimum node size (Supplementary Tables S4–S5). SVM tuning optimized the kernel type, polynomial degree, scale and cost hyperparameters (Supplementary Tables S6–S7). The number of hidden layers, neurons and activation functions were optimized in NNM (Supplementary Tables S8–S9). An exhaustive hyperparameter search was beyond the scope of this paper. Tuning was performed within the spatial CV, which is the preferred method when data are spatially autocorrelated (Schratz et al., 2019; Gazis & Greinert, 2021).
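Tuning within a spatial CV can be sketched in Python/NumPy as a grid search scored by leave-one-block-out RMSE. To keep the example self-contained, a closed-form ridge regression with penalty `lam` stands in for the models' actual hyperparameters; the helper names, the 3 km block size applied to synthetic coordinates and the penalty grid are all illustrative assumptions, not the caret workflow used in the study.

```python
import numpy as np

def block_ids(x, y, block_size):
    """Assign every sample to a square spatial block of side `block_size` (same units as x, y)."""
    bx = np.floor((x - x.min()) / block_size).astype(int)
    by = np.floor((y - y.min()) / block_size).astype(int)
    _, ids = np.unique(np.column_stack([bx, by]), axis=0, return_inverse=True)
    return np.ravel(ids)

def ridge_fit(X, y, lam):
    """Closed-form ridge regression on centred data; returns intercept and coefficients."""
    xm, ym = X.mean(axis=0), y.mean()
    Xc = X - xm
    b = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ (y - ym))
    return ym - xm @ b, b

def spatial_cv_tune(X, y, blocks, grid):
    """Leave-one-block-out CV for each candidate value; return the one with lowest mean RMSE."""
    scores = {}
    for lam in grid:
        rmse = []
        for k in np.unique(blocks):
            test = blocks == k                         # hold out one whole spatial block
            b0, b = ridge_fit(X[~test], y[~test], lam)
            pred = b0 + X[test] @ b
            rmse.append(np.sqrt(np.mean((y[test] - pred) ** 2)))
        scores[lam] = float(np.mean(rmse))
    return min(scores, key=scores.get), scores

rng = np.random.default_rng(3)
n = 600
x_m, y_m = rng.uniform(0, 9000, n), rng.uniform(0, 9000, n)   # easting/northing in metres
X = rng.normal(size=(n, 4))
target = X @ np.array([1.0, -0.5, 0.0, 0.2]) + rng.normal(scale=0.3, size=n)
blocks = block_ids(x_m, y_m, block_size=3000)
best_lam, cv_scores = spatial_cv_tune(X, target, blocks, grid=[0.01, 1.0, 10.0, 100.0])
```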

The variable importance was extracted for all models, with each model estimating it differently. In GLM, the absolute value of the coefficients from the final model was used (Kuhn, 2022). In GAM, the in-bag risk reductions per boosting iteration (error difference between the current and the previous step for each base learner contained in the model) were calculated (Hothorn et al., 2021). In RFM, the average permutation variable importance from all trees was used (Breiman, 2001a). SVM has no model-specific importance measure; instead, we used a permutation approach, in which the prediction error was measured before and after shuffling the values of each predictor variable. The larger the increase in prediction error after shuffling, the higher the importance of the predictor variable; the iml (Molnar, 2018) R package was used. In NNM, we used the model weights assigned to each predictor to measure variable importance; the importance was calculated from the weights using the Garson algorithm (Garson, 1991; see also Goh, 1995), as implemented in the NeuralNetTools (Beck, 2018) R package. All variable importance outputs were scaled between 0 and 100 for better comparison across the different models (Kuhn, 2022). PDPs were calculated using the iml and pdp (Greenwell, 2017) R packages. All graphs were produced using the ggplot2 (Wickham, 2016) R package.
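The model-agnostic parts of this workflow (permutation importance, rescaling to 0–100 and partial dependence) can be sketched in Python/NumPy. This is an illustration, not the iml, caret or pdp code: importance is measured here as the increase in RMSE after shuffling (iml can also report ratios), the min–max rescaling is an assumption about how the 0–100 scaling works, and the stand-in `model` is hypothetical.

```python
import numpy as np

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Increase in RMSE after shuffling each predictor column (model-agnostic)."""
    rng = np.random.default_rng(seed)
    rmse = lambda p: np.sqrt(np.mean((y - p) ** 2))
    base = rmse(predict(X))
    imp = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])    # break the link between X_j and y
            imp[j] += rmse(predict(Xp)) - base
    return imp / n_repeats

def scale_0_100(imp):
    """Min-max rescaling of importances to 0-100 for comparison across models."""
    imp = np.asarray(imp, dtype=float)
    return 100 * (imp - imp.min()) / (imp.max() - imp.min())

def partial_dependence(predict, X, j, grid):
    """PDP: average prediction when predictor j is fixed at each grid value."""
    out = []
    for v in grid:
        Xv = X.copy()
        Xv[:, j] = v
        out.append(predict(Xv).mean())
    return np.array(out)

rng = np.random.default_rng(7)
X = rng.normal(size=(400, 3))
model = lambda A: 3.0 * A[:, 0] + 0.5 * A[:, 1]     # stand-in for a fitted model; ignores column 2
y = model(X) + rng.normal(scale=0.2, size=400)
imp = permutation_importance(model, X, y)
scaled = scale_0_100(imp)
grid = np.linspace(-2, 2, 5)
pd_curve = partial_dependence(model, X, j=0, grid=grid)
```

For the linear stand-in model, the PDP recovers the slope of 3 for the first predictor, and the unused third predictor receives zero importance.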

CV Strategy and Extrapolation Assessment

ML models can yield over-optimistic results when a non-spatial, random k-fold CV is used for spatially autocorrelated data (Roberts et al., 2017; Meyer et al., 2018; Ploton et al., 2020; Karasiak et al., 2022). Here, a spatial k-fold CV strategy was applied. The fold dimensions (and, consequently, their number) are a tradeoff among spatial autocorrelation range, data representativeness, and model extrapolation (Roberts et al., 2017; Valavi et al., 2019). Folds that are too small may not eliminate the influence of spatial autocorrelation among training locations (over-optimistic results), while folds that are too large remove a large portion of the training data, which creates gaps in feature space and causes unnecessary extrapolation that reduces model performance (over-pessimistic results). We used an incremental Moran's I analysis to identify the maximum distance of spatial autocorrelation. This analysis was done in ArcMap v10.6, testing the null hypothesis that the examined attribute is randomly distributed at a significance level of p < 0.05. The conceptualization of spatial relationships was based on inverse Euclidean distance, and the spatial weights were row-standardized, eliminating any bias that could be induced by the different number of spatial neighbors (Mitchell & Griffin, 2021). Three configurations of non-overlapping spatial blocks were created and compared: 1 × 1 km (28 blocks), 3 × 3 km (5 blocks) and 5 × 5 km (2 blocks; Supplementary Fig. S8).
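The Moran's I statistic with row-standardized inverse-distance weights can be sketched in Python/NumPy; evaluating it at increasing distance thresholds gives the incremental profile. This sketch computes only the statistic (no z-score or p-value, which ArcMap also reports), and the function name, the regular 100 m grid and the synthetic fields are assumptions. With row standardization, I = −1 for a perfect checkerboard at the nearest-neighbor distance.

```python
import numpy as np

def morans_i(coords, values, max_dist):
    """Global Moran's I with row-standardized inverse-distance weights within max_dist."""
    coords = np.asarray(coords, dtype=float)
    z = np.asarray(values, dtype=float)
    z = z - z.mean()
    d = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1))
    with np.errstate(divide="ignore"):
        w = np.where((d > 0) & (d <= max_dist), 1.0 / d, 0.0)   # inverse-distance weights
    row_sums = w.sum(axis=1, keepdims=True)
    w = np.divide(w, row_sums, out=np.zeros_like(w), where=row_sums > 0)   # row standardization
    n, s0 = len(z), w.sum()
    return (n / s0) * (z @ w @ z) / (z @ z)

gx, gy = np.meshgrid(np.arange(20) * 100.0, np.arange(20) * 100.0)
coords = np.column_stack([gx.ravel(), gy.ravel()])
rng = np.random.default_rng(0)
trend = coords[:, 0] / 1000 + rng.normal(scale=0.1, size=len(coords))   # smooth east-west gradient
checker = np.where((np.arange(20)[:, None] + np.arange(20)[None, :]) % 2 == 0, 1.0, -1.0).ravel()

# incremental analysis: Moran's I at increasing distance thresholds (metres)
profile = {dist: morans_i(coords, trend, dist) for dist in (150, 300, 600, 1200)}
i_checker = morans_i(coords, checker, 100)
```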

For each of the three spatial block configurations, the density distributions of geographical and feature space distances were compared among the training dataset, the CV folds and the entire geographical and feature space used for model predictions, aiming to identify the best block size (tradeoff between minimizing spatial autocorrelation and avoiding unnecessary extrapolation). This comparison was made using the CAST (Meyer et al., 2024) R package, which implements the method proposed by Meyer and Pebesma (2022). The model's ability to extrapolate in space was assessed using the area of applicability (AOA) method (Meyer & Pebesma, 2021). This method outputs the geographical areas where the multivariate feature space conditions are similar to those captured by the model, so that the CV predictive performance still holds. The geographical parts with novel feature space conditions were returned as non-applicable areas (Meyer & Pebesma, 2021). The feature space dissimilarity analysis is based on Euclidean distances between the training data and the total feature space of the entire study area, computed on normalized predictors weighted by variable importance. The AOA was calculated using the CAST R package.
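The core of the AOA computation can be sketched in Python/NumPy following our reading of Meyer & Pebesma (2021): predictors are standardized and multiplied by their importance weights; the dissimilarity index (DI) of a location is its distance to the nearest training point divided by the mean pairwise training distance; the threshold is the upper boxplot whisker of the training DI computed across CV folds. This is not the CAST implementation (quantile conventions, for example, may differ), and the function name and demo data are hypothetical.

```python
import numpy as np

def aoa(train_X, new_X, weights, folds):
    """Dissimilarity index (DI), threshold and AOA mask, following Meyer & Pebesma (2021) as we read it."""
    mu, sd = train_X.mean(axis=0), train_X.std(axis=0)
    T = (train_X - mu) / sd * weights                   # scaled, importance-weighted training data
    N = (new_X - mu) / sd * weights                     # same transformation for prediction locations
    d_tt = np.sqrt(((T[:, None, :] - T[None, :, :]) ** 2).sum(axis=-1))
    mean_d = d_tt[np.triu_indices(len(T), k=1)].mean()  # average pairwise training distance
    # training DI: distance to the nearest training point from a different CV fold
    d_cv = np.where(folds[:, None] != folds[None, :], d_tt, np.inf)
    di_train = d_cv.min(axis=1) / mean_d
    q1, q3 = np.percentile(di_train, [25, 75])
    threshold = di_train[di_train <= q3 + 1.5 * (q3 - q1)].max()   # boxplot upper whisker
    d_new = np.sqrt(((N[:, None, :] - T[None, :, :]) ** 2).sum(axis=-1)).min(axis=1)
    di_new = d_new / mean_d
    return di_new, threshold, di_new <= threshold

rng = np.random.default_rng(5)
train_X = rng.uniform(0, 1, size=(300, 2))
folds = rng.integers(0, 5, size=300)                  # hypothetical CV fold labels
new_X = np.vstack([train_X[:1], [[5.0, 5.0]]])        # a training-like point and a far-away point
di_new, thr, inside = aoa(train_X, new_X, weights=np.array([1.0, 1.0]), folds=folds)
```

A location identical to a training sample has DI = 0 and lies inside the AOA, whereas a point far outside the sampled feature space is flagged as non-applicable.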

The schematic flowchart that illustrates the main methodological and modeling steps is presented in Supplementary Figure S9.

Results

CV Block Size

The incremental Moran's I analysis showed that nodule coverage was spatially autocorrelated, particularly at distances < 1 km (Moran's I > 0.5, p < 0.05). At 3 km, Moran's I was < 0.1 (p < 0.05) with a decreasing trend, and it reached zero at distances over 4.5 km (Supplementary Fig. S8). The distance comparison showed that, with 1 × 1 km blocks, the geographical distances were still smaller than those encountered during prediction, and the spatial autocorrelation was still considerable, potentially leading to over-optimistic results (Fig. 4). With 5 × 5 km blocks, the geographical and feature space distances inside CV blocks were larger than those encountered during prediction, resulting in unnecessary extrapolation (Fig. 4). The 3 × 3 km configuration showed the best representativeness between the training data and the prediction space, and it was selected for model training, tuning and evaluation (Fig. 4). All block configurations represented the prediction distances better than the random-CV strategy (Fig. 4).

Figure 4

Geographical and normalized multivariate predictor space Euclidean distances among the training dataset, CV folds, and predicted area for (a) random-CV, (b) spatial-CV (1 × 1 km), (c) spatial-CV (3 × 3 km) and (d) spatial-CV (5 × 5 km)

Predictive Performance, Nodule Coverage and AOA

RFM had the best predictive performance on the testing data, followed by NNM (Table 2). SVM and GAM had the same coefficient of determination (hereafter R2) between model predictions and observations on the testing data. However, SVM had a lower mean absolute error (hereafter MAE) and root mean squared error (hereafter RMSE) than GAM. GLM had the worst fit on the testing data (R2 = 0.72; Table 2).
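The three performance metrics can be computed as in the following Python/NumPy sketch on back-transformed predictions. Which R2 definition was used is not stated in this section; we assume here the squared Pearson correlation between predictions and observations, and the function name and values are hypothetical. The second call shows a known property of that definition: a prediction biased by a constant still reaches R2 = 1, which is why MAE and RMSE are reported alongside it.

```python
import numpy as np

def metrics(obs, pred):
    """R2 (squared Pearson correlation, an assumption), MAE and RMSE."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    resid = obs - pred
    return {
        "R2": np.corrcoef(obs, pred)[0, 1] ** 2,
        "MAE": np.mean(np.abs(resid)),
        "RMSE": np.sqrt(np.mean(resid ** 2)),
    }

obs = np.array([10.0, 20.0, 30.0, 40.0])       # observed coverage (%)
pred = np.array([12.0, 18.0, 33.0, 39.0])      # predicted coverage (%)
m = metrics(obs, pred)                         # MAE = 2.0, RMSE = sqrt(4.5) ≈ 2.12
biased = metrics(obs, obs + 10)                # perfectly correlated but biased by +10 %
```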

Table 2 Model performance metrics on test dataset for the back-transformed predictions

All models reproduced the testing data distribution (e.g., mean, median, interquartile range) well but not the minimum (min) and maximum (max) values of the response variable (Table 3). Despite the best overall performance, RFM yielded the lowest value range, overestimating the lower values and underestimating the higher values (Table 3). NNM, SVM and GAM predictions were closer to the initial min–max range. GLM was the only model that predicted beyond the min value of the testing data, but it also underestimated the upper range (Table 3).

Table 3 Summary statistics of test dataset and model predictions on the test dataset

All models yielded similar nodule coverage maps and interquartile ranges (Fig. 5, Supplementary Table S10). Differences were observed in the predicted min–max range: GAM extrapolated only in the lower range of nodule coverage, NNM only in the upper range, and GLM and SVM in both directions (Fig. 5). SVM predicted 100% nodule coverage in isolated pixels, a value neither observed in the seafloor images nor predicted by any other model (Fig. 5). Model predictions beyond the min–max range of the training data had limited spatial extent (≤ 0.23% of the total area; Supplementary Fig. S10).

Figure 5

(a) Maps of the spatial predictions of nodule coverage and range of predicted values over the entire study area for each ML model. Along a north–south axis in the western part of the study area, the seafloor is quasi-homogeneously covered with nodules (20–30% coverage). Higher coverage (> 35%) exists in the southeast but is rather localized. The central and deeper part has lower nodule coverage (~15–25%) that progressively decreases, reaching almost zero at the deepest part of the basin. Lower coverage also exists at the northwest corner (crater slope). (b) Boxplots of predicted nodule coverage values for the entire study area

The weighted-average prediction surface from all models showed excellent spatial agreement with the seafloor images from the testing data and box corer samples (Fig. 6). Comparing the models' predictions pixel-by-pixel, we obtained the spatial differences (here expressed as SD of the predicted values). The largest differences were in geographically small sub-areas associated with sharp morphological transitions, bedrock exposure, consolidated sediments, and volcanic knolls. The highest SD was within the trough at the southeast part of the study area (Fig. 6). The combination of deeper parts with steep slopes (linked with lower coverage) and high backscatter reflectance (linked with higher nodule coverage), which is partly attributed here to nodule coverage and partly to hard substrate, led to contradictory predictions: RFM predicted low nodule coverage, while SVM yielded high nodule coverage (Fig. 6).
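The ensemble surface and the between-model spread can be sketched in Python/NumPy for stacked prediction grids. The weighting scheme is not specified in this section, so the weights below are hypothetical placeholders, as is the synthetic grid; whether the SD uses n or n − 1 in the denominator is also an assumption (here n − 1).

```python
import numpy as np

def ensemble_surface(preds, weights):
    """Weighted mean and between-model SD of stacked prediction grids.

    preds: array of shape (n_models, rows, cols) with predicted coverage (%).
    weights: one non-negative weight per model (normalized here to sum to 1).
    """
    preds = np.asarray(preds, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    mean_map = np.tensordot(w, preds, axes=1)          # pixel-wise weighted mean
    sd_map = preds.std(axis=0, ddof=1)                 # pixel-wise spread among models
    return mean_map, sd_map

rng = np.random.default_rng(2)
base = rng.uniform(0, 40, size=(50, 60))               # hypothetical 50 x 60 coverage grid (%)
preds = np.stack([base + rng.normal(scale=2, size=base.shape) for _ in range(5)])
weights = [0.3, 0.25, 0.2, 0.15, 0.1]                  # hypothetical per-model weights
mean_map, sd_map = ensemble_surface(preds, weights)
same_mean, same_sd = ensemble_surface(np.stack([base] * 5), weights)
```

Where all models agree, the SD is zero; high SD values flag pixels, such as the southeastern trough, where the models disagree.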

Figure 6

(a) Nodule coverage map of the weighted-mean prediction from all models. (b) Map inset from the central part of the study area, showing the spatial fit between the predicted surface, test data (derived from seafloor images) and box corer data. (c) Spread among the models, expressed as the SD of the values predicted by each model at each cell. (d) Map showing the areas where models extrapolate based on the AOA method

Regarding model transferability, the spatial-CV approach showed that all models performed well even when a large part of the geographical and feature space was held out, partly because the training sample represented the univariate density distribution of each predictor well (Supplementary Fig. S11). The CV residuals from all models had weak spatial autocorrelation and were normally distributed (Supplementary Fig. S12). RFM had the best average performance across CV folds, followed by GAM, which had the smallest interquartile range of all models (Fig. 7). However, model performance varied among the spatial blocks, with the median being higher than the mean in all models except NNM (Fig. 7). This variation was due to the poor performance of all models in the same spatial block, located in the southeast of the study area (Fig. 7), whose complex geomorphology differs from the rest of the study area. The presence of horst, cliff and trough structures, exposed indurated sediments and different nodule facies (Figs. 1, 2, 3) resulted in extreme derivative values that forced the models to extrapolate beyond the training feature space, as shown by the feature space dissimilarity analysis and AOA (Fig. 6).

Figure 7

(a) CV performance for each ML model. The median (thick blue line) is higher than the mean (black dot) in GLM, GAM, SVM and RFM because the low performance in one block pulls down the mean R2. In NNM, the mean and median are the same. Detailed statistics are provided in Supplementary Tables S11–S13. (b) In-fold performance for each spatial block and model. All models performed similarly in each CV spatial block. (c) Spatial orientation of the CV spatial blocks. Block 5 has the smallest AOA (see Fig. 6)

The sub-areas where severe model extrapolation occurred were limited, as each model was applicable in more than 91% of the total area (Supplementary Fig. S13). All models had similar AOA, with only marginal local differences (Supplementary Fig. S13). All models were simultaneously applicable in 88% of the study area (pixel-by-pixel comparison); in 6%, only some models (at least one) were applicable, and in the remaining 6%, no model was applicable (Fig. 6).

Variable Importance and ML Interpretation

The variable importance calculation showed that BS, ARA_MF and SSS were the three most important predictors for the spatial distribution of nodule coverage (Fig. 8). BS and ARA_MF ranked in the first two positions in all models. The swap between first and second place among the models was probably due to the strong correlation between these two predictors (r = 0.8, p < 0.05; Supplementary Fig. S6). SSS was the third most important predictor in two models (GAM, SVM) and scored high in GLM and RFM (Fig. 8). In NNM, SSS was replaced by SSS_LM as the third most important predictor; SSS and SSS_LM were also correlated (r = 0.7, p < 0.05; Supplementary Fig. S6).

Figure 8

(a) Variable importance for each model trained with all available predictor variables. BS and ARA_MF ranked in the first two positions in all models. (b) Variable importance for each model trained without the hydroacoustic backscatter-related predictors

The MBES backscatter mosaic and ARA analyses reflected the differences in nodule coverage (Fig. 9). Backscatter intensity differed distinctly (> 5 dB) between seafloor parts without and with nodules (Fig. 9). Moreover, nodule coverage variations on the order of 10% were detectable, highlighting the sensitivity of the 400 kHz MBES frequency (Fig. 9). All models captured the monotonic relationship between MBES backscatter and nodule coverage, but with different degrees of complexity (Fig. 10). Although the backscatter differences were distinct over the entire range of incident angles, they were most prominent in the mean-far angle range from 25 to 55°, i.e., ARA_MF (Fig. 9). The predictors for the inner and outer incident angles (ARA_MN and ARA_MO, respectively) ranked lower than ARA_MF. The ARA derivatives (ARA_IM and ARA_VO) also contributed to model performance. The SSS backscatter mosaic also discriminated well between seafloor parts with different nodule coverage (variations on the order of 10%) and increased model performance (Figs. 9, 10, Supplementary Table S15).

Figure 9

(a) Post-processed MBES backscatter mosaic of the study area and image-derived nodule coverage. (b) ARA curves from seafloor patches with different nodule coverage. MBES soundings from locations with different nodule coverage show a clear difference in backscatter intensity over the entire range of incident angles. (c) Boxplots of SSS backscatter intensity from seafloor patches with different nodule coverage. The black horizontal line corresponds to the median, and the black dot to the mean value. (d) Bar plots of the prediction performance of each model using only the geomorphological predictors derived from MBES data (light blue), the geomorphological predictors derived from MBES data plus SSS backscatter intensity (medium blue), and all available predictors, including MBES backscatter data and ARA (dark blue). GLM had the biggest predictive improvement when the SSS and MBES backscatter data were added (Supplementary Table S15) due to the monotonic/quasi-linear relationship of SSS and MBES backscatter with nodule coverage (Fig. 10)

Figure 10

(a) PDPs showing the relationship between nodule coverage (transformed values) and MBES BS for GLM, GAM and RFM. As model complexity increases (left to right), the model captures more subtle relationships, resulting in less smooth response curves. The black marks on the x-axis show the distribution of the MBES BS values in the data. (b) PDPs showing the response surface of nodule coverage as a function of MBES BS and SSS

All models, particularly GLM, GAM and SVM, relied heavily on the hydroacoustic backscatter-related predictors (BS, ARA, SSS and their derivatives) for error minimization and predictions. The broad-scale seafloor geomorphology, such as bathymetry (BT), slope (S), broad-scale BPI (BPI100_300) and the eastward slope orientation (E), was also related to the spatial distribution of nodule coverage, which became more evident when the hydroacoustic backscatter-related predictors were excluded (Fig. 8).

The PDP of BT showed that, in this study area, nodule coverage was higher within a bathymetric range between −4425 and −4500 m. Depths beyond that range were related to seafloor depressions or topographic highs where nodules were absent (Fig. 11). Similarly, the PDP of BPI100_300 showed that nodule coverage was higher where BPI values were around 0, i.e., on flat or gently sloping seafloor (Fig. 11). E also had a monotonic relationship with nodule coverage, with higher E values associated with higher nodule coverage (Fig. 11).

Figure 11

RFM PDPs of nodule coverage with respect to (a) bathymetry (BT), (b) broad-scale bathymetric position index (BPI100_300) and (c) eastward slope orientation (E)

At the 2-m pixel size of the MBES grids, PrC, PlC, and VRM seemed to make only a small contribution to the spatial distribution of nodule coverage (Fig. 8). However, the image-derived DEMs from seafloor parts with and without nodule coverage showed distinct differences in seafloor roughness, although the sediment characteristics stayed similar (Fig. 12). These roughness differences were well captured by the MBES and SSS backscatter at 400 and 260 kHz, respectively.
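The VRM roughness index can be sketched in Python/NumPy: each cell's unit surface normal is derived from the DEM gradients, the normals are summed within a moving window, and VRM = 1 − |resultant| / (number of cells). This follows the idea of the vector ruggedness measure but uses central-difference gradients from `np.gradient` instead of a GIS slope/aspect algorithm, so values may differ slightly from GIS implementations; the function name, the 5 mm cell size and the synthetic surfaces are assumptions. A smooth, even tilted, plane gives VRM = 0, whereas mm-scale bumps raise it.

```python
import numpy as np

def vrm(dem, cell_size=1.0, window=3):
    """Vector ruggedness measure: 1 - |sum of unit surface normals| / (cells in window)."""
    dem = np.asarray(dem, dtype=float)
    dz_drow, dz_dcol = np.gradient(dem, cell_size)     # central differences along rows / columns
    normals = np.stack([-dz_dcol, -dz_drow, np.ones_like(dem)])
    normals /= np.linalg.norm(normals, axis=0)         # unit normal vector of every cell
    r = window // 2
    padded = np.pad(normals, ((0, 0), (r, r), (r, r)), mode="edge")
    resultant = np.zeros_like(normals)
    rows, cols = dem.shape
    for i in range(window):                            # sum the normals inside the moving window
        for j in range(window):
            resultant += padded[:, i:i + rows, j:j + cols]
    return 1.0 - np.linalg.norm(resultant, axis=0) / window ** 2

yy, xx = np.mgrid[0:50, 0:50] * 0.005                  # 5 mm cells (hypothetical)
plane = 0.3 * xx + 0.1 * yy                            # tilted but perfectly smooth surface
rng = np.random.default_rng(4)
rough = plane + rng.normal(scale=0.003, size=plane.shape)   # mm-scale bumps, e.g. nodules
vrm_plane = vrm(plane, cell_size=0.005)
vrm_rough = vrm(rough, cell_size=0.005)
```

Because VRM separates roughness from slope, the tilted plane scores zero, which is why it is a suitable index for comparing nodule-covered and nodule-free patches.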

Figure 12

(a1) Orthophoto-mosaic (2 mm pixel size) from 10 overlapping images of a seafloor part with nodule coverage, (a2) hill-shade of the image-derived DEM (5 mm pixel size) and (a3) image-derived seafloor roughness (5 mm pixel size) from the same seafloor part. (b1) Orthophoto-mosaic (2 mm pixel size) from 10 overlapping images of a seafloor part without nodule coverage, (b2) hill-shade of the image-derived DEM (5 mm pixel size) and (b3) image-derived seafloor roughness from the same seafloor part. The two image-derived roughness surfaces have different univariate distributions and value ranges (Supplementary Fig. S14). The VRM was used as a roughness index. The white scale bar represents 0.5 m on the seafloor

Discussion

The acquisition of high-resolution data with an areal coverage of 37.34 km2, the largest AUV-based dataset presented to date in deep-sea nodule research, provided a unique opportunity to study the spatial distribution of nodules in geomorphologically complex terrain at spatial resolutions ranging from 2 mm (orthophoto-mosaics) to 2 m (MBES grids).

The combined use of seafloor images, MBES, SSS data, and different ML algorithms showed that nodule coverage varied considerably (locally up to 100%) even on a meter scale (Figs. 5, 6). Such variations have been found in other parts of the eastern CCFZ and in the Peru Basin (Wiedicke & Weber, 1996; Gazis et al., 2018; Peukert et al., 2018; Gazis & Greinert, 2021; Alevizos et al., 2022) and are related to the interaction of seafloor topography, sediment thickness, bottom currents, and the spatial variability of biogeochemical processes (Cochonat et al., 1992; Mewes et al., 2014; Sharma, 2017; Volz et al., 2018; Paul et al., 2019; Hein et al., 2020). This study examined only the geomorphological component of this interplay, providing insights into the role of local topography in nodule occurrence.

Higher nodule coverage and larger nodules were found on relatively flat seafloor with gentle (< 3°), eastward-facing slopes (Figs. 5, 6, 11). Both variable importance measures (Fig. 8) and PDPs (Fig. 11) showed an increased importance of eastward-facing slopes, which points toward a predominance of westward-moving bottom currents. Long-term observations in the B4 domain of the GSR contract area show that the mean bottom current direction is northwestward (Hayes, 1979; Juan et al., 2018; GSR, 2018). This predominant NW-directed mean bottom flow is steady and slow (0–3 cm/s), and it is only periodically interrupted by tidal currents (NW and SE) and sporadically by short-lived benthic storms (6–15 cm/s) heading SE (Hayes, 1979; Juan et al., 2018; GSR, 2018). The steady and slow NW bottom currents contribute to favorable conditions for polymetallic nodule growth (Skornyakova & Murdmaa, 1992; von Stackelberg, 1997; Mewes et al., 2014; Kuhn et al., 2017; Juan et al., 2018; Volz et al., 2018): (a) erosion and removal of fine sediments, leaving behind coarser particles that serve as nodule nuclei; and (b) continuous contact between the nodule surface and well-oxygenated bottom water, which allows the precipitation of Fe–Mn oxyhydroxides on the nodules. Geochemical studies within the eastern CCFZ have linked higher coverage of larger nodules to sites with fewer clay-size particles and greater oxygen penetration depths, particularly for sampling sites within the GSR contract area (Mewes et al., 2014; GSR, 2018; Volz et al., 2018). Older studies in the CCFZ showed higher nodule coverage on flat seafloor parts with weak (~3 cm s−1) but steady near-bottom currents (Skornyakova & Murdmaa, 1992; Yamazaki & Sharma, 2000; Kuhn et al., 2017; Juan et al., 2018; Yoo et al., 2018; Alevizos et al., 2022).
The importance of slope and slope orientation has also been shown in other ML spatial studies within the CCFZ (Kuhn & Rühlemann, 2021) and the Peru Basin at a nodule field with well-pronounced sediment furrows (Gazis & Greinert, 2021).

Lower nodule coverage was observed mainly in four sub-environments (Figs. 5, 6): (a) in seafloor parts with typically higher bottom current velocities, such as gullies, furrows or the foot of slopes; (b) on topographic highs with steep slopes, such as horsts and volcanic knolls; (c) in the deeper parts of basins around and inside sub-circular depressions; and (d) in areas with basement exposure, rocky outcrops, and carbonate-cemented sediment layers.

The PDP of bathymetry showed a bathymetric range of ~100 m that favored nodule presence (Fig. 11). Broader-scale studies in the CCFZ and Peru Basin showed maximum coverage and abundance occurring within a small bathymetric range (50–100 m) at water depths just below the local CCD, with response curves similar to those of our study (Weber et al., 2000; von Stackelberg, 2000; Sharma, 2017). The PDP of BPI100_300 (Fig. 11) showed low nodule coverage (i.e., few to no polymetallic nodules) in local depressions. Studies have shown that local depressions exhibit geochemical conditions different from their surroundings, having (a) a higher amount of fine-grained sediment (clay), which is trapped in the local depressions during sediment transport (Wiedicke & Weber, 1996); fine sediments cannot act as nuclei for polymetallic nodule formation due to their size, and their typically small content of mobilized manganese hinders polymetallic nodule formation and growth (Mewes et al., 2014); and (b) increased organic carbon content that shifts the Mn-redox boundary closer to the seafloor, where the limited (see point a) diagenetically mobilized manganese is released into the bottom water without supporting polymetallic nodule formation and growth (Wiedicke & Weber, 1996; Marchig et al., 2001; Paul et al., 2019).

Similar to local depressions, topographic highs also have limited nodule coverage related to increased slopes, reduced sediment thickness (e.g., shallower or outcropping basement), and local variations in sediment geochemical characteristics (e.g., POC fluxes, TOC contents, and Mn-redox depth) as a result of the fine-sediment winnowing and downslope transportation from the locally intensified or deflected bottom currents (Skornyakova & Murdmaa, 1992; von Stackelberg, 1997; Volz et al., 2018). The studies mentioned above showed that even small height differences of < 10–25 m are associated with such changes, which agrees with our findings.

The construction of seafloor orthophoto-mosaics further enhanced the geomorphological analysis and interpretation of nodule occurrence (Fig. 3; Supplementary Fig. S5). Deep-sea orthophoto-mosaicking is an emerging field with several challenges, particularly regarding AUV-based camera systems (Song et al., 2022; She et al., 2023; Song et al., 2024), and only a few studies have used AUV-based orthophoto-mosaics from such depths (but at lower resolution) within nodule fields (Simon-Lledó et al., 2019c; Gausepohl et al., 2020). The results of this study show great potential for deep-sea orthophoto-mosaicking, even when it is based on the long AUV transects usually surveyed for fauna or nodule coverage mapping.

Although the geomorphological predictors describe the spatial distribution of nodule coverage in an interpretable way linked to natural processes, the hydroacoustic backscatter contributed the most in all ML algorithms. MBES BS and ARA_MF were the most important predictors in all ML models (Fig. 8). Older studies have described the effect of seafloor roughness created by nodules on sound waves and on the measured backscatter intensity in MBES and SSS (Weydert, 1991; Scanlon & Masson, 1992; Chunhui et al., 2015; Machida et al., 2019). Here, we showed for the first time, in a quantitative way, the mm-scale seafloor roughness of parts with and without nodule coverage, using the image-derived DEM generated during orthophoto-mosaic processing (Fig. 12). MBES and SSS backscatter captured the nodule coverage variations, discriminating not only between sub-environments with and without nodule coverage but also among sub-environments with varying degrees of nodule coverage (Fig. 9). MBES backscatter and SSS can be used complementarily, as excluding either one results in poorer predictive performance (Fig. 9, Supplementary Table S15). This finding agrees with a previous study in which the combined use of both systems improved interpretation and modeling performance (Janowski et al., 2021).

MBES ARA analyses showed that MBES backscatter acquisition is most meaningful for incident angles within 25–55°, a finding consistent with existing guidelines for backscatter-optimized surveys (Lurton & Lamarche, 2015). The discriminative ability of MBES backscatter, even over transitional areas, is attributed to the combination of a high frequency (400 kHz), corresponding to a short wavelength (~4 mm) on the order of, or even smaller than, the nodules' dimensions (mm to cm scale), with a low survey altitude (60 m) and small beam opening angles (0.7° × 0.7°), which resulted in a small footprint (0.73 m × 0.73 m at nadir on a flat seafloor). The MBES ARA derivatives impedance (primary) and volume (secondary) contributed to model performance in all models except GLM (Fig. 8). This finding agrees with other ML spatial studies, where ARA and its derivatives increased ML predictive performance (Hasan et al., 2014; Alevizos & Greinert, 2018; Menandro et al., 2022; Misiuk & Brown, 2022; Porskamp et al., 2022). The lower contribution of volume backscattering is attributed to the limited penetration depth of sound waves at 400 kHz, which is typically < 0.5 m for silt/clay sediments (ϕ > 4.5) and ≪ 0.25 m for larger grain sizes (Huff, 2008; Gaida et al., 2018). The main component of volume scattering is expected to be buried nodules located at a maximum depth of 5–10 cm.
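The footprint and wavelength values quoted above follow from simple geometry and the relation \(\lambda = c / f\), as in this Python sketch. The nominal sound speed of 1500 m/s is an assumption (near-bottom sound speed at these depths is somewhat higher, which changes the wavelength only marginally); the function names are hypothetical.

```python
import math

def footprint_m(altitude_m, beam_width_deg):
    """Nadir footprint width on a flat seafloor: 2 * h * tan(theta / 2)."""
    return 2 * altitude_m * math.tan(math.radians(beam_width_deg) / 2)

def wavelength_mm(freq_hz, sound_speed_ms=1500.0):
    """Acoustic wavelength lambda = c / f, in millimetres (c is a nominal assumption)."""
    return sound_speed_ms / freq_hz * 1000

print(round(footprint_m(60, 0.7), 2))    # 0.73
print(round(wavelength_mm(400e3), 2))    # 3.75
```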

In future studies, it would be interesting to use backscatter mosaics and ARA from multifrequency MBES, because the use of higher and lower frequencies (e.g., 100–600 kHz) and pulse lengths (e.g., μs to ms range) has been shown to improve ML predictive performance and interpretation (Menandro et al., 2022; Misiuk & Brown, 2022). A holistic modeling approach should also include geophysical data obtained by SBP, such as the thickness and composition of the semi-liquid bottom sediment layer, which is related to nodule coverage (Cochonat et al., 1992; Yoo et al., 2018; Dreisetl, 2016; Lipton et al., 2016; Alevizos et al., 2022). Similarly, the use of geochemical information would benefit spatial modeling. This approach is used in spatial studies in shallower waters, where hundreds of well-distributed samples exist and can be interpolated using ML algorithms (e.g., Diesing et al., 2017; 2020a; Spiegel et al., 2024). However, in deep-sea environments, geochemical data rely on sparse local observations of cm diameter (e.g., gravity corers), which are typically clustered around areas of particular interest, hindering their use in spatial models that need continuous surfaces as input to generate spatially continuous output. Studies have shown high fine-scale spatial variability of geochemical properties related to the spatial distribution and size of deep-sea polymetallic nodules, highlighting the need to intensify deep-sea seafloor sampling efforts (Mewes et al., 2014; Volz et al., 2018; Paul et al., 2019).

Regarding the modeling performance, RFM had the best fit, which agrees with other studies showing its ability to outperform other ML algorithms in terms of prediction error (Stephens & Diesing, 2014; Diesing & Stephens, 2015; Herkül et al., 2017; Li et al., 2017; Fernández-Delgado et al., 2019). RFM is currently the "workhorse" in deep-sea predictive mapping (Lawson et al., 2017; Gazis et al., 2018; Diesing, 2020b; Gazis & Greinert, 2021; Uhlenkott et al., 2022; Josso et al., 2023). However, here, it yielded the smallest range of predicted values, due to the averaging of the predictions of all trees in the last step of the RFM algorithm (Breiman, 2001a). The non-tree-based core of the other four models (i.e., linear, penalized B-splines, 3rd-degree polynomial and sigmoid activation function) allowed their predictions to extend further beyond the training value range (Fig. 5). NNM had the second-best predictive performance on the testing data but the worst in spatial block CV. Less complex models such as GLM and GAM had better resampling performance in spatial block CV than SVM and NNM, indicating a better generalization ability when a large fraction of the training data is held out (Fig. 7). This is attributed to their smoother functions, which are more transferable (and interpretable) over larger areas (Wenger & Olden, 2012; Lauria et al., 2015; Stienessen et al., 2021). In addition, NNM is more sensitive to hyperparameter optimization than GLM, GAM, and RFM (Supplementary Tables S2–S9). The same applies to SVM, where the kernel type and other parameters (e.g., polynomial degree and cost function) largely influenced model performance (Supplementary Tables S6–S7), a finding that agrees with previous comparative studies (e.g., Probst et al., 2019).
To our knowledge, this is the first time that NNM and SVM have been used for a regression task in seafloor habitat mapping, as our literature search only found applications to seafloor classification (Hasan et al., 2012; Diesing & Stephens, 2015; Alevizos & Greinert, 2018; Trzcinska et al., 2020; Janowski et al., 2021; Breyer et al., 2023).
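Why averaging narrows the prediction range of RFM can be illustrated with a minimal sketch. It assumes a single synthetic predictor and uses bootstrap-aggregated regression stumps (depth-1 trees) as a simplified stand-in for a full random forest (no feature subsampling, shallow trees); the functions `fit_stump` and `bagged_stumps_predict` and all data are hypothetical. Because every leaf value is a mean of training responses, the averaged ensemble prediction stays within the training response range, whereas a linear core keeps extrapolating.

```python
import numpy as np

def fit_stump(x, y):
    """Fit a depth-1 regression tree: one split that minimizes squared error."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    # Fallback if no split is possible: both leaves predict the overall mean.
    best = (np.inf, xs[0], ys.mean(), ys.mean())
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue  # no threshold separates identical predictor values
        left, right = ys[:i], ys[i:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best[0]:
            best = (sse, 0.5 * (xs[i - 1] + xs[i]), left.mean(), right.mean())
    _, threshold, left_value, right_value = best
    return threshold, left_value, right_value

def bagged_stumps_predict(x_train, y_train, x_new, n_trees=50, seed=0):
    """Bootstrap-aggregated stumps: a simplified stand-in for a random forest."""
    rng = np.random.default_rng(seed)
    per_tree = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(x_train), len(x_train))  # bootstrap sample
        thr, left_value, right_value = fit_stump(x_train[idx], y_train[idx])
        per_tree.append(np.where(x_new <= thr, left_value, right_value))
    # Final step: average the per-tree predictions. Each leaf value is a mean of
    # training responses, so the average cannot leave [min(y_train), max(y_train)].
    return np.mean(per_tree, axis=0)

# Hypothetical data: one synthetic predictor with a linear trend plus noise.
rng = np.random.default_rng(42)
x_train = np.linspace(0.0, 10.0, 50)
y_train = 2.0 * x_train + 1.0 + rng.normal(0.0, 0.5, x_train.size)
x_new = np.array([15.0, 20.0])  # predictor values outside the training range

tree_pred = bagged_stumps_predict(x_train, y_train, x_new)
slope, intercept = np.polyfit(x_train, y_train, 1)
linear_pred = slope * x_new + intercept  # a linear core keeps extrapolating

print(y_train.max(), tree_pred, linear_pred)
```

Both new points lie beyond every split threshold, so the ensemble returns the same capped value for each, while the fitted line grows with the predictor.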

Independent of the model used, the spatial block CV approach could result in a large CV error (pessimistic approach) due to the gaps created in feature space (Roberts et al., 2017; Hao et al., 2020; Wadoux et al., 2021). The block optimization minimized the geographical and feature space distances between training data and the prediction area to a degree, but not entirely (Fig. 4). Nevertheless, this approach was preferred over other spatial CV methods, such as spatial leave-one-location-out (Meyer et al., 2018), Euclidean buffered distances (Hengl et al., 2018), nearest neighbor distance matching (Milà et al., 2022), and spatial covariance weighting (Misiuk & Brown, 2023), because of the dataset size and survey layout (Figs. 1, 7). These methods were designed and tested with far fewer samples (≪ 1000) distributed throughout an entire area rather than along linear transects as here. In contrast, spatial blocking is appropriate for large datasets and large regions, as it is computationally efficient (fewer training points per model iteration) and flexible in partitioning the geographical space (Roberts et al., 2017; Valavi et al., 2019). The model evaluation based on spatial block CV used the held-out data from each spatial block separately (Fig. 7). An alternative approach would be to pool all held-out data from the spatial blocks (global R2), which better represents a model's ability to predict over larger areas (Meyer & Pebesma, 2022). However, we preferred the local R2 from each spatial block to obtain (a) the strictest evaluation of each model (Supplementary Table S14) and (b) a comparison between spatial blocks, identifying the blocks in which model predictions converged or diverged (Fig. 7).
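The difference between local and global R2 in leave-one-block-out CV can be sketched as follows. This is a minimal illustration under stated assumptions: ordinary least squares stands in for the ML models, blocks are fixed 1-km bins of a hypothetical easting coordinate rather than the optimized blocks used in this study, and all data are synthetic. Local R2 compares each held-out block against its own mean, which is why it is the stricter measure; global R2 pools all held-out predictions first.

```python
import numpy as np

def r2(y_true, y_pred):
    """Coefficient of determination relative to the mean of y_true."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def spatial_block_cv(X, y, block_id):
    """Leave-one-block-out CV; OLS is a placeholder for any regression model."""
    local_r2 = {}
    y_pred_all = np.empty_like(y, dtype=float)
    for b in np.unique(block_id):
        test = block_id == b  # hold out one whole spatial block
        train = ~test
        A_train = np.column_stack([np.ones(train.sum()), X[train]])
        coef, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
        A_test = np.column_stack([np.ones(test.sum()), X[test]])
        y_pred = A_test @ coef
        y_pred_all[test] = y_pred
        local_r2[int(b)] = r2(y[test], y_pred)  # evaluated per block
    global_r2 = r2(y, y_pred_all)  # all held-out predictions pooled
    return local_r2, global_r2

# Hypothetical survey: 400 samples with positions along a 4-km easting range.
rng = np.random.default_rng(0)
easting = rng.uniform(0.0, 4000.0, 400)
X = rng.normal(size=(400, 2))  # two hypothetical predictors
y = 1.5 * X[:, 0] - 0.8 * X[:, 1] + rng.normal(0.0, 0.3, 400)
block_id = np.floor(easting / 1000.0).astype(int)  # four 1-km blocks
local_r2, global_r2 = spatial_block_cv(X, y, block_id)
print(local_r2, global_r2)
```

In real data with spatially structured responses, the per-block values would typically spread more widely than in this synthetic case, which is what makes them useful for locating blocks where models diverge.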
Partitioning the geographical and feature space better represents the error related to extrapolation and transferability, although it increases the AOA (Wenger & Olden, 2012; Roberts et al., 2017; Fourcade et al., 2018; Hao et al., 2020; Gazis & Greinert, 2021; Meyer & Pebesma, 2021).
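Why spatial partitioning enlarges the AOA can be seen in a simplified sketch of the dissimilarity index (DI) idea of Meyer & Pebesma (2021). Assumptions: predictors are standardized and optionally multiplied by hypothetical importance weights, DI is the distance to the nearest training point divided by the mean pairwise training distance, and the threshold is taken as the upper boxplot whisker of the training DI. For training points, the nearest neighbor is searched only in other CV folds, so larger gaps created by spatial blocks raise the threshold. This is our reading of the method, not the CAST package implementation, and the function names and data are hypothetical.

```python
import numpy as np

def pairwise_dist(a, b):
    """Euclidean distances between rows of a (n, p) and rows of b (m, p)."""
    return np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1))

def area_of_applicability(X_train, folds, X_new, weights=None):
    """Simplified dissimilarity index (DI) and AOA mask, after Meyer & Pebesma (2021)."""
    mu, sd = X_train.mean(axis=0), X_train.std(axis=0)
    w = np.ones(X_train.shape[1]) if weights is None else np.asarray(weights, float)
    T = (X_train - mu) / sd * w  # standardized, optionally importance-weighted
    N = (X_new - mu) / sd * w
    d_tt = pairwise_dist(T, T)
    n = len(T)
    mean_dist = d_tt[np.triu_indices(n, k=1)].mean()  # average training spacing
    # Training DI: distance to the nearest training point in a *different* fold,
    # so the threshold reflects the gaps created by the CV partitioning.
    other_fold = folds[:, None] != folds[None, :]
    di_train = np.where(other_fold, d_tt, np.inf).min(axis=1) / mean_dist
    # Threshold: upper boxplot whisker of the training DI (outliers removed).
    q1, q3 = np.percentile(di_train, [25, 75])
    threshold = di_train[di_train <= q3 + 1.5 * (q3 - q1)].max()
    di_new = pairwise_dist(N, T).min(axis=1) / mean_dist
    return di_new, di_new <= threshold, threshold

# Hypothetical predictors, five CV folds, and two prediction locations:
# one identical to a training sample, one far outside the training feature space.
rng = np.random.default_rng(1)
X_train = rng.normal(size=(200, 2))
folds = np.repeat(np.arange(5), 40)
X_new = np.vstack([X_train[0], [10.0, 10.0]])
di_new, inside, threshold = area_of_applicability(X_train, folds, X_new)
print(di_new, inside, threshold)
```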

Here, the AOA was large and similar for all models (> 91% of the study area) and helped explain the poor performance of all models in one particular resampling spatial block (Figs. 6, 7). The AOA and the SD among model predictions provide valuable information about the models' trustworthiness and should be reported alongside the predictions. Together, they could guide future sampling toward sub-areas with high SD and low model applicability. Despite the importance of spatial CV and extrapolation assessment when using ML algorithms, our literature search found only a few studies that report such information alongside ML-based seafloor spatial predictions (Misiuk et al., 2019; Diesing, 2020a; Dolan et al., 2021; Gazis & Greinert, 2021; Spiegel et al., 2024).
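One possible way to combine between-model SD and the AOA for sampling guidance is sketched below. The rule (flag cells whose SD is at or above the 90th percentile and that also lie outside the AOA), the function name `sampling_priority`, and the example predictions are assumptions for illustration, not the settings or results of this study; the AOA mask is taken as a given input.

```python
import numpy as np

def sampling_priority(predictions, inside_aoa, sd_quantile=0.9):
    """Flag cells with high between-model SD that also fall outside the AOA."""
    predictions = np.asarray(predictions, dtype=float)  # (n_models, n_cells)
    cell_mean = predictions.mean(axis=0)
    cell_sd = predictions.std(axis=0, ddof=1)  # SD among model predictions
    high_sd = cell_sd >= np.quantile(cell_sd, sd_quantile)
    outside_aoa = ~np.asarray(inside_aoa, dtype=bool)
    return cell_mean, cell_sd, high_sd & outside_aoa

# Hypothetical nodule coverage predictions (%) from five models for six grid cells.
predictions = [
    [10, 20, 30, 40, 50, 60],  # e.g., GLM
    [11, 19, 31, 45, 50, 70],  # GAM
    [9, 21, 29, 35, 50, 50],   # RFM
    [10, 20, 30, 42, 50, 65],  # SVM
    [10, 20, 30, 38, 50, 55],  # NNM
]
inside_aoa = [True, True, True, True, True, False]
cell_mean, cell_sd, priority = sampling_priority(predictions, inside_aoa)
print(cell_mean, cell_sd, priority)
```

Requiring both conditions keeps the candidate list short; replacing `&` with `|` would instead flag any cell that fails either check.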

It is worth noting that a detailed comparison of performance metrics was not the primary goal of this study. Each spatial problem is unique, and each algorithm involves trade-offs among model accuracy, extrapolation, interpretability, and computational speed. Here, no ML method outperformed the others in all aspects. The multi-model approach provides confidence in the derived nodule coverage distribution because all models predicted similar distributions and variable rankings. The true benefit of a multi-model approach is the increased confidence in areas with good spatial agreement (i.e., low SD) and a better understanding of the contribution of specific predictors. Complex and simple models perform similarly well when predictors are meaningful and the training data capture the feature space and its variability well (Rudin, 2019). As such, we regard ML models as tools for deriving predictions from data that capture the underlying relationships and interactions between the response variable and predictors, even where our conceptual understanding of these relationships is still developing (Breiman, 2001b; Shmueli, 2010). Thus, although computational algorithms are used for prediction, the initial phase of such studies is, and needs to be, strongly data-centric, emphasizing the acquisition of large amounts of high-quality data at the right locations. Knowing the right (and technically feasible) sampling locations a priori, given each study area's unique characteristics, is difficult, particularly in the absence of legacy data (a typical problem in deep-sea research). Recently proposed AUV-based sampling approaches (Shields et al., 2023), together with domain knowledge, could help achieve balanced sample coverage in both geographical and feature space.

Conclusions

Deep-sea mining research is characterized by data scarcity, knowledge gaps, and a lack of high-resolution studies (Amon et al., 2022). This work provided clear insights into the spatial distribution of polymetallic nodules within a potential mining field in the Clarion–Clipperton fracture zone, laying the basis for resource estimation, mining path planning, and benthic habitat mapping there. Combining multi-model, interpretable analysis with angular range analysis and orthophoto mosaicking can shed light on the ‘black box’ character of machine learning models, giving valuable feedback on the statistical and natural contribution of each predictor variable and advancing our knowledge of deep-sea polymetallic nodule occurrences.