1 Introduction

Submarine landslides are a significant natural hazard known to occur widely across the ocean seafloor. Historically, the downslope movement of sediment has damaged seabed infrastructure, including the destruction of a Taylor Energy platform in 2004, which caused the release of up to 700 barrels of oil per day for over a decade in the Gulf of Mexico (GoM) and cost approximately $500 million to decommission (Kaiser et al. 2009; Casey 2019). Such damage not only results in costly repairs or replacements but may also slow petroleum transportation to onshore facilities and introduce a potentially devastating stressor to marine and coastal environments. These risks threaten existing petroleum production infrastructure as well as future installations, such as those for carbon storage and wind farms (Offshore Energy 2018). Recently, large mass movements in the deep ocean that generate tsunamis have been examined because of the significant threat they pose to shoreline communities and economies, as illustrated by the devastating 1998 Papua New Guinea tsunami, which caused over 1,600 fatalities (Vanneste et al. 2013; Pampell‐Manis et al. 2016; Sawyer et al. 2019). Understanding where landslides are likely to occur in offshore regions is therefore imperative to support planning for offshore structure placement, reduce the chance of catastrophic incidents, and protect human, environmental, and economic safety.

The term landslide has become a common label for the various types of mass-transport deposits in both terrestrial and submarine environments, causing inconsistencies in terminology and interpretation among fields of study related to sediment movement (Shanmugam and Wang 2015). In submarine environments, sediment instabilities can be classified into rockfalls, slides or slumps, flows, and turbidity flows (Brunsden and Prior 1984). However, it is difficult to differentiate among these types of mass movement in the submarine environment; therefore, this study refers to all types of submarine slope failure as landslides. Landslides are initiated when driving forces exceed the resisting forces of the material composing the slope (Anderson and Anderson 2010). In submarine settings, this force-balance threshold may be crossed through a variety of triggers, including gas migration, wave forcing, and earthquakes, and knowledge of these triggers can inform modelling techniques that assess the probability of landslide occurrence.

Landslide susceptibility mapping (LSM) is a quantitative method in which statistical or machine learning (ML) models estimate the probability of a landslide occurring at a given location from various factors related to landslide events, thereby characterizing the spatial patterns of the underlying landslide mechanisms (Reichenbach et al. 2018). This LSM framework is static: it assumes that the environmental conditions at the time of future landslide events in a given area will be similar to the conditions of previous landslide events in that same area. Previous terrestrial studies have used a variety of predictive models to perform LSM, including random forests (Micheletti et al. 2014), generalized additive models (Chen et al. 2017), ensemble decision trees (Sahin 2020), and deep neural networks (Shahri et al. 2019; Wang et al. 2020). These methods are used in conjunction with a Geographic Information System (GIS) to produce a spatially continuous prediction map. While most LSM studies have been conducted on terrestrial systems, little is known about the capability of LSM with regard to submarine landslides (Reichenbach et al. 2018), as these methods have seldom been applied to submarine environments (Shan et al. 2021). Related submarine slope instability assessments have been completed (e.g., Hitchcock et al. 2010; Collico et al. 2020; Obelcz et al. 2020); however, there is a need to further understand the submarine application of LSM at larger spatial scales.

This paper presents the application of LSM to an offshore region, using the northern GoM as a case study. Using currently available spatial data for factors related to the occurrence of submarine landslides in the study region, a gradient-boosted decision tree (GBDT) is trained with supervised ML to assess how accurately geospatial LSM can perform in this region, with logistic regression (LR) as a baseline model. Beyond model performance, a feature attribution analysis was performed using SHapley Additive exPlanations (SHAP) to provide insight into which variables are most influential for assessing submarine landslide potential. This application of LSM to the northern GoM serves as a first attempt to map landslide potential at a large scale in a remote offshore region, assessing what can be achieved with currently available data and how future studies can be improved.

2 Factors relating to submarine landslide potential

Several key factors influence the potential for submarine landslide occurrence. These factors can be heterogeneous over space and time, and a factor in one region may not have the same influence in a different region. Some factors that initiate submarine slope failure are shared with terrestrial LSM applications, whereas others are more specific to submarine LSM. Categories of factors for submarine LSM include topographical (McAdoo et al. 2000; Shahri et al. 2019), geological (Cooper and Hart 2002; Martin and Bouma 1982; Tripsanas et al. 2004; Maloney et al. 2020; Masson et al. 2006), geomorphological (McAdoo et al. 2000; Sassen et al. 1999; Milkov & Sassen 2000), and geochemical (Maloney et al. 2020; Feseker et al. 2014; Cooper and Hart 2002). Together, these factors act as a proxy for submarine conditions susceptible to landslide initiation and can be used to perform submarine LSM. A full description of these factors is reported in Online Resource 1.

3 Case study: Gulf of Mexico

Since the 1950s, submarine landslides in the northern GoM have been studied with the aim of protecting offshore structures for petroleum production (e.g., Shepard 1955; Coleman et al. 1978). Because offshore energy infrastructure was initially placed in shallow, nearshore waters off the GoM coast, early submarine landslide studies focused on the Mississippi River Delta Front (MRDF; e.g., Coleman et al. 1978, 1980). Since then, offshore projects have moved into deeper waters to access additional resources through deepwater (water depth > 1,000 feet) and ultra-deepwater (water depth > 5,000 feet) activities (Bureau of Ocean and Energy Management 2008). With ongoing dependence on marine infrastructure to support energy production, reliance on marine economies, and the likely advent of carbon storage projects, there is a need to map the potential for submarine landslides in remote areas of the seafloor. Currently, no basin-scale LSM applications for the northern GoM have been published, likely because of limited prior studies and data availability. However, with recent advances in sensor and seismic technology, high-resolution bathymetry and spatial seafloor hazard data have become openly available, creating an opportunity to perform LSM for offshore regions.

3.1 Study area

The case study area includes a portion of the northern GoM, parts of which are hot spots for petroleum production in the United States (U.S.) Exclusive Economic Zone (EEZ). Specifically, the study area is bounded by the U.S. EEZ in the GoM where the water depth is greater than 120 m (Fig. 1). This region extends up to 200 nautical miles offshore and covers a total area of 386,753 km². Notable regions within the study area include the Texas (TX)-Louisiana (LA) Slope, the Sigsbee Escarpment, the Mississippi Canyon, the De Soto Canyon, and the Florida Escarpment. The GoM basin is the result of an extinct extensional regime that split the previously emplaced Louann salt sheet into the northern GoM salt basin and the Campeche salt basin (Galloway 2008). The Louann salt sheet forms the continental slope of the northern GoM before the Sigsbee Escarpment descends to the abyssal plain. Recent (late Neogene to present) sedimentation rates are greatest along the central GoM margin, with the heaviest sediment supply from the Atchafalaya and Mississippi Rivers. The resulting progradation delivers massive amounts of sediment to the GoM basin, which is concentrated in the Mississippi Canyon and then redistributed by geomorphologic processes along the continental shelf and onto the abyssal plain. Because of these high sedimentation rates, the seabed sustains a significant load that affects its stability. Further, the Louann salt sheet deforms and migrates continuously because of heavy clastic sedimentation that began in the Late Jurassic–Early Cretaceous and continues to load the salt sheet and drive gravity tectonics today (Galloway 2008). Salt diapirism is responsible for the many salt-withdrawal mini-basins that introduce high variability to the bathymetry, the differential accumulation of sediment, and much of the structural complexity (i.e., faults and fractures) from the subsurface to the seafloor.
Dense structural complexity indicates a greater potential for fluid migration, as fluid or gas from pressurized subsurface reservoirs escapes along the high-permeability pathways provided by fractures. Sites where these fluid migration pathways breach the seafloor are marked by seeps, mud volcanoes, hydrates, and chemosynthetic communities.

Fig. 1

Map of the study area in the northern GoM outlined in black. The regions used for training ML models are outlined in red with labels A, B, C, and D

Four regions where historic landslides have been digitized were used for analysis (Dyer et al. 2022). The regions are labeled A–D from west to east across the study area, as shown in Fig. 1. Region A has an area of 8,508 km² and Region B an area of 3,575 km²; both are located within the TX-LA Slope salt basin. Region C has an area of 17,577 km² and primarily covers a portion of the northern GoM salt basin and the western flank of the Mississippi Canyon. Region D has an area of 5,389 km² and is located southwest of the De Soto Canyon.

3.2 Materials and methods

3.2.1 GIS feature database

A spatial database was curated containing 20 features relating to factors known to affect submarine landslide potential in the northern GoM. Topographic features include elevation, aspect, slope, and curvature (IOC et al. 2003). Geomorphological features include basins, canyons, and escarpments (Harris et al. 2014) as well as channels (Bureau of Ocean & Energy Management 2016a). Geological factors include faults (Diegel et al. 1995; United States Geological Survey 2004a), salt diapirism (United States Geological Survey 2004b), sediment thickness (Twichell et al. 1995), and sediment type (Buczkowski et al. 2020). Lastly, geochemical factors include gas presence, pockmarks, mud volcanoes, and seeps (Bureau of Ocean & Energy Management 2016a) as well as hydrates (Twichell et al. 1996; Majumdar et al. 2017). All spatial features represented as points, lines, or polygons were converted to continuous surfaces using the Euclidean Distance tool in ArcGIS Pro (Environmental Systems Research Institute 2022). Elevation derivatives (aspect, slope, and curvature) were created using the Surface Parameters tool in ArcGIS with a 3,000-m search radius (Environmental Systems Research Institute 2022). The sediment type data acquired from the usSEABED database (Schweitzer et al. 2020) were divided into features depicting the percent coverage of sand, mud, gravel, and rock. All rasters were resampled to a 500-m spatial resolution. Source information and maps of each feature can be found in Table S1 and Fig. S1, respectively (Online Resource 1).
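As an illustration of the raster preparation described above, the following Python sketch computes a Euclidean distance surface from a rasterized feature mask on a 500-m grid and a simple slope raster from a gridded elevation model. It is a minimal, hypothetical example on toy arrays: the function names are illustrative, the brute-force distance calculation is only practical for small grids, and the slope uses NumPy central finite differences, which does not reproduce the ArcGIS Surface Parameters tool or its 3,000-m neighborhood.

```python
import numpy as np

CELL_SIZE_M = 500.0  # raster resolution used for all features in this study


def euclidean_distance_surface(feature_mask, cell_size=CELL_SIZE_M):
    """Distance (m) from every cell centre to the nearest cell flagged in feature_mask.

    Brute-force sketch: builds all cell-to-feature distances, so only use on small grids.
    """
    feature_mask = np.asarray(feature_mask, dtype=bool)
    rows, cols = np.indices(feature_mask.shape)
    cells = np.column_stack([rows.ravel(), cols.ravel()]).astype(float)
    features = cells[feature_mask.ravel()]
    if features.size == 0:
        raise ValueError("feature_mask contains no feature cells")
    # Pairwise distances in cell units, then keep the nearest feature for each cell.
    diff = cells[:, None, :] - features[None, :, :]
    nearest = np.sqrt((diff ** 2).sum(axis=2)).min(axis=1)
    return (nearest * cell_size).reshape(feature_mask.shape)


def slope_degrees(dem, cell_size=CELL_SIZE_M):
    """Slope (degrees) from central finite differences of an elevation grid."""
    dz_drow, dz_dcol = np.gradient(np.asarray(dem, dtype=float), cell_size)
    return np.degrees(np.arctan(np.hypot(dz_drow, dz_dcol)))


# Usage: one hypothetical feature cell (e.g., a seep) at the centre of a 3 x 3 grid.
mask = np.zeros((3, 3), dtype=bool)
mask[1, 1] = True
print(euclidean_distance_surface(mask).round(1))

# A plane rising 250 m per 500-m cell has a uniform slope of arctan(0.5).
dem = np.tile(250.0 * np.arange(5), (4, 1))
print(slope_degrees(dem)[0, 0])
```

Edge-adjacent cells end up 500 m from the feature and corner cells about 707 m away, which is the kind of continuous "distance to feature" surface the ML models receive as input.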

Although ocean surface waves can trigger landslide events in water depths less than approximately 120 m (Henkel 1970; Maloney et al. 2020), wave forcing is not included as a factor in this study; therefore, the analysis for this case study was limited to water depths greater than 120 m in the U.S. EEZ. Furthermore, visual inspection of the regional sediment accumulation rate predicted by Restreppo et al. (2020) (Fig. S2, Online Resource 1) shows little variation across the study region, so sediment accumulation rate is also not included in this case study.

The historic submarine landslide inventory used as observational data in this study was curated by Dyer et al. (2022). This dataset delineates the areas where sediment volume was lost (the depletion areas) of historic submarine landslides in four regions within the study area (Fig. 2), offering a dataset with minimal false negatives (i.e., unidentified landslides). These landslide observations were added to the GIS database by rasterizing them to the same grid as the input features. Region A had 34,049 observations, of which 1,674 (4.92%) were in the positive landslide class. Region B had 14,161 observations, of which 892 (6.30%) were positive. Region C had 70,301 observations, of which 4,342 (6.18%) were positive. Lastly, Region D had 21,527 observations, of which 714 (3.32%) were positive.

Fig. 2

Map of the landslide observations for training–testing regions A, B, C, and D within the study area. A bathymetric hillshade provided by BOEM (2016b) is displayed in the zoomed-in maps to provide visual context of the seafloor terrain. Landslide pixels outside of the training–testing regions are not included in ML modelling

3.2.2 Mutual information

To assess the statistical relationship between each input feature and the landslide class, Mutual Information (MI) was computed for each feature with the landslide class as the target variable. MI is a well-established concept in information theory that quantifies the dependency between two random variables (Ross 2014). Because the input features are continuous and the target is discrete (landslide or non-landslide), MI was estimated with the nearest-neighbor entropy estimator of Kraskov et al. (2004), as adapted by Ross (2014) for mixed continuous and discrete data. The MI value calculated for each feature estimates how useful that feature may be in discerning between the two classes. An MI value of zero indicates that the feature is independent of the landslide class, and larger values indicate greater dependency; features with higher MI values are therefore likely to provide more valuable information for classification modelling. This functionality is provided by the scikit-learn package (Pedregosa et al. 2011).
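The MI estimation can be sketched with scikit-learn's `mutual_info_classif`, which implements a nearest-neighbor estimator for continuous features and a discrete target. The data below are synthetic and purely illustrative: one hypothetical feature whose values shift with the class and one that is pure noise. The returned values are in nats; the informative feature should score clearly above the noise feature, whose estimate should be close to zero.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
n = 2000
landslide = rng.integers(0, 2, size=n)                   # synthetic binary target
informative = landslide + rng.normal(0.0, 0.5, size=n)   # distribution shifts with the class
noise = rng.normal(0.0, 1.0, size=n)                     # independent of the class
X = np.column_stack([informative, noise])

# Continuous features vs. a discrete target; scikit-learn returns MI in nats.
mi = mutual_info_classif(X, landslide, discrete_features=False, n_neighbors=3, random_state=0)
for name, score in zip(["informative", "noise"], mi):
    print(f"{name}: {score:.3f} nats")
```

Because MI is not normalized, scores are best read relative to one another (as in the feature ranking) rather than against a fixed scale.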

3.2.3 Collinearity

A Pearson correlation matrix was computed for all input variables to assess the degree of collinearity among the input features. Correlation coefficients (r) range from −1 to 1. Here, any two variables with |r| ≥ 0.8 were considered highly correlated, and it was deemed unwise to include both in the ML models. For each highly correlated pair, the feature with the lower MI score was dropped from the analysis.
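A minimal sketch of this collinearity filter follows, assuming features are held as NumPy arrays keyed by name and MI scores have already been computed. The helper name `drop_collinear`, the greedy order (most correlated pairs first), and the toy MI values are assumptions for illustration; the text does not specify how overlapping pairs would be resolved.

```python
import numpy as np


def drop_collinear(features, mi_scores, threshold=0.8):
    """Drop one feature from each pair with |Pearson r| >= threshold,
    keeping the feature with the higher MI score.

    features: dict name -> 1-D array; mi_scores: dict name -> MI value.
    """
    names = list(features)
    corr = np.corrcoef(np.vstack([np.asarray(features[n], dtype=float) for n in names]))
    # Visit pairs from most to least correlated so the strongest conflicts are resolved first.
    pairs = sorted(
        ((abs(corr[i, j]), names[i], names[j])
         for i in range(len(names)) for j in range(i + 1, len(names))),
        reverse=True,
    )
    dropped = set()
    for r, a, b in pairs:
        if r < threshold:
            break  # all remaining pairs are less correlated
        if a in dropped or b in dropped:
            continue  # this pair is already resolved
        dropped.add(a if mi_scores[a] < mi_scores[b] else b)
    kept = [n for n in names if n not in dropped]
    return kept, sorted(dropped)


# Usage with synthetic sediment percentages (mud and gravel strongly anti-correlated).
rng = np.random.default_rng(1)
mud = rng.uniform(0, 100, 500)
gravel = 100 - mud + rng.normal(0, 5, 500)
slope = rng.uniform(0, 10, 500)
toy_mi = {"mud": 0.09, "gravel": 0.05, "slope": 0.04}  # hypothetical MI scores
kept, dropped = drop_collinear({"mud": mud, "gravel": gravel, "slope": slope}, toy_mi)
print(kept, dropped)
```

Using the absolute value of r applies the ±0.8 rule symmetrically, so strong negative correlations such as mud versus gravel are caught as well as positive ones.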

3.2.4 Frequency analysis

A frequency analysis was completed for each of the remaining features to identify ranges of feature values associated with higher or lower landslide susceptibility. For each feature, the continuous values were divided into a set of predefined bins. For each bin, the number and percentage of all pixels falling within that range were reported, together with the number and percentage of positive landslide pixels falling within it. The frequency analysis is provided in Table S2 (Online Resource 1).
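The binning behind the frequency analysis can be sketched as below. The function name, bin edges, and toy data are hypothetical; percentages are computed relative to the values that fall inside the supplied edges, and the landslide percentage is read here as the share of all landslide pixels that fall within each bin.

```python
import numpy as np


def frequency_table(values, is_landslide, bin_edges):
    """Per-bin pixel counts and percentages, overall and for landslide pixels."""
    values = np.asarray(values, dtype=float)
    is_landslide = np.asarray(is_landslide, dtype=bool)
    # np.histogram uses half-open bins, except that the last bin includes its right edge.
    all_counts, _ = np.histogram(values, bins=bin_edges)
    slide_counts, _ = np.histogram(values[is_landslide], bins=bin_edges)
    rows = []
    for lo, hi, n_all, n_slide in zip(bin_edges[:-1], bin_edges[1:], all_counts, slide_counts):
        rows.append({
            "bin": (lo, hi),
            "pixels": int(n_all),
            "pixel_pct": 100.0 * n_all / all_counts.sum(),
            "landslide_pixels": int(n_slide),
            "landslide_pct": 100.0 * n_slide / max(slide_counts.sum(), 1),
        })
    return rows


# Usage: hypothetical slope values (degrees) for six pixels, three of them landslides.
rows = frequency_table([1, 2, 6, 7, 8, 12], [0, 0, 1, 0, 1, 1], [0, 5, 10, 15])
for row in rows:
    print(row)
```

A bin whose landslide percentage clearly exceeds its pixel percentage (here the 5 to 10 degree bin) marks a range of feature values that is over-represented among landslides.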

3.2.5 Predictive modelling

This study used a GBDT model to predict the binary target (landslide or non-landslide) and thereby estimate landslide susceptibility. A GBDT is an ensemble method that combines many weak prediction models (decision trees) into a single, more accurate model through a process known as boosting. More specifically, gradient boosting adds base learners sequentially, with each new tree fitted to the negative gradient of the loss function with respect to the current ensemble's predictions so that it corrects the errors of the preceding trees. In its stochastic form, fitting each tree on a random subsample of the training data further reduces overfitting and improves overall performance (Friedman 2002). GBDTs are also robust to outliers and can model non-linear feature interactions (Friedman 2002; Elith et al. 2008). With these characteristics, GBDTs have gained wide acceptance for both regression and classification applications and have been shown to perform with high accuracy in LSM studies (Song et al. 2018; Sahin 2020). Here, the eXtreme Gradient Boosting (XGBoost) algorithm (T. Chen and Guestrin 2016) was used as the GBDT model. XGBoost can handle large datasets and supports parallel computation to improve computational speed (T. Chen and Guestrin 2016). Additionally, LR, which has previously been used for LSM (Raja et al. 2017), is reported as a baseline model using the scikit-learn package (Pedregosa et al. 2011).
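To make the boosting description concrete, the following is a deliberately simplified gradient boosting loop for binary log loss, using scikit-learn regression trees as base learners. It is not XGBoost, which additionally uses second-order gradient information, regularization, and subsampling options. Each round fits a shallow tree to the negative gradient (the residual `y - p`) and adds a damped copy of it to the ensemble's log-odds, so with a modest learning rate the training loss should not increase from one round to the next. Function names, data, and hyperparameters are illustrative assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def log_loss(y, p, eps=1e-12):
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))


def fit_gradient_boosting(X, y, n_rounds=100, learning_rate=0.3, max_depth=2):
    """Simplified first-order gradient boosting for binary log loss."""
    y = np.asarray(y, dtype=float)
    base = np.log(y.mean() / (1.0 - y.mean()))      # start from the class log-odds
    F = np.full(len(y), base)                        # current ensemble output (log-odds)
    trees, history = [], [log_loss(y, sigmoid(F))]
    for _ in range(n_rounds):
        residual = y - sigmoid(F)                    # negative gradient of log loss w.r.t. F
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        tree.fit(X, residual)                        # weak learner approximates the gradient
        F = F + learning_rate * tree.predict(X)      # damped step that corrects current errors
        trees.append(tree)
        history.append(log_loss(y, sigmoid(F)))
    return {"base": base, "lr": learning_rate, "trees": trees, "history": history}


def predict_proba(model, X):
    F = model["base"] + model["lr"] * sum(t.predict(X) for t in model["trees"])
    return sigmoid(F)


# Toy data with a non-linear (circular) class boundary.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(400, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 < 0.5).astype(float)
model = fit_gradient_boosting(X, y)
accuracy = np.mean((predict_proba(model, X) > 0.5) == y)
print(f"log loss {model['history'][0]:.3f} -> {model['history'][-1]:.3f}, "
      f"training accuracy {accuracy:.2f}")
```

Training loss alone keeps falling as rounds are added, which is why an evaluation set is normally used to decide how many rounds to keep.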

To assess the predictive accuracy of each classifier and its ability to distinguish between landslide and non-landslide observations, a permutation method was used. Models were trained and tested using three ratios of training to testing regions: 1:1 (one training region, one testing region), 2:1 (two training regions, one testing region), and 3:1 (three training regions, one testing region). For each ratio, every possible arrangement of the four training–testing regions into training and testing sets was evaluated. This approach measures how the number of training regions affects accuracy and allows each of the four regions to be assessed as a testing region.
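The permutation scheme can be enumerated directly. The sketch below uses the per-region observation counts reported in Section 3.2.1, assigned to Regions A–D in west-to-east order, and reproduces the number of combinations and the range of training-set sizes for each ratio; the helper name `permutation_splits` is hypothetical.

```python
from itertools import combinations

# Observation counts per training-testing region (Section 3.2.1), west to east.
REGION_SIZES = {"A": 34_049, "B": 14_161, "C": 70_301, "D": 21_527}


def permutation_splits(regions, n_train):
    """Yield (training regions, testing region) pairs for an n_train:1 ratio."""
    for test in regions:
        others = [r for r in regions if r != test]
        for train in combinations(others, n_train):
            yield train, test


for n_train in (1, 2, 3):
    splits = list(permutation_splits(list(REGION_SIZES), n_train))
    sizes = [sum(REGION_SIZES[r] for r in train) for train, _ in splits]
    print(f"{n_train}:1 -> {len(splits)} splits, "
          f"training observations {min(sizes):,}-{max(sizes):,}")

# 1:1 -> 12 splits, training observations 14,161-70,301
# 2:1 -> 12 splits, training observations 35,688-104,350
# 3:1 -> 4 splits, training observations 69,737-125,877
```

Each testing region appears once per training combination drawn from the other three regions, so all four regions are evaluated as unseen data at every ratio.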

The following ML workflow (Fig. S3, Online Resource 1) was performed for each permutation of training and testing sets. A randomized search was used to tune several parameters of each model (Fig. S3, Fig. S4, Online Resource 1); the search evaluated 10 sampled parameter combinations, scoring each with stratified k-fold cross-validation (CV) on the training set. With the tuned parameters, an optimized model pipeline was fitted to the training set. For the GBDT model, the testing set was used as the early-stopping evaluation set during the optimized model phase, with training halted once the Area Under the Receiver Operating Characteristic Curve (ROC AUC, hereafter AUC) score did not improve for 10 consecutive training iterations (boosting rounds). Early stopping can be a critical part of GBDT models because of their potential to overfit the training set.
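The tuning and early-stopping steps can be sketched with scikit-learn as follows. This example tunes only an LR baseline on synthetic data, with a hypothetical search space for the regularization strength `C`; the study's actual search spaces are not reproduced here. The `early_stopping_round` helper is a hypothetical, library-independent illustration of a 10-round patience rule applied to a sequence of per-round AUC scores.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold

# Synthetic, imbalanced toy data (roughly one quarter positives).
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 5))
y = (X[:, 0] - 0.5 * X[:, 1] + rng.normal(0.0, 1.0, 600) > 1.0).astype(int)

search = RandomizedSearchCV(
    LogisticRegression(max_iter=1000),
    param_distributions={"C": list(np.logspace(-3, 2, 30))},  # hypothetical search space
    n_iter=10,                                                  # 10 sampled parameter settings
    scoring="roc_auc",
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),  # keeps class ratio per fold
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))


def early_stopping_round(scores, patience=10):
    """Return the index of the best round, stopping once `patience`
    consecutive rounds pass without an improvement in the score."""
    best_idx, best = 0, scores[0]
    for i, score in enumerate(scores[1:], start=1):
        if score > best:
            best_idx, best = i, score
        elif i - best_idx >= patience:
            break
    return best_idx


# Usage: AUC plateaus after round 2, so the later spike at round 13 is never reached.
print(early_stopping_round([0.6, 0.7, 0.72] + [0.71] * 10 + [0.9]))
```

Stratified folds matter here because, with only 3 to 6% landslide pixels per region, unstratified folds could contain very few positives.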

Before model fitting throughout the ML workflow, the training and testing sets were passed through a data processing pipeline. First, all input features were scaled to the range 0 to 1. Next, all missing values were filled using k-nearest neighbor (KNN) imputation with five neighbors (Cover and Hart 1967; Triguero et al. 2019). Lastly, because of the imbalance between the two target classes, the training set was under-sampled by reducing the number of non-landslide observations to equal the number of landslide observations.
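The preprocessing pipeline can be sketched with scikit-learn's `MinMaxScaler` and `KNNImputer`, followed by under-sampling of the majority class. Fitting the scaler and imputer on the training set only and then applying them to the testing set is an assumption of this sketch, as is the choice of random under-sampling; the function name, seed, and toy data are hypothetical.

```python
import numpy as np
from sklearn.impute import KNNImputer
from sklearn.preprocessing import MinMaxScaler


def preprocess(X_train, y_train, X_test, seed=0):
    """Scale to [0, 1], KNN-impute (k = 5), then balance the training classes.

    Assumes landslides (label 1) are the minority class in the training set.
    """
    y_train = np.asarray(y_train)
    scaler = MinMaxScaler()                  # NaNs are ignored in fit and kept in transform
    imputer = KNNImputer(n_neighbors=5)
    X_train = imputer.fit_transform(scaler.fit_transform(X_train))
    X_test = imputer.transform(scaler.transform(X_test))

    # Random under-sampling of non-landslide rows, applied to the training set only.
    rng = np.random.default_rng(seed)
    pos = np.flatnonzero(y_train == 1)
    neg = np.flatnonzero(y_train == 0)
    keep = np.concatenate([pos, rng.choice(neg, size=len(pos), replace=False)])
    rng.shuffle(keep)
    return X_train[keep], y_train[keep], X_test


# Usage on toy data with gaps and about 10% positives.
rng = np.random.default_rng(0)
X_tr = rng.normal(size=(200, 3))
X_tr[::7, 1] = np.nan
y_tr = (rng.random(200) < 0.1).astype(int)
X_te = rng.normal(size=(50, 3))
X_te[::5, 2] = np.nan
Xb, yb, Xt = preprocess(X_tr, y_tr, X_te)
print(Xb.shape, np.bincount(yb))
```

The testing set is scaled and imputed but never re-balanced, so evaluation metrics still reflect the natural class imbalance of each region.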

Model performance was evaluated on the testing set of each permutation using standard binary classification metrics: accuracy, precision, recall, and AUC. Accuracy measures the proportion of correctly classified observations. Precision measures the proportion of positive (landslide) predictions that were truly positive. Recall measures the proportion of positive observations that were correctly classified. AUC summarizes the true positive rate (TPR) and false positive rate (FPR) across all classification thresholds and measures the ability of a classifier to distinguish between the two classes. All metrics range from 0 to 1, with higher values indicating better performance. Additionally, receiver operating characteristic (ROC) curves are provided for the training and testing sets; these curves plot the TPR against the FPR and illustrate the performance of a binary classifier as the classification threshold is adjusted.
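The four metrics can be written out from the confusion-matrix counts. The plain-Python sketch below assumes a 0.5 classification threshold (an illustrative choice) and computes AUC as the probability that a randomly chosen landslide pixel receives a higher score than a randomly chosen non-landslide pixel, with ties counted as one half, which equals the area under the ROC curve. The pairwise loop is quadratic in size and only meant for small examples.

```python
def roc_auc(y_true, y_score):
    """Probability that a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def classification_metrics(y_true, y_score, threshold=0.5):
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,  # predicted landslides that are real
        "recall": tp / (tp + fn) if tp + fn else 0.0,     # real landslides that are found
        "auc": roc_auc(y_true, y_score),                  # threshold-independent ranking
    }


# Usage: two non-landslide and two landslide pixels with predicted probabilities.
print(classification_metrics([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
# {'accuracy': 0.75, 'precision': 1.0, 'recall': 0.5, 'auc': 0.75}
```

Accuracy, precision, and recall depend on the chosen threshold, whereas AUC depends only on how the scores rank the two classes.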

3.2.6 Feature attribution

SHapley Additive exPlanations (SHAP) (Lundberg and Lee 2017) was used to assess the importance of each feature in landslide classification. SHAP is grounded in cooperative game theory and estimates Shapley values (Shapley 1997), which provide consistent and locally accurate feature attribution values. TreeSHAP, a SHAP algorithm designed for tree-based models, was used to estimate SHAP values efficiently (Lundberg et al. 2018). Here, the SHAP value reported for each feature is the mean absolute SHAP value across observations, which represents the contribution that the feature makes to the outcome (prediction) of the model. SHAP values were estimated on the testing set of each permutation, allowing the distribution of feature attribution to be assessed across the four training–testing regions.
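TreeSHAP computes Shapley values efficiently by exploiting tree structure; the brute-force sketch below instead enumerates every feature coalition, which is only feasible for a handful of features but shows what is being estimated. It assumes a simple interventional value function in which features outside a coalition are set to a fixed baseline; the function names are hypothetical, and this is not the `shap` library's API. The final helper aggregates per-observation values into a mean absolute SHAP value per feature.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one observation x.

    Features outside a coalition take their baseline value; the cost grows as 2**n.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))  # marginal contribution of i
        phi.append(total)
    return phi


def mean_abs_shap(predict, X, baseline):
    """Global importance: mean |Shapley value| of each feature over the rows of X."""
    rows = [shapley_values(predict, x, baseline) for x in X]
    return [sum(abs(r[i]) for r in rows) / len(rows) for i in range(len(baseline))]


# Usage with a hypothetical linear scoring function of three features.
linear = lambda z: 2 * z[0] - 3 * z[1] + 0 * z[2]
print(shapley_values(linear, [1, 2, 5], [0, 0, 0]))
print(mean_abs_shap(linear, [[1, 2, 5], [-1, 0, 5]], [0, 0, 0]))
```

For a linear model, each value reduces to the coefficient times the feature's deviation from the baseline, and the values sum to the difference between the prediction and the baseline prediction; a feature with no effect (here the third) receives zero.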

3.2.7 Landslide susceptibility mapping

A landslide susceptibility map was produced for the entire study area to visualize the spatial patterns of submarine landslide susceptibility. The final map was created by training each model on the input data from all four training–testing regions and then predicting the probability (0 to 1) of the landslide class for each pixel in the study area. Because the landslide observations from Dyer et al. (2022) include a variety of failure types, such as slumps and flows, the predictions represent the potential for mass transport of sediment by any of these mechanisms. Additionally, because the landslide boundaries represent depletion areas, the landslide susceptibility maps are applicable to predicting potential areas of landslide initiation and source sediment. The Jenks Natural Breaks Classification method (Jenks 1967), which has previously been used to visually represent the degree of landslide risk (Chen et al. 2017; Song et al. 2018; Shahri et al. 2019; Wang et al. 2020), was used to classify the predictions into very low, low, medium, high, and very high risk classes. A cumulative distribution function (CDF) plot is also supplied for each map to show the distribution of values within each risk class.
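Jenks natural breaks can be computed exactly with the Fisher-Jenks dynamic program, which chooses class boundaries that minimize the total within-class sum of squared deviations. The pure-Python sketch below is quadratic in the number of values, so for a full map one would apply it to a sample of predicted probabilities; it is a hypothetical illustration and does not reproduce any particular GIS implementation. The helper `class_cdf` returns the empirical CDF evaluated at each class upper bound, i.e., the cumulative share of pixels in each risk class and those below it.

```python
from bisect import bisect_left

RISK_CLASSES = ["very low", "low", "medium", "high", "very high"]


def jenks_breaks(values, n_classes=5):
    """Upper bound of each class from an exact Fisher-Jenks optimisation."""
    x = sorted(values)
    n = len(x)
    if not 1 <= n_classes <= n:
        raise ValueError("need 1 <= n_classes <= number of values")
    s1, s2 = [0.0] * (n + 1), [0.0] * (n + 1)   # prefix sums of x and x**2
    for i, v in enumerate(x):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def ssd(i, j):
        # Sum of squared deviations from the mean for x[i:j].
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]   # cost[k][j]: best for x[:j]
    start = [[0] * (n + 1) for _ in range(n_classes + 1)]    # start index of the k-th class
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = cost[k - 1][i] + ssd(i, j)
                if c < cost[k][j]:
                    cost[k][j], start[k][j] = c, i
    bounds, j = [], n
    for k in range(n_classes, 0, -1):               # walk back through the optimal splits
        bounds.append(x[j - 1])
        j = start[k][j]
    return bounds[::-1]


def classify(values, upper_bounds, labels=RISK_CLASSES):
    """Assign each value to the first class whose upper bound it does not exceed."""
    last = len(upper_bounds) - 1
    return [labels[min(bisect_left(upper_bounds, v), last)] for v in values]


def class_cdf(values, upper_bounds):
    n = len(values)
    return [sum(1 for v in values if v <= b) / n for b in upper_bounds]


# Usage on hypothetical predicted landslide probabilities.
probs = [0.01, 0.02, 0.03, 0.2, 0.22, 0.5, 0.52, 0.8, 0.82, 0.97, 0.99]
breaks = jenks_breaks(probs, 5)
print(breaks)                                  # [0.03, 0.22, 0.52, 0.82, 0.99]
print(classify([0.02, 0.5, 0.99], breaks))     # ['very low', 'medium', 'very high']
print(class_cdf(probs, breaks))
```

Because the breaks adapt to clusters in the predicted probabilities, class boundaries generally differ between the LR and GBDT maps, which is one reason CDF plots help when comparing them.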

4 Results

4.1 Mutual information

Table 1 shows the MI results; higher scores indicate a greater ability of a feature to discern between the landslide and non-landslide groups and, therefore, a potentially stronger predictor of landslide occurrence. The sediment types mud, sand, gravel, and rock showed the highest dependency with the landslide class. Among the topographical features, elevation, curvature, and aspect had the lowest MI scores, whereas slope showed greater dependency. Features with high MI scores also included the geomorphological factors: basins, canyons, channels, and escarpments. All geochemical features received relatively low MI scores.

Table 1 Mutual information results for each feature

4.2 Collinearity

The strongest correlation (r = −0.8) occurred between two sediment types: mud and gravel. Of the two features, mud had the higher MI value, and gravel was therefore dropped from further analysis. Other notably high correlations occurred between mud volcanoes and canyons (r = 0.79) and between hydrates and canyons (r = 0.71). However, these values did not reach the ±0.8 threshold, so these features were retained for analysis. The full Pearson correlation matrix can be found in Fig. S4 (Online Resource 1).

4.3 Model evaluation

The ability of the LR and GBDT models to classify landslide presence from the remaining 19 input features was assessed using the permutation method across all combinations for the three training–testing ratios (1:1, 2:1, and 3:1). The 1:1 ratio had 12 combinations, with the number of training observations ranging from 14,161 to 70,301. The 2:1 ratio also had 12 combinations, with training observations ranging from 35,688 to 104,350. The 3:1 ratio had 4 possible combinations, with training observations ranging from 69,737 to 125,877. The averaged results for all evaluation metrics for each training–testing ratio are reported in Table S5 (Online Resource 1).

Figure 3 compares the testing-set evaluation metric scores of the LR and GBDT models across the permutation method, showing performance differences between the two models and with the number of training regions. The median accuracy score for both models decreases as the training set size increases, whereas median precision, recall, and AUC increase. Precision scores were low (< 0.5), indicating that most observations predicted as positive (landslide) were actually negative (non-landslide). Recall scores were highly variable but increased on average as more regions were used for training. Both the LR and GBDT models seldom had AUC scores below 0.50, indicating that the models generally perform better than random guessing. The GBDT model in the 3:1 group achieved the highest AUC of all groups, with an average score of 0.81. Overall, the GBDT model outperformed LR, with median AUC values 29.6%, 14.0%, and 7.2% higher for the GBDT model in the 1:1, 2:1, and 3:1 training–testing ratio groups, respectively.

Fig. 3

Boxplots of evaluation metric scores for each training–testing ratio (1:1, 2:1, and 3:1) using the GBDT (green) and LR (blue) models when predicting on the testing set. The median value is shown by the horizontal black line. Black outlined circles show outliers

The ROC curves (Fig. 4) illustrate that the models generally performed better as training data from more regions of the study area were added. Overfitting, measured as the difference between the training and testing set metrics, decreased progressively from the 1:1 to the 3:1 group as average AUC increased, and was especially minimal for the GBDT model. These findings indicate that model performance is unstable when training on a single region of the study area, with variable metrics on unseen data.

Fig. 4

Receiver operating characteristic (ROC) curves for (a) the LR and (b) the GBDT models, showing grouped results for the training (purple) and testing (green) sets of each training–testing ratio (1:1, 2:1, and 3:1). Individual model runs are displayed as faded lines, and the average across runs is displayed as a bold line. The black dashed diagonal line represents a random classifier, for which the TPR equals the FPR

4.4 Landslide susceptibility predictors

Figure 5 shows the distribution of mean absolute SHAP values over each permutation within the 1:1, 2:1, and 3:1 training–testing ratio groups. Feature attribution is estimated using TreeSHAP (Lundberg et al. 2018), with the mean absolute SHAP value representing the contribution that a feature makes to the model outcome.

Fig. 5

Feature importance box plots for each training–testing ratio (1:1, 2:1, and 3:1) for each of the nineteen remaining input features. The feature importance is determined using SHAP, and the median SHAP value is shown with an orange vertical line. Black outlined circles show outliers

The ranking of top predictor features varied among the training–testing permutation groups, but the top five predictors for each group included features related to topography, geomorphology, gas migration, and sediment characteristics. Only seven features had a median SHAP value greater than zero in the 3:1 permutation group: slope, seeps, sediment type (rock), pockmarks, gas, faults, and escarpments. This coincides with a narrower spread of AUC scores for the 3:1 group, suggesting that the GBDT model can perform more consistently with additional training regions and a minimal number of features.

Slope had the highest median contribution of all features in each of the three training–testing groups. Based on the MI results, the topographic features aspect, curvature, and elevation would be expected to have minimal influence on landslide classification; this holds true in the SHAP results, except that curvature and elevation obtained considerably high SHAP values in the 1:1 and 2:1 ratio groups.

The importance of each geomorphological feature fluctuated depending on the training region(s) because the types of seafloor geomorphology differ among regions. All of the geomorphology factors (basins, canyons, channels, and escarpments) had a high influence on landslide classification in the 1:1 and 2:1 ratio groups, but only distance to escarpments showed a high level of importance in the 3:1 ratio group.

Of the three sediment type features used in modelling (percent mud, rock, and sand), only rock had a major influence on landslide classification; the percentages of mud and sand coverage on the seafloor contributed minimally to the GBDT model based on SHAP. Among the other geological features, faults and sediment thickness contributed meaningfully to model predictions, whereas salt diapirs had very low SHAP values across all three permutation groups.

Among the geochemical features, gas, mud volcanoes, and seeps contributed positively to the GBDT model predictions. These features can indicate gas migration, which can trigger slope failure, so shorter distances to these seafloor geohazards would be expected to increase the probability of a submarine landslide. However, distances to pockmarks and hydrates had low importance scores in every permutation. The low importance of the pockmarks feature may indicate that gas migration is a stronger predictor of landslide susceptibility than fluid migration. The hydrate data (Twichell et al. 1996; Majumdar et al. 2017) have minimal coverage over the study region (Fig. S1, Online Resource 1), so the information available to the model on hydrate locations relative to landslide observations may be insufficient for a strong relationship to be identified.

4.5 Landslide susceptibility mapping

Figure 6 shows landslide susceptibility maps for the full study area predicted by the LR and GBDT models, using the Jenks Natural Breaks Classification method to classify landslide susceptibility risk into five classes of very low, low, medium, high, and very high (Jenks 1967).

Fig. 6

Landslide susceptibility maps for the full study region with predictions using (a) the LR model and (b) the GBDT model. Landslide probabilities for each map are classified using the Jenks classification method into very low, low, medium, high, and very high landslide risk classes. A CDF plot is supplied for each landslide susceptibility map, and the training–testing regions are displayed in red

Similar landslide susceptibility patterns are observed for the LR and GBDT models, although slight differences can be identified. The LR susceptibility map classified only the northern portion of the Florida Escarpment as having very high landslide potential, whereas the GBDT susceptibility map depicts the entire escarpment as very high risk. Based on the CDF plots, the LR model classified a slightly higher percentage of the area in the high and very high risk classes than the GBDT model. Furthermore, based on visual assessment, the GBDT model was better able to distinguish between high and low landslide risk classes at smaller scales.

The CDFs in Fig. 6 show that more than 80% of the study area falls into the low or very low landslide risk classes, which are prominent in the Mississippi Canyon, Mississippi Fan, and Florida Shelf. High sediment movement is expected in the Mississippi Canyon, but such sediment transport does not qualify as a submarine landslide under the definition used in this study. Other areas of minimal susceptibility tend to occur between salt basins on the TX-LA slope. In contrast, only a small percentage of the total area (~15%) was predicted to have high or very high landslide susceptibility. As Fig. 6 shows, most of the high and very high risk areas occur on the TX-LA slope, the Sigsbee Escarpment, and the Florida Escarpment, where the slope is steepest (Fig. S5, Online Resource 1).
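The class percentages and cumulative values read from the CDF plots can be computed directly from a classified map. A minimal sketch, assuming a projected grid in which every cell covers the same area (on a latitude-longitude grid, cells would need area weights); the cell list below is hypothetical.

```python
from collections import Counter

LABELS = ["very low", "low", "medium", "high", "very high"]


def class_cdf(cell_classes):
    """Return (label, percent of area, cumulative percent) for each class,
    ordered from very low to very high; assumes equal-area cells."""
    counts = Counter(cell_classes)
    total = len(cell_classes)
    rows, running = [], 0.0
    for label in LABELS:
        pct = 100.0 * counts[label] / total
        running += pct
        rows.append((label, pct, running))
    return rows


# Hypothetical 20-cell classified map.
cells = ["very low"] * 9 + ["low"] * 8 + ["medium", "high", "very high"]
for label, pct, cum in class_cdf(cells):
    print(f"{label:>9}: {pct:5.1f}%  cumulative {cum:5.1f}%")
```

The cumulative value at "low" gives the share of the area in the two lowest classes, and 100 minus the cumulative value at "medium" gives the share in the high and very high classes.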

It should be acknowledged that the Florida Escarpment and particularly the Florida Shelf are geologically unique within the study area because of the carbonate platform that forms the continental shelf in the eastern GoM. In contrast, the continental shelf along the northwestern and north-central GoM is characterized by heavy clastic sedimentation, and farther offshore in deepwater regions, the salt withdrawal mini-basins produced by salt diapirism are filled with clastic sedimentary deposits (Galloway 2008). A key process that differentiates the morphology of the Florida Escarpment from other regions of the GoM is erosion via dissolution and cliff collapse; in areas of clastic sedimentation, erosion is generally driven by mechanical processes alone rather than by combined chemical and mechanical processes. Because the training–testing regions lie mainly within the northern GoM salt mini-basin region and the DeSoto Canyon, the LR and GBDT models are extrapolating to the unseen Florida region in the landslide susceptibility maps. This may introduce inaccurate predictions, because ML models of this type generally cannot extrapolate reliably beyond the conditions represented in their training data.
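One simple way to flag where a susceptibility map relies on extrapolation is to mark prediction cells whose feature values fall outside the range seen during training. The sketch below is a crude, hypothetical check (not part of the study's workflow) with made-up feature values; because it ignores unseen combinations of features that are each individually within range, it can only underestimate how much extrapolation occurs.

```python
def outside_training_range(train_rows, predict_rows):
    """Return one flag per prediction row: True if any feature lies outside the
    min-max range observed in the training rows."""
    n_features = len(train_rows[0])
    lo = [min(row[f] for row in train_rows) for f in range(n_features)]
    hi = [max(row[f] for row in train_rows) for f in range(n_features)]
    return [
        any(not lo[f] <= row[f] <= hi[f] for f in range(n_features))
        for row in predict_rows
    ]


# Hypothetical features per cell: (slope in degrees, percent rock cover).
train = [(2.0, 5.0), (10.0, 20.0), (25.0, 40.0)]
unseen = [(12.0, 30.0), (40.0, 10.0), (5.0, 80.0)]
print(outside_training_range(train, unseen))  # [False, True, True]
```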

5 Discussion

Given the availability of high-resolution bathymetry and geohazard datasets, a spatial database of topographical, geomorphological, geological, and geochemical seafloor characteristics and historic landslide areas (Dyer et al. 2022) was created that integrates many known triggers and conducive conditions for submarine landslides. These data make the northern GoM a viable region for an adapted offshore LSM application; thus, a case study was conducted to evaluate the potential for submarine landslide events in the deepwater northern GoM. The favorable results of the ML approach can be attributed in part to the comprehensive literature review that identified appropriate submarine landslide factors for the study region. The successes and limiting factors of the approach are discussed further below.

An LR model was used as a baseline for the predictive models, and, as expected, the GBDT model outperformed the LR model, with higher median AUC scores for each of the three training–testing ratio groups. This improved landslide classification may be due to the ability of GBDTs to model non-linear interactions between input features while limiting overfitting (Friedman 2002; Elith et al. 2008). Although the GBDT model achieved consistent AUC scores when classifying historic landslide scars versus undisturbed areas across all permutations of the training–testing ratio groups, the degree of overfitting varied, and overfitting was minimal when the model was trained on three different regions of the study area. Such variability is expected when performing LSM with supervised ML, because ML models are complex algorithms and changes to model parameters and/or training data can lead to varying results (Goetz et al. 2015). These results suggest that LSM predictions will be more stable and accurate when the training data span a variety of environmental conditions.
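The AUC comparison above can be made concrete with a small sketch. It computes ROC AUC as the probability that a randomly chosen landslide cell receives a higher score than a randomly chosen undisturbed cell (ties count one half), then summarizes several training–testing permutations by the median test AUC and the median train-minus-test AUC gap. Using that gap as an overfitting indicator is an assumption for illustration, since the exact overfitting measure is not restated in this section, and all scores below are hypothetical.

```python
from statistics import median


def roc_auc(labels, scores):
    """ROC AUC via the Mann-Whitney pairwise comparison (O(n_pos * n_neg))."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both landslide (1) and non-landslide (0) cells are needed")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def summarise(permutations):
    """permutations: list of (train_auc, test_auc) pairs, one per permutation.
    Returns (median test AUC, median train-minus-test AUC gap)."""
    test_aucs = [test for _, test in permutations]
    gaps = [train - test for train, test in permutations]
    return median(test_aucs), median(gaps)


print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
med_auc, med_gap = summarise([(0.95, 0.80), (0.90, 0.85), (0.99, 0.70)])
print(med_auc, round(med_gap, 2))  # 0.8 0.15
```

A small median gap with a stable median test AUC would correspond to the "consistent AUC with minimal overfitting" pattern described above.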

The overall performance was satisfactory compared with other LSM studies using boosting algorithms, although some studies have achieved higher AUC scores of 0.93 (Micheletti et al. 2014) and 0.98 (Song et al. 2018). Precision and recall were relatively low for our models, indicating a non-negligible number of false positives and false negatives in model predictions, respectively. Because LSM models should minimize missed landslides (i.e., false negatives), model sensitivity (i.e., recall) should be prioritized. This can be achieved by lowering the probability cut-off threshold above which a location is predicted as a landslide. Thus, even if overall performance is not exceptional, a suitable cut-off threshold could yield an LSM model optimized to correctly identify locations with high landslide susceptibility, at the cost of more false positives. Furthermore, the performance of the models in this case study is likely limited by the data, as ML model performance depends on the quality and quantity of the data supplied (Fabbri et al. 2003; Raja et al. 2017). This study was made possible by the emergence of submarine geophysical survey technology capable of accurately mapping seafloor geohazards related to landslide initiation. While this study assumes that all geohazards are correctly mapped in the geohazards dataset (Bureau of Ocean & Energy Management 2016a), it is possible that some geohazards in the GoM are not accounted for. Therefore, the results of this analysis depend on the accuracy and precision of these geohazard mapping technologies and the resolution of the data they provide.
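The threshold adjustment described above can be sketched as follows: precision and recall are computed at a given probability cut-off, and candidate cut-offs (the observed scores) are scanned from high to low until a target recall is reached. The function names, labels, and scores are hypothetical; in practice the cut-off would be chosen on validation data rather than on the test set used to report performance.

```python
def precision_recall(labels, scores, threshold):
    """Precision and recall when cells scoring >= threshold are predicted as landslides."""
    tp = fp = fn = 0
    for y, s in zip(labels, scores):
        if s >= threshold:
            if y == 1:
                tp += 1
            else:
                fp += 1
        elif y == 1:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def threshold_for_recall(labels, scores, target_recall):
    """Highest observed score that, used as the cut-off, reaches target_recall."""
    if not 0 < target_recall <= 1:
        raise ValueError("target_recall must be in (0, 1]")
    for t in sorted(set(scores), reverse=True):
        if precision_recall(labels, scores, t)[1] >= target_recall:
            return t
    raise ValueError("no landslide cells in labels")


labels = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.7, 0.35, 0.6, 0.3, 0.1, 0.55, 0.4]
print(precision_recall(labels, scores, 0.5))  # (0.75, 0.75)
t = threshold_for_recall(labels, scores, 1.0)
print(t, precision_recall(labels, scores, t))  # 0.35 (0.6666666666666666, 1.0)
```

Lowering the cut-off from 0.5 to 0.35 recovers every landslide cell in this toy example while precision drops, which is the trade-off the text describes.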

The 19 input features showed varying levels of importance for modelling landslide potential, depending on the regions used in each training–testing permutation. The influence of each feature is location-specific, as the spatial distribution of certain geohazards varies. To our understanding, the entire study region was mapped for geological formations (e.g., pockmarks and basins), so locations with minimal geological hazards would be considered to have low landslide potential in the absence of high-risk topographic features. Slope is a major influencing feature, with most steep slopes classified as having high or very high landslide risk. Results by McAdoo et al. (2000) suggest that in the MRDF region of the GoM, slide events are more likely to occur on shallower, unconsolidated slopes; however, the landslide susceptibility maps produced for this study region, which does not overlap with the MRDF region (Fig. 6), show that steeper areas are more susceptible to landslides. Other notable features with a large influence on landslide susceptibility in the study region include those related to subsurface gas migration, sediment type or lithology, and high-sloped geomorphologies. McAdoo et al. (2000) identified major slope failures in the deepwater GoM at high-sloped geomorphologies between salt withdrawal basins as well as along the Sigsbee Escarpment, a steep and significant bathymetric feature formed at the southern termination of the Louann salt sheet. These findings support the importance of the basin and escarpment features in permutation subsets where those seafloor geomorphologies are present.

Examining the presence of rock versus the sediment types considered in these analyses (sand and mud) provides additional insight. The usSEABED dataset (Schweitzer et al. 2020) reports that the rock classification includes both loose rock, which is coarser than cobble (-8 phi), and bedrock. If the substrate consists mainly of loose rock, rheologic principles suggest it would have a lower shear strength, whereas bedrock would be expected to have a higher shear strength. This combined classification of loose rock and bedrock therefore conveys a wide range of rheologic properties to the models, which may account for the large interquartile range shown in Fig. 6. According to the frequency analysis (Table S2, Online Resource 1), 78% of the study area has a low percentage (0–25%) of rock in the substrate, and only 14% has a high percentage (75–100%). An examination of core data (Lamont-Doherty Core Repository 1977) from the study regions where usSEABED (Buczkowski et al. 2020) indicates a loose rock-bedrock composition suggests that at least the upper ~1 m of substrate is dominated by lutite, i.e., fine-grained clays and muds through claystone and mudstone. Based on these results, it is plausible that the sediments overlying areas classified as loose rock-bedrock are more prone to slope instability, because these fine-grained sediments are less competent than bedrock.
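The frequency analysis of rock cover referenced above amounts to binning per-cell rock percentages and reporting the share of cells in each bin. A minimal sketch, assuming equal-area cells and right-closed bins with 0% placed in the first bin; the bin conventions behind Table S2 are not given in this section, and the input values are hypothetical.

```python
BINS = [(0, 25), (25, 50), (50, 75), (75, 100)]


def rock_cover_frequency(pct_rock):
    """Percentage of cells in each rock-cover bin. Bins are right-closed
    (e.g. 25% falls in 0-25%), and 0% is placed in the first bin."""
    counts = [0] * len(BINS)
    for v in pct_rock:
        if not 0 <= v <= 100:
            raise ValueError(f"rock cover out of range: {v}")
        for i, (lo, hi) in enumerate(BINS):
            if v <= hi and (v > lo or i == 0):
                counts[i] += 1
                break
    n = len(pct_rock)
    return {f"{lo}-{hi}%": 100.0 * c / n for (lo, hi), c in zip(BINS, counts)}


# Hypothetical per-cell rock cover (%).
sample = [0, 5, 10, 20, 25, 30, 60, 80, 95, 100]
print(rock_cover_frequency(sample))
# {'0-25%': 50.0, '25-50%': 10.0, '50-75%': 10.0, '75-100%': 30.0}
```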

The models were successful, demonstrating that ML can be used to forecast submarine landslide susceptibility with reasonable accuracy. However, limitations remain that need to be addressed to improve model performance for landslide susceptibility mapping. The feature attribution results indicate that slope is one of the most influential features for delineating landslides in the study area. A map comparison of the slope feature and the GBDT landslide susceptibility predictions over the study region (Fig. S5, Online Resource 1) shows that the very high landslide susceptibility class coincides spatially with steeply sloped areas. This agrees with McAdoo et al. (2000), who found that the slope grade of the resulting landslide scarp (the steep section of undisturbed material at the upper edge of the landslide area, left behind by the movement of displaced material) is anomalous relative to the adjacent area. Therefore, the slope feature is expected to be strongly correlated with landslide observations. This makes slope useful for identifying submarine landslides; however, it may be misleading for assessing susceptibility, because these anomalously steep scarps are themselves a product of past failures. Additional environmental information is needed to distinguish landslide potential among areas with similar slope values, along with an ML model capable of identifying these multivariate interactions on the seafloor. Furthermore, knowledge of the timing and frequency of submarine landslides is limited by the lack of in situ measurements of these events and by uncertainties in dating methods such as radiocarbon dating, oxygen isotope curves, and tephrochronology (Huhn et al. 2019).
Without an understanding of the temporal characteristics of submarine landslide occurrences and the temporal variation in their triggers in this study area, the landslide susceptibility maps can show where submarine landslides are likely to occur but cannot indicate when these events will occur (Chacón et al. 2006; Reichenbach et al. 2018). We therefore acknowledge this temporal uncertainty as a limitation of this LSM application: each susceptibility map produced with this method reflects conditions at the time the data were collected and does not capture subsequent changes to the bathymetry or other conditions.
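Because slope dominates the susceptibility patterns discussed above, it may help to recall how a slope raster is commonly derived from gridded bathymetry. The sketch below uses central differences on interior cells of a regular grid with square cells; the study's own slope-processing method and cell size are not stated in this section, so the grid, the function name, and the 100 m cell size are assumptions.

```python
import math


def slope_degrees(depth, cell_size):
    """Slope in degrees for interior cells of a bathymetry grid (depths in m,
    square cells of `cell_size` m). Edge cells are returned as None."""
    rows, cols = len(depth), len(depth[0])
    out = [[None] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # Central differences in the column (x) and row (y) directions.
            dzdx = (depth[r][c + 1] - depth[r][c - 1]) / (2 * cell_size)
            dzdy = (depth[r + 1][c] - depth[r - 1][c]) / (2 * cell_size)
            out[r][c] = math.degrees(math.atan(math.hypot(dzdx, dzdy)))
    return out


# Hypothetical 3 x 3 grid deepening by 10 m per 100 m cell toward the east.
grid = [[1000 + 10 * c for c in range(3)] for _ in range(3)]
print(round(slope_degrees(grid, 100.0)[1][1], 2))  # 5.71
```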

6 Conclusion

Submarine landslides pose a significant threat to current and future offshore infrastructure; however, LSM has seldom been applied offshore to spatially forecast the potential for submarine landslides in economically important areas. Given the availability of region-wide data related to submarine landslide occurrences, the GoM was used as a case study for applying LSM techniques to a basin-scale offshore region. While many methods exist for producing landslide susceptibility maps, this study employed a GBDT model with a permutation approach to assess how the location and amount of training data influence model accuracy. The variability in model accuracy across permutations indicated that the GBDT model is more accurate overall than LR and that LSM model performance improves when training on several geologically distinct locations.

The importance of each feature in forecasting landslide potential is location-specific; however, some general conclusions can be drawn about the features needed for LSM in the GoM case study region. Based on feature importance metrics, geomorphological settings, including basins, canyons, and escarpments, were shown to provide valuable information for forecasting landslide susceptibility. Other notable influencing factors include the percentage of rock coverage, faults, and various types of gas presence. Factors not considered in this study, such as wave height and sediment accumulation rate, may be important for LSM in shallow environments (water depths less than approximately 120 m, where extreme waves can still exert pressure variations at the seabed).
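The shallow-water depth limit quoted above reflects how wave-induced pressure decays with depth. Under linear (Airy) wave theory, which is only an idealization for extreme waves, the bottom-pressure amplitude for a wave of height H and period T in water depth h is ρg(H/2)/cosh(kh), where the wavenumber k solves the dispersion relation ω² = gk tanh(kh) with ω = 2π/T. The sketch below solves for k with Newton's method and compares a few depths; the wave height, period, seawater density, and depths are hypothetical inputs, and the 120 m figure in the text is not derived from this calculation.

```python
import math

G = 9.81  # gravitational acceleration (m/s^2)


def wavenumber(period, depth, g=G, tol=1e-12, max_iter=100):
    """Solve omega^2 = g k tanh(k h) for k (rad/m) with Newton's method."""
    omega2 = (2 * math.pi / period) ** 2
    k = omega2 / g  # deep-water wavenumber as the starting guess
    for _ in range(max_iter):
        t = math.tanh(k * depth)
        f = g * k * t - omega2
        df = g * t + g * k * depth * (1 - t * t)  # d/dk of g k tanh(k h)
        step = f / df
        k -= step
        if abs(step) < tol:
            return k
    raise RuntimeError("dispersion relation did not converge")


def bottom_pressure_amplitude(height, period, depth, rho=1025.0, g=G):
    """Linear-theory amplitude (Pa) of wave-induced pressure at the seabed."""
    k = wavenumber(period, depth, g)
    return rho * g * (height / 2) / math.cosh(k * depth)


# Hypothetical storm wave: 10 m height, 15 s period.
for h in (50.0, 120.0, 500.0):
    print(f"{h:5.0f} m: {bottom_pressure_amplitude(10.0, 15.0, h):10.1f} Pa")
```

For these inputs the seabed pressure amplitude falls by roughly three orders of magnitude between 120 m and 500 m depth, which illustrates why wave forcing matters mainly on the shelf.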

The results of this case study offer an initial understanding of how available offshore spatial datasets can be used to develop landslide susceptibility maps. By characterizing the spatial patterns of high and low landslide potential, LSM can aid current and future planning of offshore infrastructure and help prevent and mitigate potential incidents. Future LSM studies may need to account for climate change effects, such as sea level rise and changes in sedimentation rate, as understanding of these factors becomes more conclusive (Urlaub et al. 2013). While this study reports the success of LSM in the northern GoM at a large spatial scale using a GBDT model, the results should serve as a baseline for future model improvements and for extending the approach to other offshore regions.