Introduction

Numerous research projects have highlighted the undeniable increase in the frequency of natural disasters worldwide, predominantly caused by changes in climate (Pei et al. 2023; Zhang et al. 2023). Therefore, protection against natural disasters has become an absolute priority, and forecasting and prevention measures are still among the steps that need to be taken to move towards good planning (Bashir et al. 2024a, b; Yousefi et al. 2020). Creating thematic data to map catastrophe susceptibility is still among the most crucial methods for foreseeing and managing natural risks (Khan et al. 2020; Youssef et al. 2023). These maps are essential in natural risk management and developing risk reduction strategies. They make it possible to visualize and classify at-risk areas, identify vulnerable populations and infrastructures, and better understand the risk factors specific to each region.

However, most current research into natural hazards is limited to the analysis of individual risks (Bammou et al. 2024b; Razavi-Termeh et al. 2023). These studies, which focus on particular threats, generally consider risks as independent phenomena without considering the combined degree of vulnerability of relationships between several risks (Panahi et al. 2020; Yousefi et al. 2020). Thus, assessing and looking into the interactions between different risks is fundamental. Various hazard investigations are crucial, as they allow us to find far more significant concentrations of harm and risk than studies focusing on a single type of hazard (Hillier et al. 2020). For this kind of research, the geographical distribution of natural hazards in the area has to be reevaluated. This strategy aids in lowering the likelihood of occurrence and the possibility of cumulative risks brought on by the interplay of several hazards, such as landslides, financial losses, and fatalities (Bammou et al. 2024d; Sestras et al. 2023; Stalhandske et al. 2024). By mapping vulnerability holistically and considering the complex interactions between natural hazards, it is possible to predict, prevent, and proactively manage threats to communities and the environment.

Multi-hazard vulnerability mapping is a method that aims to consider multiple types of threats simultaneously at each location. It makes it possible to assess the complex spatial relationships between different high-risk events, including the possibility for these events to occur concurrently or cumulatively, as well as any potential interactions between them (Aksha et al. 2020; Ming et al. 2022). According to Bammou et al. (2024a, b, c, d), the Tensift catchment consists of well-correlated risk zones with certain common conditioning factors, such as increased water erosion risk from runoff-induced flooding and landslides from heavy rainfall and flooding. Multi-hazard mapping is a powerful tool for understanding better and managing the complex risks that threaten communities (Lombardo et al. 2020; Saunders and Kilvington 2016).

Various techniques have been used to model multiple hazards, such as using two decision aids, namely the sequential Monte Carlo technique and a decision aid (Guo et al. 2020; Zhao et al. 2024). Another method combines empirical data with deterministic, equation-based (i.e., theoretical) discoveries (Bout et al. 2018; Mondini et al. 2023). Furthermore, a method integrating multi-criteria analysis and geographic information systems (GIS) has been applied to produce successful outcomes, as demonstrated by Lyu and Yin (2023), Sestras et al. (2023). Recently, a lot of research has employed the integration of machine learning methods, including Boosted Regression Tree (BRT), Generalized Additive Model (GAM), Random Forest (RF), and Support Vector Machine (SVM) for risk prediction (Bammou et al. 2024a; Kohansarbaz et al. 2022; Ye et al. 2020).

The development of a multi-hazard map using ML models is based on the study of the leading natural hazards in the study area, namely gullying, landslides, and flooding (Bammou et al. 2023). The production of such a map is essential for a good understanding of the association of these risks. This study, therefore, compared the effectiveness and accuracy of different machine learning models, such as SVM, RF, ANN, KNN, DT and XGBoost, in producing risk maps for the Tensift catchment and the Haouz plain.

This would be a pioneering study in the scientific literature that combines three natural hazards (landslides, gully erosion, and floods) to construct a multi-hazard susceptibility map. This work uses state-of-the-art ML models to develop a novel multi-hazard assessment strategy to understand the interrelationships and assess the dangers of landslides, gully erosion, and floods. This study aims to answer the following research questions: Why is a multi-hazard assessment necessary rather than a single-hazard approach, and what are its benefits? The findings of this research are helpful for researchers, authorities, developers, and decision-makers involved in land management and risk mitigation strategies in the Moroccan High Atlas and in regions across the globe with similar topographical, geological, and meteorological conditions.

Material and methods

Study area

The High Atlas in Marrakech is formed by three primary geological formations, according to Duclaux (2005): (1) the Permo-Triassic is the most common formation in the east. The highest peaks of the Atlas are located in the central region, home to (2) Precambrian igneous and metamorphic rocks. The western area is home to (3) primary and secondary limestone formations, most of which have limited permeability, continuous surface runoff, and the potential to develop significant runoff after heavy rainfall. It often occurs in conjunction with Ordovician and Precambrian shales.

The central Moroccan region around Marrakesh is home to the Tensift watershed. Its 20,000 km² surface consists mainly of two zones that behave differently hydrologically, as illustrated in Fig. 1. With an elevation gain of over 4000 m, the Atlas Mountains' southern slopes receive a substantial amount of precipitation and snowfall (up to 600 mm/year) in the catchment region. These mountains act as a "water tower" for the large, semi-arid Haouz plain, which is situated downstream and receives 250 mm of precipitation annually. More specifically, irrigation covers a sizable portion of the 2000 km² Haouz plain.

Fig. 1 The Haouz Plain and the Tensift watershed study areas

Numerous lithological and structural elements and a varied and erratic hydrological behavior driven by geomorphological and climatic conditions define the research region. They give rise to various threats that regularly cause severe damage, such as the floods in Ourika on August 17, 1995, and October 28, 1999, which destroyed 142 structures, caused 200 fatalities, and flooded over 300 ha of arable land. More recently, on July 18, 2023, floods also caused significant damage in Moulay Brahim. The Hydraulic Agency of the Tensift Basin (ABHT) has also documented landslides in this area, especially along the Tizi N'tichka national route, which links Ouarzazate and Marrakech, and in the village of Ijoukak, where a dramatic landslide combined with gully erosion killed over 20 people in July 2019.

Multi-hazard inventory

Compiling an inventory is a crucial step in assessing the hazard. A comprehensive inventory of gully erosion, floods, and landslides in the Tensift watershed was compiled from GPS-based field surveys (Fig. 2), regional and national statistics from several sources, including the Tensift Hydraulic Basin Agency (ABHT), and analysis of Google Earth imagery of vulnerable areas in mountainous regions. The inventory maps were created using this information and comprise 620 gully erosion sites, 1291 landslide sites, and 762 inundation sites (Fig. 1). The training and validation sites for each hazard class in the inventory map were selected at random. In the literature, a 70/30% ratio is frequently used for training and validation datasets (Bammou et al. 2024a; Hong et al. 2016; Pourghasemi and Rahmati 2018), and it is adopted in this study as well. This ratio was applied together with a binary coding (0, 1) that distinguishes landslide from non-landslide pixels. The landslide inventory map was then converted into raster data at a resolution of 30 m.
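The random 70/30 partition described above can be sketched as follows. This is a minimal illustration, not the authors' workflow: the function name `split_inventory`, the fixed seed, and the equal number of non-hazard points are assumptions introduced here; only the 1291 landslide sites and the 70/30 ratio come from the text.

```python
import random

def split_inventory(points, train_fraction=0.7, seed=42):
    """Randomly split inventory points into training and validation sets."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = points[:]           # copy so the caller's list is not modified
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical inventory of (point_id, label) pairs: 1 = landslide, 0 = non-landslide.
# The 1291 landslide sites come from the text; sampling the same number of
# non-landslide points is an assumption made for this illustration.
landslide_points = [(i, 1) for i in range(1291)]
non_landslide_points = [(i, 0) for i in range(1291, 2582)]

train, valid = split_inventory(landslide_points + non_landslide_points)
print(len(train), len(valid))
```

A stratified split (shuffling each class separately) would keep the hazard/non-hazard proportion identical in both subsets; the simple shuffle above only preserves it approximately.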

Fig. 2 Photographs of the three hazards in the Tensift catchment collected in the field. 1a, 1b and 1c Flooding; 2a, 2b and 2c gully erosion; 3a and 3b landslides. NB: their locations are shown in Fig. 1

Data collection

The creation of the landslide susceptibility grids was based on different data sources. Table 1 describes the data used and their sources.

Table 1 Data used and sources

Multi-hazard conditioning factors (MCFs)

An analysis of multi-hazard conditioning factors (MCFs) was also conducted to determine the most pertinent criteria for each risk category. Table 2 and Fig. 3 provide a quick overview of the rasters and data layers of topography, climate, hydrology, vegetation, land use, and geology that were constructed using GIS software.

Table 2 Conditioning factors tested and used to map vulnerability to natural hazards
Fig. 3 a Slope in °, b aspect, c elevation, d rainfall, e TWI, f NDVI, g drainage density, h lithology, i LULC, j LS factor, k distance to rivers, l SPI, m TRI, n geomorphons, o factor K, p HSG, q distance to roads, r soil type, s flow accumulation, t profile curvature, u distance to faults, v curvature, w valley depth, x TPI and y plan curvature

These include 25 conditioning factors, namely slope (Fig. 3a), aspect (Fig. 3b), elevation (Fig. 3c), and precipitation (Fig. 3d), the latter generated from data from 14 precipitation stations provided by ABHT for the period 1992 to 2020. The Topographic Wetness Index (TWI) (Fig. 3e) was calculated using Eq. (1). The Normalized Difference Vegetation Index (NDVI) (Fig. 3f) was determined using Eq. (3).

Fig. 4 Flowchart of the methodology developed for this study

Other parameters include drainage density (Fig. 3g); lithology (Fig. 3h), which includes 32 facies and was derived from a 1:500,000 scale geological map of Marrakech; LULC (Fig. 3i), which was generated from Sentinel-2 images from 2022 with 10 m resolution; the LS factor (Eq. (4) and Fig. 3j), one of the critical elements in the RUSLE equation; the distances to faults (Fig. 3u) and rivers (Fig. 3k), computed with the Spatial Analyst Euclidean distance tool; SPI (Fig. 3l), calculated using Eq. (6); TRI (Fig. 3m), calculated using Eq. (2); geomorphons (Fig. 3n); the K factor (Fig. 3o), calculated using Eq. (5); HSG (Fig. 3p); distance to roads (Fig. 3q); soil type (Fig. 3r); flow accumulation (Fig. 3s); profile curvature (Fig. 3t); curvature (Fig. 3v); valley depth (Fig. 3w); TPI (Fig. 3x); and plan curvature (Fig. 3y). The topographic variables were extracted from the digital elevation model (DEM) at a resolution of 30 m.

Each layer shown in Fig. 3 was standardized to a pixel resolution of 30 m × 30 m using GIS software in the WGS 84/UTM zone 29N projection system. Different factor sets were employed to generate the hazard susceptibility maps, as indicated in Table 2: following the removal of redundant components, there are 13 factors for floods, 17 for gully erosion, and 14 for landslides.

$$\text{TWI} = \ln\left(\frac{\text{As}}{\tan\beta}\right)$$
(1)
  • As: upstream drainage area.

  • β: slope degree (Nellemann and Reynolds 1997).

$$\text{TRI} = \text{Y}\left[\sum \left(\text{x}_{\text{ij}} - \text{x}_{00}\right)^{2}\right]^{\frac{1}{2}}$$
(2)
  • xij: elevation of each cell neighboring cell (0, 0), whose elevation is x00. Rough areas with steep slopes have positive TRI values, whereas regions with zero gradient have zero TRI values (Jones and Vaughan 2010).

$$\text{NDVI} = \frac{\text{B}8 - \text{B}4}{\text{B}8 + \text{B}4}$$
(3)
  • B8 = NIR and B4 = RED.

$$\text{LS} = \left(\text{m} + 1\right) \times \left[\frac{\text{As}}{22}\right] \times \left[\frac{\sin\beta}{0.0896}\right]$$
(4)
  • As: upstream drainage area.

  • β: slope degree.

$$\text{K} = \text{f}_{\text{csand}} \times \text{f}_{\text{cl-si}} \times \text{f}_{\text{orgc}} \times \text{f}_{\text{hisand}}$$
(5)
  • ƒcsand: factor for soils with a high coarse-sand content.

  • ƒcl-si: factor for soils with a high clay-to-silt ratio.

  • ƒorgc: factor for soils with a high organic carbon content.

  • ƒhisand: factor for soils with a very high sand content.

$$\text{SPI} = \text{As} \times \tan\beta$$
(6)
  • As: upstream drainage area.

  • β: slope degree.
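As a worked illustration of Eqs. (1), (3), and (6), the indices can be evaluated cell by cell as sketched here. The function names and the sample values are hypothetical, and the slope is assumed to be supplied in degrees and converted to radians before the tangent is taken.

```python
import math

def twi(upslope_area, slope_deg):
    """Topographic wetness index, Eq. (1): ln(As / tan(beta))."""
    # Undefined for perfectly flat cells (tan(0) = 0); GIS tools usually
    # add a small offset to the slope, which is not done here.
    return math.log(upslope_area / math.tan(math.radians(slope_deg)))

def spi(upslope_area, slope_deg):
    """Stream power index, Eq. (6): As * tan(beta)."""
    return upslope_area * math.tan(math.radians(slope_deg))

def ndvi(nir, red):
    """NDVI, Eq. (3): (B8 - B4) / (B8 + B4), with Sentinel-2 band reflectances."""
    return (nir - red) / (nir + red)

# Hypothetical cell: 900 m2 of upstream drainage area on a 12 degree slope,
# with NIR and red reflectances of 0.45 and 0.12.
print(round(twi(900.0, 12.0), 3))
print(round(spi(900.0, 12.0), 3))
print(round(ndvi(0.45, 0.12), 3))
```

In practice these calculations are applied to whole rasters rather than single cells, but the per-cell arithmetic is the same.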

Selection of multi-hazard factors

The present study used six predictive models for ML prediction of susceptibility to floods, landslides, and water erosion. Before modelling, the conditioning factors were subjected to several statistical tests to identify strong linear correlations between them and to exclude non-significant components. These tests included correlation matrix (CM) analysis, the variance inflation factor (VIF) calculated using Eq. (7), tolerance (TOL) calculated using Eq. (8), and mutual information (MI) calculated using Eq. (9). Low MI values indicate a minor effect and led to the elimination of the corresponding factors for flooding, landslides, and gully erosion, while the MI analysis demonstrated the relevance of the retained components.

$${\text{VIF}}_{\text{j}}=\frac{1}{{\text{Tol}}_{\text{j}}}$$
(7)
$${\text{Tol}}_{\text{j}}=1-{\text{R}}_{\text{j}}^{2}$$
(8)
$$\text{MI}\left(\text{n},\text{j}\right) = \text{H}\left(\text{n}\right) - \text{H}\left(\text{n} \mid \text{j}\right)$$
(9)
  • j: factor affecting landslide susceptibility (LS), flood susceptibility (FS), and gully erosion susceptibility (GES).

  • n: subclass of LS, FS, and GES impact factors.

  • Tolj: tolerance of j.

  • VIFj: variance inflation factor of j.

  • MI(n, j): mutual information between n and j.

  • R²j: coefficient of determination of the regression of j on all other predisposition components.

  • H(n): entropy of n.

  • H(n | j): conditional entropy of n given the landslide, flooded, and eroded zone j.
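Eqs. (7) to (9) can be illustrated with a small standard-library sketch. It rests on two simplifying assumptions that are not part of the study: the R² of a factor is taken from a single other predictor (where it equals the squared Pearson correlation), and MI is computed on discrete class labels with base-2 entropy. All names and sample values are hypothetical.

```python
import math
from collections import Counter

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def tolerance_and_vif(r_squared):
    """Eq. (8): Tol = 1 - R^2, and Eq. (7): VIF = 1 / Tol."""
    tol = 1.0 - r_squared
    return tol, 1.0 / tol

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def mutual_information(n_labels, j_labels):
    """Eq. (9): MI(n, j) = H(n) - H(n | j) for two discrete sequences."""
    total = len(n_labels)
    h_cond = 0.0
    for j_value, count in Counter(j_labels).items():
        subset = [n for n, j in zip(n_labels, j_labels) if j == j_value]
        h_cond += (count / total) * entropy(subset)
    return entropy(n_labels) - h_cond

# Hypothetical samples of two correlated factors (slope and LS factor).
slope = [5.0, 10.0, 15.0, 20.0, 25.0, 30.0]
ls_factor = [1.1, 2.0, 2.8, 4.1, 5.2, 5.9]
tol, vif = tolerance_and_vif(pearson_r(slope, ls_factor) ** 2)
print(round(tol, 3), round(vif, 1))

# Hypothetical slope classes and hazard labels (1 = hazard, 0 = stable).
slope_class = ["low", "low", "high", "high"]
hazard = [0, 0, 1, 1]
print(mutual_information(slope_class, hazard))
```

With more than two predictors, R²j would come from a multiple regression of factor j on all remaining factors; the two-variable case is used here only to keep the sketch free of external libraries.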

The normalized frequency ratio (NFR), calculated using Eq. (10), was the basis for the model's application and for the analysis of the factors that affect LS, FS, and GES. This approach is widely recommended for standardizing input data whose ranges vary across factors (Mao et al. 2021; Youssef et al. 2023). To define the link between the factors affecting LS, FS, and GES and the susceptible locations, the frequency ratio (FR) derived from Eq. (11) was first assigned to each subclass of the factors, similar to the approach followed by Masoud et al. (2022). Afterwards, Eq. (10) was used to standardize the data. Each map was thus transformed into NFR values approaching 1 for high LS, FS, and GES and approaching 0 for low LS, FS, and GES.

$${\text{NFR}}_{\text{n}} = \frac{{\text{FR}}_{\text{n}} - \text{Min}\left({\text{FR}}_{\text{n}}\right)}{\text{Max}\left({\text{FR}}_{\text{n}}\right) - \text{Min}\left({\text{FR}}_{\text{n}}\right)} \times \left(0.99 - 0.01\right) + 0.01$$
(10)
$${\text{FR}}_{\text{n}} = \frac{{\text{W}}_{\text{n}} / {\text{W}}_{\text{t}}}{{\text{P}}_{\text{n}} / {\text{P}}_{\text{t}}}$$
(11)
  • n: category of the variables affecting the likelihood of landslides, floods, and gully erosion.

  • FRn: frequency ratio of n.

  • NFRn: normalized frequency ratio of n.

  • Wn: number of risk sample points in n.

  • Wt: total number of risk sample points.

  • Pn: number of pixels in n.

  • Pt: total number of pixels.
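The FR and NFR calculation can be sketched as follows, assuming that Eq. (10) is a min-max rescaling of FR to the interval [0.01, 0.99]. The function names and the three-class example are hypothetical and serve only to show the arithmetic.

```python
def frequency_ratio(hazard_points, class_pixels):
    """Eq. (11): FR_n = (W_n / W_t) / (P_n / P_t) for each class n."""
    w_total = sum(hazard_points)
    p_total = sum(class_pixels)
    return [(w / w_total) / (p / p_total)
            for w, p in zip(hazard_points, class_pixels)]

def normalized_frequency_ratio(fr, low=0.01, high=0.99):
    """Eq. (10), read as a min-max rescaling of FR to [low, high]."""
    fr_min, fr_max = min(fr), max(fr)
    return [(v - fr_min) / (fr_max - fr_min) * (high - low) + low for v in fr]

# Hypothetical slope factor with three classes: hazard points and pixels per class.
hazard_points = [10, 30, 60]
class_pixels = [5000, 3000, 2000]

fr = frequency_ratio(hazard_points, class_pixels)
nfr = normalized_frequency_ratio(fr)
print([round(v, 2) for v in fr])
print([round(v, 2) for v in nfr])
```

A class with FR above 1 holds a larger share of hazard points than of pixels, so it ends up near the upper end of the NFR scale.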

Jenks' natural breaks approach was used to classify the maps of the determining factors and to divide LS, FS, and GES into subclasses (Sarker 2021).

Methodology flowchart

The current study was carried out in five different phases. Phase 1 involved the identification of the three natural hazards dealt with in this study to collect exhaustive data on various events from other sources; Phase 2 involved the differentiation of the different events based on an in-depth analysis of the published literature; Phase 3 dealt with the modelling of the various types of hazard using six ML models; phase 4 involved the validation of the models and the selection of the most appropriate model for each hazard. Finally, in phase 5, a multi-hazard susceptibility map (MHSM) was created by integrating the model with the best AUC for each type of hazard. Figure 4 illustrates the overall methodology of this research.

The methodology employed in this research is robust, multifaceted, and based on several key elements. Firstly, it begins with carefully curating high-quality data to ensure its reliability and relevance to the study objectives. Secondly, rigorous validation techniques are applied to assess the accuracy and integrity of the data collected, thereby enhancing the credibility of subsequent analyses. In addition, the methodology recognizes the importance of carefully selecting and prioritizing conditioning factors and using diverse inventories from reputable sources by integrating multiple data sets of varying spatial and temporal dimensions. Finally, the systematic comparison of different machine learning models is an integral part of the methodology, ensuring a rigorous evaluation is employed to select the best-performing model.

Methods of validation

The outcomes of the six models were validated using different performance measures: specificity (Eq. 12), sensitivity (Eq. 13), accuracy (Eq. 14), precision (Eq. 15), and F1 score (Eq. 16). The performance indices are deemed significant if the observed hazard and stable areas for flooding, landslides, and gully erosion correspond spatially to the predicted risk areas (Costache 2019; Costache and Tien Bui 2020).

$$\text{Specificity}=\frac{\text{TN}}{\text{FP}+\text{TN}}$$
(12)
$$\text{Sensitivity}=\frac{\text{TP}}{\text{FN}+\text{TP}}$$
(13)
$$\text{Accuracy}=\frac{\text{TN}+\text{TP}}{\text{FP}+\text{TP}+\text{FN}+\text{TN}}$$
(14)
$$\text{Precision}=\frac{\text{TP}}{\text{FP}+\text{TP}}$$
(15)
$$\text{F}1\text{ score}= \frac{2}{\frac{1}{\text{Precision}}+\frac{1}{\text{Recall}}}, \quad \text{Recall}=\frac{\text{TP}}{\text{TP}+\text{FN}}$$
(16)

where TP, TN, FP, and FN denote the true positives, true negatives, false positives, and false negatives, respectively.
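Eqs. (12) to (16) can be evaluated directly from the four confusion-matrix counts, as in this minimal sketch. The function name and the counts are hypothetical and are not results of this study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute Eqs. (12)-(16) from confusion-matrix counts."""
    specificity = tn / (fp + tn)                 # Eq. (12)
    sensitivity = tp / (fn + tp)                 # Eq. (13), identical to recall
    accuracy = (tn + tp) / (fp + tp + fn + tn)   # Eq. (14)
    precision = tp / (fp + tp)                   # Eq. (15)
    f1 = 2 / (1 / precision + 1 / sensitivity)   # Eq. (16), harmonic mean
    return {
        "specificity": specificity,
        "sensitivity": sensitivity,
        "accuracy": accuracy,
        "precision": precision,
        "f1": f1,
    }

# Hypothetical validation counts for one hazard model.
metrics = classification_metrics(tp=90, tn=85, fp=15, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because sensitivity and recall share the same formula, they take the same value for any given confusion matrix.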

The investigation also used the receiver operating characteristic (ROC) curve, another widely used metric. The area under the ROC curve (AUC) measures the accuracy of the prediction models (Eq. 17). Additionally, root mean square error (RMSE) (Eq. 18) and mean absolute error (MAE) (Eq. 19) were used to evaluate the models that map the vulnerability to landslides, floods, and gully erosion. Numerous studies have used both types of indices.

$$\text{AUC}=\frac{(\sum \text{TP}+\sum \text{TN})}{(\text{P}+\text{N})}$$
(17)

where TP stands for the true positives, TN for the true negatives, and P and N are the total numbers of pixels with and without torrential events, respectively.

$$\text{RMSE}= \sqrt{\frac{1}{\text{n}}\sum\limits_{\text{i}=1}^{\text{n}}{({\text{X}}_{\text{predicted}}-{\text{X}}_{\text{actual}})}^{2}}$$
(18)
$$\text{MAE}= \frac{1}{\text{n}}\sum\limits_{\text{i}=1}^{\text{n}}\left|{\text{X}}_{\text{predicted}}-{\text{X}}_{\text{actual}}\right|$$
(19)

where n is the total number of samples in the learning or testing phase, Xpredicted is the projected value from the susceptibility models (landslides, floods, and gully erosion), and Xactual is the observed value.
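The error measures can be sketched as follows, taking RMSE as in Eq. (18) and MAE as the plain mean of the absolute errors. The `roc_auc` helper uses the rank-based definition of the area under the ROC curve (the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one); this is a common alternative formulation and is not the expression given in Eq. (17). All names and sample values are hypothetical.

```python
import math

def rmse(predicted, actual):
    """Root mean square error, Eq. (18)."""
    n = len(actual)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n)

def mae(predicted, actual):
    """Mean absolute error: the mean of |predicted - actual|."""
    n = len(actual)
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / n

def roc_auc(scores, labels):
    """Rank-based AUC; ties between a positive and a negative count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical susceptibility scores and observed labels (1 = hazard, 0 = stable).
predicted = [0.9, 0.8, 0.3, 0.2]
actual = [1, 1, 0, 1]

print(round(rmse(predicted, actual), 4))
print(round(mae(predicted, actual), 4))
print(round(roc_auc(predicted, actual), 4))
```

The pairwise loop in `roc_auc` is quadratic in the number of samples, which is acceptable for a small illustration but not for full raster validation sets.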

Results

Factor selection and multicollinearity

The Pearson correlation analysis between the fifteen influencing factors (including LULC, soil type, topographic wetness index, precipitation, elevation, slope, flow accumulation, TPI, SPI, plan curvature, HSG, and aspect) and flood risk is shown in Fig. 6A. Figure 6B illustrates the Pearson analysis between the eighteen influencing variables (including drainage density, aspect, LS factor, lithology, K factor, TPI, TWI, precipitation, distance to rivers, curvature, elevation, slope, NDVI, LULC, and TRI) for gully erosion risk. The factors considered for the landslide risk shown in Fig. 6C include soil type, distance to rivers, topographic position index (TPI), topographic wetness index (TWI), and distance to faults.

Fig. 5 Multicollinearity analysis of the conditioning factors for A flood, B gully erosion, and C landslide using the variance inflation factor (VIF) and tolerance (TOL)

To check the multicollinearity of the input variables investigated in this study, the tolerance (Tol) and variance inflation factor (VIF) were computed. For floods, Tol ranges from 0.12 (HSG) to 0.97 (TWI), and VIF ranges from 1.02 (TWI) to 8.02 (HSG), as illustrated in Fig. 5A. Among the fifteen indicators taken into consideration in this study, the Tol and VIF criteria led to the exclusion of TWI and SPI from further analysis. Subsequently, the MI of the remaining thirteen factors was computed; all values are positive, ranging from the maximum for distance to the river down to 0.013 for HSG (Fig. 7A). Thus, the most significant factor is the distance to the river, followed by elevation (MI = 0.245), backwater (MI = 0.183), and slope (MI = 0.159).

Fig. 6 Analysis of conditioning components' multicollinearity using the correlation matrix for A flood, B gully erosion, and C landslide

Figure 6 shows that, for the gully erosion risk, the LS factor and slope had the strongest positive correlation (0.69). Elevation and slope, drainage density and valley depth, TWI and geomorphons, SPI and geomorphons, SPI and the LS factor, and precipitation, elevation, slope, and TRI also showed significant linear relationships. Figure 5B illustrates the results of the tolerance and variance inflation factor (VIF) analysis used in this study to test the multicollinearity of the conditioning factors. The Tol value ranges from 0.15 (geomorphons) to 0.94 (TWI), and the VIF ranges from 1.05 (TWI) to 6.35 (geomorphons). The geomorphons factor was removed from the 18 variables used in this analysis because of the Tol and VIF constraints. The MI of the remaining parameters (Fig. 7B) was then calculated, and all values are positive, ranging from 0.029 (aspect) to 0.261 (slope). As a result, slope is the most significant factor, followed by LS (MI = 0.227), TPI (MI = 0.226), and lithology (MI = 0.213).

For landslide risk, a significant linear relationship was found between the following pairs of variables: NDVI and precipitation, elevation and slope, lithology and elevation, slope and lithology, and distance to roads and distance to rivers. The highest positive correlation (0.63) was found between distance to roads and elevation. The tolerance and variance inflation factor (VIF) analysis, used in this study to examine the multicollinearity of the conditioning factors, gives Tol values between 0.30 and 0.92 and VIF values between 1.08 and 3.30, with LULC and elevation at the extremes (Fig. 5C). The MI of the remaining 14 components (Fig. 7C) is positive throughout, ranging from 0.021 (LULC) to 0.132 (slope). Thus, slope is the most significant component, followed by elevation (MI = 0.120), lithology (MI = 0.118), distance to roads (MI = 0.095), and distance to faults (MI = 0.091).

Fig. 7 The importance of selected hazard predictive factors for A flood, B gully erosion, and C landslide

Six machine learning models were applied to generate susceptibility maps for floods (Fig. 8A), gully erosion (Fig. 8B), and landslides (Fig. 8C), with the conditioning factors as independent variables and the observed risk condition as the dependent variable. Based on the Jenks natural breaks classifier (Jenks and Caspall 1971), each susceptibility map was divided into five classes: very low, low, moderate, high, and very high. Maps showing the flood susceptibility of the Tensift watershed and the Haouz plain were created using the SVM, RF, KNN, DT, ANN, and XGBoost models (Fig. 8A(a–g)). Gully erosion susceptibility maps were created for the same watershed using the same models (Fig. 8B(a–g)). Finally, landslide susceptibility maps were produced using the same models (Fig. 8C(a–g)).

Fig. 8 Susceptibility maps for A flooding, B gully erosion, and C landslide, predicted by the (a) SVM, (b) RF, (e) DT, (d) KNN, (g) XGBoost, and (f) ANN models

Validation of models

The ROC curves for the six machine learning models used to model landslides, floods, and gully erosion are shown in Fig. 9. For the flood vulnerability models, the AUCs range from 90.69 to 96.21% in the training phase and from 87.56 to 93.78% in the testing phase, corresponding to high to extremely high performance. The XGBoost, RF, KNN, DT, ANN, and SVM models achieved AUCs of 96.21%, 94.32%, 94.32%, 93.38%, 91.01%, and 90.01% in the training phase and AUCs of 93.78%, 93.71%, 90.53%, 90.67%, 87.56%, and 92.02% in the test phase, respectively (Fig. 9A). For gully erosion, the AUC ranges from 72.84 to 96.71% in the training phase and from 78.60 to 91.07% in the test phase. For landslides, the AUC ranges from 70.03 to 94.57% in the training phase and from 70.03 to 93.41% in the test phase.

Fig. 9 ROC curve analysis of different models of A flooding, B gully erosion, and C landslide for training and validation data

The effectiveness of the chosen XGBoost FS, GES, and LS prediction models throughout the training and validation stages is displayed in Tables 3 and 4. For the training data (70%) and the test data (30%), the following metrics are evaluated by adhering to the approach of Bammou et al. (2024d): Pr (precision), Se (sensitivity), Sp (specificity), Ac (accuracy), F1 score, FPR (false positive rate), MAE (mean absolute error), RMSE (root mean square error), and AUC-ROC (area under the receiver operating characteristic curve).

The XGBoost model performed exceptionally well on the training data, as shown in Table 3. For the risk of flooding, Pr = 0.95, Se = 0.97, Sp = 0.95, Ac = 0.97, Recall = 0.96, F1 score = 0.95, MAE = 0.04, and RMSE = 0.19. For the risk of gully erosion, Pr = 0.99, Se = 0.94, Sp = 0.99, Ac = 0.96, Recall = 0.95, F1 score = 0.97, FPR = 0.009, MAE = 0.03, and RMSE = 0.12. Lastly, for the risk of landslides, Pr = 0.97, Se = 0.98, Sp = 0.96, Ac = 0.97, Recall = 0.98, F1 score = 0.98, FPR = 0.03, MAE = 0.05, and RMSE = 0.23.

Table 3 Performance of best XGBoost model based on training data
Table 4 Performance of best XGBoost model based on testing data

On the test data, the XGBoost model also performed excellently, as shown in Table 4. For the flood risk, Pr = 0.93, Se = 0.94, Sp = 0.93, Ac = 0.94, Recall = 0.92, F1 score = 0.92, FPR = 0.07, MAE = 0.06, and RMSE = 0.25. For the gully erosion risk, Pr = 0.94, Se = 0.90, Sp = 0.93, Ac = 0.91, Recall = 0.90, F1 score = 0.92, FPR = 0.07, MAE = 0.03, and RMSE = 0.118. Lastly, for the risk of landslides, Pr = 0.93, Se = 0.95, Sp = 0.92, Ac = 0.93, Recall = 0.95, F1 score = 0.94, FPR = 0.08, MAE = 0.06, and RMSE = 0.25.

Multi-hazard maps

The multi-hazard map was produced by constructing and evaluating three separate hazard susceptibility maps, each associated with a distinct hazard (FS, GES, and LS). Using the XGBoost model, this development was based on the link between independent factors, i.e., hazard indicators, and dependent variables, i.e., locations at risk of flooding, gully erosion, and landslides. The three vulnerability maps (flooding, gully erosion, and landslides) were then integrated in GIS software to create the comprehensive multi-hazard risk map shown in Fig. 10.

Fig. 10 Multi-hazard risk map developed using the best-performing XGBoost model

As illustrated in Fig. 11, the results indicate that the research region is divided into seven vulnerability groups. Around 71.03% of the region is exposed to at least one hazard, with the following breakdown: 36.05% of the total area is susceptible to gully erosion (GES), 16.92% to floods (FS), 10.59% to landslides and gully erosion (GES-LS), 6.54% to floods and gully erosion (FS-GES), 0.56% to landslides (LS), 0.33% to floods and landslides (FS-LS), and 0.31% to floods, gully erosion, and landslides (FS-GES-LS).

Fig. 11 Percentages of various hazard types in the research region
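One way to obtain seven hazard combinations from three maps is to overlay three binary susceptibility grids pixel by pixel, as sketched here. The text does not state how the maps were thresholded or overlaid in the GIS software, so this is an assumption: each grid is taken to hold 1 where a hazard is considered present (for example, in the high and very high classes) and 0 elsewhere. The function names and the 2 × 2 grids are hypothetical.

```python
from collections import Counter

HAZARDS = ("FS", "GES", "LS")

def combine(fs, ges, ls):
    """Label one pixel from three binary susceptibility flags (1 = susceptible)."""
    active = [name for name, flag in zip(HAZARDS, (fs, ges, ls)) if flag]
    return "-".join(active) if active else "none"

def class_percentages(fs_grid, ges_grid, ls_grid):
    """Share of pixels (in %) falling into each hazard combination."""
    labels = [combine(f, g, l)
              for f_row, g_row, l_row in zip(fs_grid, ges_grid, ls_grid)
              for f, g, l in zip(f_row, g_row, l_row)]
    counts = Counter(labels)
    return {label: 100.0 * count / len(labels) for label, count in counts.items()}

# Hypothetical 2 x 2 binary grids for floods, gully erosion, and landslides.
fs_grid = [[1, 0], [0, 0]]
ges_grid = [[1, 1], [0, 0]]
ls_grid = [[0, 1], [0, 1]]

print(class_percentages(fs_grid, ges_grid, ls_grid))
```

Three binary layers give 2³ − 1 = 7 non-empty combinations, which matches the number of vulnerability groups reported, plus a residual class with no hazard.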

Discussion

A range of modelling techniques can significantly enhance the understanding of risk management. In a semi-arid area prone to many hazards such as flooding, gully erosion, landslides, and, more recently, earthquakes, such as the catastrophic earthquake of September 8, 2023, in the research region, there has been an acceleration in the establishment of landslide hazard zones.

This study investigated three natural hazards in the Tensift watershed, a region in the Moroccan High Atlas with mountain and lowland characteristics. The aim was to perform multi-hazard modelling to understand the relationships between landslides, gully erosion, and floods and to assess the susceptibility of the study area to them. To this end, six ML models were used to study flooding, gully erosion, and landslides: DT, RF, SVM, KNN, ANN, and XGBoost. The CM, TOL, VIF, and MI analyses led to the conclusion that the selected factors can influence flooding, gully erosion, and landslides in the Tensift catchment. Training and validation datasets and evaluation metrics were employed to assess the ML models' performance. Accuracy, precision, recall, F1 score, mean absolute error (MAE), root mean square error (RMSE), and area under the receiver operating characteristic curve (AUC-ROC) are examples of standard measures. Averaged over the three hazards, the XGBoost model reached AUC = 95.83% and AUC = 92.75% in the training and test phases, respectively.

The spatial prediction of FS depends on several influencing variables. Because of their collinearity with other factors, which reduced the prediction's efficacy, TWI and SPI were eliminated from this study's original set of fifteen factors. Additionally, the MI analysis indicates that the most important factor is the distance to the river. According to the spatial prediction results of Al-Areeq et al. (2022) and Meliho et al. (2022) regarding FS, the areas closest to the river were considered very vulnerable to floods. Since the Haouz Plain is located at low elevations, the findings of the present study are consistent with the previous scientific literature, notably Meliho et al. (2022), which shows that high susceptibility is connected with elevation, slope, TPI, and TRI in the Tensift rivers and sub-catchments, especially the Ourika, Zat, and Rheraya sub-catchments.

For the spatial prediction of GES, the key variables influencing gully erosion in the study region are geological factors, represented by lithology, and topographic factors, such as slope, LS factor, and TPI, which are more significant than the other variables. This is largely in line with the findings of Baiddah et al. (2023) for lithology and the LS factor in the Chichaoua sub-basin; that work also highlights the importance of additional factors such as altitude and proximity to rivers. This divergence may be explained by the present study's choice of a mixed zone, which combines lowland and mountainous areas over a broad extent and therefore covers a wide range of altitude values.

The combination of highly friable lithologies, such as Tertiary clays and sediments and Neogene phosphate marls, with high altitudes and sparse vegetation, especially on steep slopes and in areas where the vegetation cover is degraded, favors the occurrence and development of this type of erosion (Arabameri et al. 2020; Bammou et al. 2024c).

Lastly, the spatial prediction of LS shows that five factors are critical in determining the likelihood of landslides: slope gradient, elevation, lithology, proximity to roads, and proximity to rivers. Slope gradient and elevation matter because unstable soils move downslope under the constant action of gravity, so steeper slopes are more prone to landslides. Excavation related to building or extending road networks affects slope stability and can raise the landslide hazard in locations near roads. Proximity to rivers is another critical factor: bank and gully erosion threaten the stability of slopes next to rivers, raising the likelihood of landslides in these locations.

The XGBoost model is more accurate than the other five models primarily because it combines the prediction outcomes of all base learners, which improves its recognition rate and generalization capacity. It also handles missing values natively: at each split node, it learns a default direction for samples whose feature value is missing. In addition, XGBoost adds a regularization term to the objective function, which penalizes model complexity and helps avoid overfitting, and it supports custom loss functions. Penalizing complexity keeps the learned model compact, which also speeds up learning. Because of this, the XGBoost model ultimately produces improved simulation results, and the XGBoost-based method is practical and efficient for mapping the susceptibility of landslides, floods, and gully erosion.
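For reference, the regularized objective referred to above is usually written as follows in the XGBoost documentation. The notation is not taken from this study and is introduced here only for illustration: l is a differentiable loss, f_k is the k-th tree of K, T is the number of leaves in a tree, w_j is the weight of leaf j, and gamma and lambda are regularization hyperparameters.

```latex
\mathcal{L} \;=\; \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) \;+\; \sum_{k=1}^{K} \Omega\left(f_k\right),
\qquad
\Omega(f) \;=\; \gamma T \;+\; \frac{1}{2}\,\lambda \sum_{j=1}^{T} w_j^{2}
```

The first sum measures how well the ensemble fits the training data, while the second penalizes trees with many leaves or large leaf weights, which is the mechanism that discourages overfitting.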

The XGBoost algorithm, which showed excellent predictive capabilities in this study, has several advantages. One notable advantage is its simple implementation: it requires little prior data preprocessing and has built-in mechanisms for handling missing data. In addition, as an ensemble algorithm, it reduces bias by sequentially combining multiple weak learners, iteratively improving the quality of predictions (Bashir et al. 2024a, b; Benzougagh et al. 2024; Zounemat-Kermani et al. 2021). This approach effectively mitigates the significant biases that often occur in ML models. XGBoost also prioritizes, during training, the features that contribute most to prediction accuracy, which increases computational efficiency. This property can be beneficial for reducing the number of data attributes and for managing large datasets efficiently.
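The idea of sequentially combining weak learners can be sketched with a deliberately simplified gradient-boosting loop. This is not XGBoost itself: it uses one feature, decision stumps, and squared-error loss, and it omits the regularization, second-order gradients, and missing-value handling that XGBoost provides. The data, function names, number of rounds, and learning rate are hypothetical.

```python
def mse(y, pred):
    """Mean squared error between targets and predictions."""
    return sum((a - b) ** 2 for a, b in zip(y, pred)) / len(y)


def fit_stump(x, residuals):
    """Fit a one-split regression stump to the residuals.

    Returns (threshold, left_value, right_value) minimizing squared error.
    """
    best = None
    values = sorted(set(x))
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        left = [r for xi, r in zip(x, residuals) if xi <= thr]
        right = [r for xi, r in zip(x, residuals) if xi > thr]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, thr, left_mean, right_mean)
    return best[1], best[2], best[3]


def boost(x, y, n_rounds=20, learning_rate=0.3):
    """Each round fits a stump to the current residuals and adds it, shrunken."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps, history = [], []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        thr, left_val, right_val = fit_stump(x, residuals)
        stumps.append((thr, left_val, right_val))
        pred = [pi + learning_rate * (left_val if xi <= thr else right_val)
                for xi, pi in zip(x, pred)]
        history.append(mse(y, pred))
    return base, stumps, history


def predict(x_new, base, stumps, learning_rate=0.3):
    """Sum the base value and the shrunken contribution of every stump."""
    out = base
    for thr, left_val, right_val in stumps:
        out += learning_rate * (left_val if x_new <= thr else right_val)
    return out


# Hypothetical toy data: one conditioning factor and a 0/1 hazard label.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 0, 1, 0, 1, 1, 1]

base, stumps, history = boost(x, y)
print(f"training MSE after round 1: {history[0]:.4f}, after round 20: {history[-1]:.4f}")
```

Each stump on its own is a weak predictor, but because every round targets what the previous rounds got wrong, the training error falls step by step.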

It is essential to recognize that the presence of one hazard can trigger another. Knowledge of multiple hazards, their interrelationships, linkages, and cascading effects can therefore increase awareness of their processes and of the best methods to prevent and reduce catastrophic losses while supporting good land management (Godschall et al. 2020; Msabi and Makonyo 2021; Ouallali et al. 2024). The resulting maps provide essential information for planning and monitoring existing and future anthropogenic activities.

Conclusion

This research has contributed substantially to the knowledge of how vulnerable Morocco's Northern and High Atlas areas are to different hazards. Due to their distinct geography and dense population, these mountainous areas face an intricate web of natural threats. The primary goal was to give decision-makers the information needed to identify and delineate high-risk zones. The research yields a thorough multi-hazard vulnerability map incorporating three main natural hazards: floods, gully erosion, and landslides.

This map divides the research area into six danger zones, accounting for 71.03% of the area. Three zones are linked to specific hazards: the landslide risk zone (0.56%), which is situated in steep, high-altitude Tensift sub-basins; the erosion risk zone (36.05%), which is concentrated along major rivers like the R'dat, Zat, and Ourika; and the flood risk zone (16.92%), which primarily affects the Haouz plains and certain tributaries of the High Atlas.

Additionally, the study identified three zones where two types of hazard overlap: the flood and landslide hazard zone (0.33%), the flood and gully erosion hazard zone (6.54%), and the landslide and gully erosion hazard zone (10.59%). Notably, the High Atlas also contains zones affected by all three hazards (landslides, gully erosion, and floods), especially in the Ourika and N'fis catchments; these zones comprise 0.31% of the study area.
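One way to derive single-hazard and overlapping zones of this kind is to overlay three binary susceptibility grids and count the share of cells in each combination. The sketch below assumes that each hazard map has already been reduced to 1 (susceptible) or 0 (not susceptible) per cell; the bit-flag encoding, the zone names, and the tiny 2 x 4 grids are hypothetical and do not reproduce the procedure or the data of this study.

```python
from collections import Counter

# Hypothetical bit flags: each hazard occupies one bit of the zone code.
FLOOD, GULLY, LANDSLIDE = 1, 2, 4

ZONE_NAMES = {
    0: "no hazard",
    1: "flood",
    2: "gully erosion",
    3: "flood + gully erosion",
    4: "landslide",
    5: "flood + landslide",
    6: "landslide + gully erosion",
    7: "all three",
}


def combine(flood, gully, landslide):
    """Overlay three binary grids into one grid of zone codes (0 to 7)."""
    return [
        [FLOOD * f + GULLY * g + LANDSLIDE * s
         for f, g, s in zip(flood_row, gully_row, slide_row)]
        for flood_row, gully_row, slide_row in zip(flood, gully, landslide)
    ]


def zone_shares(zones):
    """Percentage of cells falling in each zone that occurs in the grid."""
    cells = [code for row in zones for code in row]
    counts = Counter(cells)
    return {ZONE_NAMES[code]: 100 * n / len(cells) for code, n in counts.items()}


# Hypothetical 2 x 4 binary susceptibility grids (1 = susceptible cell).
flood_grid = [[1, 1, 0, 0],
              [0, 1, 0, 0]]
gully_grid = [[0, 1, 1, 0],
              [0, 1, 0, 0]]
slide_grid = [[0, 0, 1, 0],
              [0, 1, 0, 1]]

zones = combine(flood_grid, gully_grid, slide_grid)
shares = zone_shares(zones)
for name, pct in sorted(shares.items()):
    print(f"{name}: {pct:.1f}%")
```

Encoding each hazard as a separate bit keeps every combination distinct, so the same overlay yields the single-hazard, two-hazard, and three-hazard zones in one pass.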

The XGBoost machine learning model proved highly reliable and yielded accurate predictions for all three hazard types. The multi-hazard map is based on this model, which achieved an average AUC of 95.83% during the training phase and 92.75% during the testing phase. Multi-hazard analysis is essential for sustainable growth in these mountainous and flood-prone areas as infrastructure and urban and rural development continue to expand. Based on the multi-hazard risk map, policy recommendations for sustainable land management and hazard mitigation could be proposed in the context of the High Atlas landscape. Developing strategies to increase public awareness and education about these hazards and the importance of sustainable land management could also contribute to a more comprehensive understanding of multi-hazard assessments and to more effective land management and hazard mitigation.

Although it is a pioneering study in the field of multi-hazard assessment, this study has several limitations that could be addressed in future research. Firstly, the study's geographical focus might limit the applicability of the results to regions with different topographical, geological, and meteorological conditions. Conducting similar studies in other geographical areas could validate the applicability of the models and allow the results to be compared. Secondly, the ML models used in this study rely on causal factors derived from reputable and trusted sources; future studies could incorporate real-time or dynamic data so that the models reflect changing conditions and predict more accurately. Lastly, this study does not extensively address how changes in anthropogenic activities could affect the three hazards. Future work is encouraged to examine in depth how changes in anthropogenic activities such as urbanization, deforestation, and land use dynamics affect landslide, soil erosion, and flood risks, and to incorporate these changes into ML models.