1 Introduction

Geological hazards refer to geodynamic activity or phenomena that are formed under the interactions of natural or human factors, which cause losses to human life and property and damage to the environment, such as landslides, rockfalls, and so on (Huang et al. 2020; Lin et al. 2021). More than 90,000 landslides occurred in southern and northwestern China in 1949–2011, posing risks to tens of millions of people and causing billions of economic losses to properties (Liu et al. 2013). Besides, warming temperatures, and particularly rapid winter warming episodes, accelerate the melting process of the permafrost (Zhang et al. 2019), which may also intensify geological hazards such as landslides. Therefore, geological hazard susceptibility assessment has become a challenging task and also an urgent demand.

Geological hazard susceptibility is recognized as the likelihood of hazard occurrence in a given location (Fell et al. 2008; Saha et al. 2020; Samia et al. 2020) and is closely related to the characteristics of the geographical elements (Dahal et al. 2008a, 2008b; Pourghasemi et al. 2012). Because of the complicated interactions between the various driving factors, quantifying the susceptibility of geological hazards is a challenging task (Fell et al. 2008). Although building an indicator system and assigning weights to calculate an evaluation index has become the most widely used method for risk and susceptibility assessment, it is often criticized for its subjectivity (Hadmoko et al. 2010; Hadji et al. 2013; Niu et al. 2015; Huang et al. 2020). The development of machine learning technologies provides alternative solutions for this issue. By analyzing the inherent geographic characteristics of potential hazard-affected sites (for example, topography, elevation, lithology, and precipitation), machine learning models can detect hazards within a certain range and predict their distribution. Among machine learning algorithms, the random forest (RF), support vector machine (SVM), and backpropagation (BP) neural network models have been widely used in the field of disaster modeling (Lee et al. 2007; Chen et al. 2009; Yilmaz 2009; Bednarik et al. 2012; Catani et al. 2013; Huang et al. 2020) because of their high prediction performance (Cutler et al. 2007). Thus, by comparing the results of the three algorithms using receiver operating characteristic (ROC) curves, we can select the best regional model for a susceptibility assessment.

The Hengduan Mountains Region (HMR) covers an area of more than 600,000 km² characterized overall by a relatively dry climate, which varies spatially and seasonally across zones of temperate monsoon, subtropical monsoon, and alpine plateau climates. The complex topographic features and climate conditions in the HMR render this region extremely susceptible to large-scale geological hazards (Huang 2009; Li et al. 2011; Liu et al. 2013; Huang et al. 2014; Xu et al. 2019). Although analyzing single hazards in isolation is increasingly unable to meet the demands of comprehensive risk evaluation and natural hazard risk reduction (Peduzzi et al. 2009; Koks et al. 2019; Pourghasemi et al. 2020; Pouyan et al. 2021), few studies are available that address this issue. This gap is especially evident in the HMR, where it hinders land use planning and disaster risk management in this complex, multi-hazard region. Therefore, evaluation of the susceptibility of the HMR to geological hazards is challenging but also constitutes the most significant potential contribution of the current study.

The main objective of this study is to evaluate the susceptibility of the HMR in China (Fig. 1) to geological hazards, which in this study comprise two types: landslides and rockfalls. They were selected as representative hazards because of their high occurrence frequencies, large affected areas, complex driving factors, and massive damage (Huang 2009). Six models were developed from three machine learning algorithms. We then compared the modeling performance of these models and selected the most effective one to identify the regions with high susceptibility to geological hazards and to map the susceptibility of the HMR to geological hazards. This study provides a theoretical framework for the evaluation of the susceptibility of mountainous regions to multiple geological hazards.

Fig. 1 Location of the Hengduan Mountains Region in southwestern China

2 Data

The data we analyzed in this study include geological hazard data with longitude and latitude of the hazard sites (“hazard points” in the subsequent sections) and driving factors of geological hazards. The hazard points were used for training the machine learning models, while the driving factors were used to implement susceptibility assessment.

2.1 Geological Hazard Inventory

Through on-the-spot visits, we obtained the hazard inventory from the Natural Resources Bureaus of 11 cities in Sichuan Province, Tibet Autonomous Region, and Yunnan Province. This inventory records the location and scale of geological hazards, covering 9980 hazard points, wherein 7159 hazard points are for landslides and 2821 for rockfalls. Although these hazard points are not all located in the HMR but are distributed in the HMR and its immediately surrounding areas, they are all related to similar driving factors. Therefore, these records were incorporated to form our hazard inventory (Fig. 2).

Fig. 2 Hazard inventory of landslides and rockfalls in the Hengduan Mountains Region and adjacent areas in southwestern China. DEM = digital elevation model

To get better results by training the machine learning models, 2080 non-rockfall points and 5500 non-landslide points within the HMR were randomly selected while avoiding a 500-meter buffer zone along the rivers and roads, and around existing hazard points in the inventory. By adding them to the hazard inventory, the number of non-hazard points and hazard points is basically the same.
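
The selection of non-hazard points described above can be sketched as simple rejection sampling. This is only an illustration under stated assumptions: coordinates are planar and in meters, rivers, roads, and hazard sites are all reduced to a list of points (real river and road data are lines and would need a GIS buffer), and the function name `sample_non_hazard_points` and the toy coordinates are hypothetical.

```python
import math
import random


def sample_non_hazard_points(n, bounds, excluded, buffer_m=500.0, seed=0,
                             max_tries=100_000):
    """Draw n random points inside bounds that lie farther than buffer_m
    from every excluded location (hazard sites, river or road vertices)."""
    xmin, ymin, xmax, ymax = bounds
    rng = random.Random(seed)
    points = []
    tries = 0
    while len(points) < n:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("could not place enough points; relax the buffer")
        x = rng.uniform(xmin, xmax)
        y = rng.uniform(ymin, ymax)
        # Keep the candidate only if it is outside every buffer circle.
        if all(math.hypot(x - ex, y - ey) > buffer_m for ex, ey in excluded):
            points.append((x, y))
    return points


# Hypothetical toy data: a 10 km x 10 km window with three excluded locations.
excluded = [(2000.0, 2000.0), (5000.0, 5000.0), (8000.0, 3000.0)]
non_hazard = sample_non_hazard_points(20, (0.0, 0.0, 10_000.0, 10_000.0), excluded)
```

Fixing the seed keeps the sampled set reproducible, which matters when the same non-hazard points must be reused across several models.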

2.2 Geological Hazard Driving Factors

Existing research on the driving factors of geological hazards that we consulted includes Lee and Min (2001), Süzen and Doyuran (2004), Floris et al. (2011), Erener and Düzgün (2013), Mandal and Maiti (2015), Shahabi and Hashim (2015), Strehmel et al. (2015), and Hou et al. (2016). In this study, we adopted 10 factors for geological hazard susceptibility evaluation: elevation, slope, aspect, curvature, lithology, distance from geological fault, precipitation, Normalized Difference Vegetation Index (NDVI), distance from river, and distance from road (Table 1). The first eight were used for model training and assessment of susceptibility to geological hazards. The latter two were used mainly to select non-hazard points.

Table 1 Driving factors selection and data source

Our digital elevation model (DEM) is sourced from the ASTER GDEM data set jointly developed by the National Aeronautics and Space Administration (NASA) of the United States, the Japanese Ministry of Economy, Trade and Industry (METI), and the Japan Space Agency, with a spatial resolution of 30 m (Footnote 1). The slope, aspect, and curvature data were all derived from the DEM using the spatial analysis toolbox of ArcGIS. Lithology and the distance from geological faults represent geological structure. Hydrometeorological indicators include precipitation, NDVI, and distance from rivers, and the human activity indicator is the distance from roads. The precipitation data are based on a reanalysis data set released by the National Qinghai-Tibet Plateau Science Data Center (Footnote 2); the data set covers the period 1979–2015 with a spatial resolution of 0.1°. We calculated the average precipitation using the R software package and interpolated the data to 500 m. The distribution of rivers and roads is sourced from OpenStreetMap (Footnote 3). We removed artificial irrigation canals and country roads but kept the main roads for the analysis.

3 Methods

The scale of the geological hazards, such as the size and/or volume of the rockfalls and landslides, was taken as the basis for the susceptibility assessment, and the susceptibility to the geological hazards was subdivided into four levels (Table 2) as suggested by the Specification of Comprehensive Survey for Landslide, Rockfall and Debris Flow (1:50,000) issued by China’s Ministry of Land and Resources (China 2014). About 70% of the hazard points were randomly selected from each level to form the training data set, while the rest form the validation data set. Six models were developed from three types of machine learning methods: the random forest (RF) model, the support vector machine (SVM) model with four different kernel functions, and the backpropagation (BP) neural network model.
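
The per-level 70/30 split described above can be illustrated with a short sketch. It assumes each record carries its susceptibility level; the function name `stratified_split` and the toy records are hypothetical and only stand in for the hazard inventory.

```python
import random
from collections import defaultdict


def stratified_split(records, level_of, train_fraction=0.7, seed=0):
    """Split records into training and validation sets, sampling the same
    fraction from every susceptibility level."""
    rng = random.Random(seed)
    by_level = defaultdict(list)
    for rec in records:
        by_level[level_of(rec)].append(rec)
    train, validation = [], []
    for level in sorted(by_level):
        group = by_level[level][:]
        rng.shuffle(group)
        cut = round(train_fraction * len(group))
        train.extend(group[:cut])
        validation.extend(group[cut:])
    return train, validation


# Hypothetical records: (point id, susceptibility level 1-4), 50 per level.
records = [(i, 1 + i % 4) for i in range(200)]
train, validation = stratified_split(records, level_of=lambda r: r[1])
```

Splitting within each level keeps rare levels represented in both sets, which a single global shuffle would not guarantee.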

Table 2 Criteria for the classification of susceptibility to geological hazards

3.1 Random Forest (RF) Model

The random forest model is an ensemble machine learning model and was developed from decision trees. Modeling based on the RF model involves four steps (Huang et al. 2020):

  (1) Take k samples from the training sample D to build a random tree.

  (2) Randomly extract m features from each sample, and select the best feature as the growth node.

  (3) Let each tree grow as much as possible without pruning.

  (4) Repeat the above steps continuously to form a random forest.

In addition, in the modeling by the RF model, the mean decrease Gini and the mean decrease accuracy index can be used to quantify the importance of each influencing factor.
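
The four steps above can be sketched in plain Python. This is a minimal illustration, not the implementation used in the study: it assumes numeric features, integer class labels, Gini impurity as the split criterion, candidate thresholds at midpoints between observed values, and m features drawn afresh at every node (the usual random forest convention). All function names and the toy data are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y, features):
    """Return the (feature, threshold) pair with the lowest weighted Gini,
    or None if no candidate improves on the unsplit node."""
    best, best_score = None, gini(y)
    for f in features:
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def grow_tree(X, y, m, rng):
    """Steps 2 and 3: grow an unpruned tree, trying m random features per node."""
    if len(set(y)) == 1:
        return y[0]
    split = best_split(X, y, rng.sample(range(len(X[0])), m))
    if split is None:
        return Counter(y).most_common(1)[0][0]
    f, t = split
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            grow_tree([X[i] for i in left], [y[i] for i in left], m, rng),
            grow_tree([X[i] for i in right], [y[i] for i in right], m, rng))


def predict_tree(tree, row):
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


def random_forest(X, y, n_trees=25, m=1, seed=0):
    """Steps 1 and 4: one bootstrap sample per tree, repeated n_trees times."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(grow_tree([X[i] for i in idx], [y[i] for i in idx], m, rng))
    return forest


def predict_forest(forest, row):
    """Majority vote over all trees."""
    return Counter(predict_tree(t, row) for t in forest).most_common(1)[0][0]


# Hypothetical toy data: two well-separated classes with two features.
X = [[0.0, 0.2], [0.1, 0.1], [0.2, 0.0], [0.1, 0.2],
     [0.8, 1.0], [0.9, 0.8], [1.0, 0.9], [0.9, 1.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
forest = random_forest(X, y)
```

Drawing both the samples and the candidate features at random is what decorrelates the trees, so that the majority vote is more stable than any single tree.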

3.2 Support Vector Machine (SVM) Model

The SVM is in essence a linear classifier: it projects the feature points into a multi-dimensional feature space and seeks the optimal hyperplane that maximizes the margin between categories, which transforms the classification problem into a convex quadratic programming problem (Eqs. 1 and 2) (Yao et al. 2008; Xu et al. 2012; Pourghasemi et al. 2013). The SVM model is fundamentally a binary classification model, but it can also be extended to multiclass problems.

$$y_{i} (w^{T} x_{i} + b) \ge 1$$
(1)
$$\max \frac{1}{\left\| w \right\|}$$
(2)

One of the important parameters in the SVM model is the kernel function. The main options are the linear, polynomial, radial basis function (RBF), and sigmoid kernels (Eqs. 3–6). We developed four SVM models, one for each kernel function, for the evaluation of the susceptibility to geological hazards.

$${\text{Linear kernel}}\,\,\,K(X_{i} ,X_{j} ) = X_{i}^{T} X_{j}$$
(3)
$${\text{Polynomial kernel}}\,\,\,\,K(X_{i} ,X_{j} ) = (\gamma X_{i}^{T} X_{j} + r)^{d} ,\gamma > 0$$
(4)
$${\text{Radial kernel}}\,\,K(X_{i} ,X_{j} ) = \exp ( - \gamma \left\| {X_{i} - X_{j} } \right\|^{2} ),\gamma > 0$$
(5)
$${\text{Sigmoid kernel}}\,\,\,K(X_{i} ,X_{j} ) = \tanh (\gamma X_{i}^{T} X_{j} + r)$$
(6)

where γ, d, and r are the relevant parameters in each kernel function.
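
As a small illustration, the four kernels can be written as plain Python functions on feature vectors. The radial kernel here uses the standard Gaussian form exp(−γ‖Xi − Xj‖²). The default parameter values are arbitrary placeholders, not the settings used in the study.

```python
import math


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def linear_kernel(a, b):
    return dot(a, b)


def polynomial_kernel(a, b, gamma=1.0, r=0.0, d=3):
    return (gamma * dot(a, b) + r) ** d


def rbf_kernel(a, b, gamma=1.0):
    # Squared Euclidean distance between the two feature vectors.
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq_dist)


def sigmoid_kernel(a, b, gamma=1.0, r=0.0):
    return math.tanh(gamma * dot(a, b) + r)
```

The RBF kernel equals 1 for identical points and decays toward 0 as they move apart, with γ controlling how quickly; the linear and polynomial kernels instead grow with the inner product.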

3.3 Back Propagation (BP) Neural Network Model

The BP neural network model is a multilayer feedforward network that incorporates an error backpropagation algorithm (Ying et al. 2017). The model has a three-layer structure: an input layer, a hidden layer, and an output layer. Each layer contains multiple nodes, and the hidden layer may itself consist of several layers of nodes. Nodes in adjacent layers are connected, and each connection is assigned a weight. Compared with ordinary neural network algorithms, the BP neural network model is characterized by the backpropagation of errors. The calculation process mainly includes forward calculation and error adjustment. The main process is as follows (Wang et al. 2005):

The output of hidden layer node:

$$y_{h}^{k} = f\left( {\sum\limits_{i = 1}^{{N_{1} }} {\omega_{ih} \cdot x_{i}^{k} } + \theta_{h} } \right)$$
(7)

The output of output layer node:

$$z_{j}^{k} = f\left( {\sum\limits_{{h = 1}}^{{N_{2} }} {\omega _{{hj}} \cdot y_{h}^{k} } + \gamma _{j} } \right)$$
(8)

Error function:

$$E = \frac{1}{2}\sum\limits_{k = 1}^{P} \sum\limits_{j = 1}^{N_{3}} \left( T_{j}^{k} - z_{j}^{k} \right)^{2}$$
(9)

Weight increment (gradient descent):

$$\Delta \omega = - \eta \frac{\partial E}{{\partial \omega }}$$
(10)

Weight update:

$$\omega = \omega + \Delta \omega$$
(11)

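where \(x_{i}^{k}\) is the ith input of sample k; \(\omega_{ih}\) and \(\omega_{hj}\) are the input-to-hidden and hidden-to-output weights; \(N_{1}\), \(N_{2}\), and \(N_{3}\) are the numbers of input, hidden, and output nodes; \(T_{j}^{k}\) is the target output; \(\theta_{h}\) and \(\gamma_{j}\) are the thresholds of the hidden layer node and the output layer node, respectively; \(f\) is the transfer function; \(\eta\) is the learning step; and P is the number of samples, k = 1, …, P.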
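
The forward calculation and error adjustment can be sketched for a single hidden layer. This is an illustration under assumptions the text does not fix: the transfer function f is taken to be the logistic sigmoid, weights are updated sample by sample, and the network size, learning step, and toy data (a logical OR) are hypothetical.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def forward(x, w_ih, theta, w_hj, gamma):
    """Eqs. 7 and 8: hidden outputs y_h and network outputs z_j."""
    y = [sigmoid(sum(w_ih[i][h] * x[i] for i in range(len(x))) + theta[h])
         for h in range(len(theta))]
    z = [sigmoid(sum(w_hj[h][j] * y[h] for h in range(len(y))) + gamma[j])
         for j in range(len(gamma))]
    return y, z


def total_error(samples, params):
    """Eq. 9: half the summed squared difference between targets and outputs."""
    err = 0.0
    for x, t in samples:
        _, z = forward(x, *params)
        err += sum((tj - zj) ** 2 for tj, zj in zip(t, z))
    return 0.5 * err


def train_step(samples, params, eta=0.5):
    """Eqs. 10 and 11, applied once to every sample in turn."""
    w_ih, theta, w_hj, gamma = params
    for x, t in samples:
        y, z = forward(x, w_ih, theta, w_hj, gamma)
        # Output-layer error term, using f'(v) = f(v) * (1 - f(v)).
        delta_out = [(z[j] - t[j]) * z[j] * (1 - z[j]) for j in range(len(z))]
        # Error propagated back to the hidden layer (before weights change).
        delta_hid = [y[h] * (1 - y[h]) *
                     sum(w_hj[h][j] * delta_out[j] for j in range(len(z)))
                     for h in range(len(y))]
        for h in range(len(y)):
            for j in range(len(z)):
                w_hj[h][j] -= eta * delta_out[j] * y[h]
        for j in range(len(z)):
            gamma[j] -= eta * delta_out[j]
        for i in range(len(x)):
            for h in range(len(y)):
                w_ih[i][h] -= eta * delta_hid[h] * x[i]
        for h in range(len(y)):
            theta[h] -= eta * delta_hid[h]


# Hypothetical setup: 2 inputs, 3 hidden nodes, 1 output, learning logical OR.
rng = random.Random(0)
n_in, n_hid, n_out = 2, 3, 1
params = (
    [[rng.uniform(-0.5, 0.5) for _ in range(n_hid)] for _ in range(n_in)],
    [rng.uniform(-0.5, 0.5) for _ in range(n_hid)],
    [[rng.uniform(-0.5, 0.5) for _ in range(n_out)] for _ in range(n_hid)],
    [rng.uniform(-0.5, 0.5) for _ in range(n_out)],
)
samples = [([0.0, 0.0], [0.0]), ([0.0, 1.0], [1.0]),
           ([1.0, 0.0], [1.0]), ([1.0, 1.0], [1.0])]
error_before = total_error(samples, params)
for _ in range(2000):
    train_step(samples, params)
error_after = total_error(samples, params)
```

Comparing `error_before` with `error_after` shows the effect of repeated error adjustment: the error function E shrinks as the weights and thresholds move against its gradient.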

3.4 Model Validation

The predictive accuracy of the machine learning models is evaluated by the receiver operating characteristic (ROC) curves and area under the curve (AUC) values (Gorsevski et al. 2006; Trigila et al. 2015). The outcome of model classification falls into four types: true positive (TP), false positive (FP), false negative (FN), and true negative (TN) (Table 3). The true positive and true negative classes represent correct classifications. The ROC curve plots the true positive rate (TPR) against the false positive rate (FPR), defined as follows:

$${\text{TPR}} = \frac{{{\text{TP}}}}{{{\text{TP}} + {\text{FN}}}}$$
(12)
$${\text{FPR}} = \frac{{{\text{FP}}}}{{{\text{FP}} + {\text{TN}}}}$$
(13)
Table 3 Confusion matrix of the classification model
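
To make Eqs. 12 and 13 concrete, the sketch below builds an ROC curve by lowering the decision threshold step by step and integrates it with the trapezoidal rule. It assumes binary labels (1 = hazard, 0 = non-hazard) and one score per point; the labels and scores are hypothetical, not results from this study.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) pairs obtained by lowering the decision threshold."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for lab, s in zip(labels, scores) if s >= thr and lab == 1)
        fp = sum(1 for lab, s in zip(labels, scores) if s >= thr and lab == 0)
        points.append((fp / neg, tp / pos))  # (Eq. 13, Eq. 12)
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical validation labels and model scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(labels, scores)
area = auc(curve)  # 8/9 for this toy example
```

An AUC of 0.5 corresponds to random guessing and 1.0 to perfect ranking, which is why AUC values are a convenient single number for comparing models.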

4 Results

In this section, we first present the ROC curves of the six models to show their accuracy and choose the most suitable model for further application; then the susceptibility to landslides, rockfalls, and geological hazards as a whole is assessed, and the susceptibility is mapped.

4.1 Verification of the Model Accuracy

Figure 3 shows the ROC curves and AUC values for the rockfall and landslide hazards. For susceptibility to rockfall, the SVM model with the sigmoid kernel function performs poorly, while the other models are satisfactory, with AUC values above 0.8. Among them, the RF model obtains the highest AUC of 0.88. For susceptibility to landslides, most models have AUC values between 0.7 and 0.8; the SVM model with the sigmoid kernel function performs the worst, with an AUC value of 0.59, while the RF model performs the best, with an AUC value of 0.83. In general, the RF model is the best choice for assessing susceptibility to the geological hazards.

Fig. 3 Receiver operating characteristic (ROC) curves and area under the curve (AUC) values of the six models considered in this study. SVM = Support vector machine; BP = Back propagation; RBF = Radial Basis Function

4.2 Evaluation of Susceptibility to Rockfall and Landslide

Figures 4 and 5 show the spatial pattern of the susceptibility to landslide and rockfall, respectively. To show the connections between altitude, slope, and geological hazards, we overlaid the geological hazard points on our digital elevation model (DEM). The hazard points are mainly distributed along the valleys, especially the Jinsha River valley, where we observe a concentration of multiple large-scale landslide points. River valleys are usually characterized by steep slopes vulnerable to erosion, and erosion by river flow also causes instability of riverbeds; all of these factors can trigger higher occurrences of geological hazards such as rockfalls and landslides. We can observe a more dispersed distribution of large-scale landslides in the southern and central areas of Sichuan Province. We also mapped the susceptibility to landslide and rockfall (Figs. 4a, 5a). In general, medium and high susceptibility to landslide and rockfall is identified mainly across the central and southern HMR and is specifically distributed along the mountains.

Fig. 4 Assessment of landslide susceptibility and distribution of hazard points in the Hengduan Mountains Region of China. a Landslide susceptibility map. b Superimposing hazard points on DEM. c Superimposing hazard points on landslide susceptibility map

Fig. 5 Assessment of rockfall susceptibility and distribution of hazard points. a Rockfall susceptibility map. b Superimposing hazard points on DEM. c Superimposing hazard points on rockfall susceptibility map

With regard to susceptibility to landslides, areas with very high susceptibility account for 0.18% of the study region (Table 4). These areas are concentrated in the eastern parts of the Tibet Autonomous Region and are distributed along the Lancang River and the Jinsha River. They are found near geological faults, where the geological structure is usually highly unstable and prone to geological hazards. Meanwhile, the areas across the Three Parallel Rivers (Nujiang River, Lancang River, and Jinsha River) region are dominated by distinct topographic relief, steep slopes, and low vegetation coverage, all of which are factors conducive to the occurrence of large-scale landslides (Huang 2009). The areas with high and medium susceptibility to landslides account for 3.27% and 14.28%, respectively, of the study region as a whole (Table 4). These areas are identified mainly in the central parts of the study region.

Table 4 Proportion of susceptibility area

With regard to the susceptibility to rockfall, areas with very high susceptibility account for 0.02% of the study region and show a dispersed spatial distribution. Areas with high susceptibility to rockfall account for 3.73% of the study region; these areas are identified mainly in the northeastern and central parts of the study region. Regions with medium susceptibility to rockfall account for 14.29% of the study region, and these areas are found mainly in the central parts of the HMR. To further verify the reliability of the results, we compared the locations of the hazard points with the susceptibility maps (Figs. 4c, 5c). We found a high degree of consistency between the locations of hazard points and the areas of high susceptibility, indicating the reliability of the susceptibility evaluation results.

4.3 Evaluation of Susceptibility to Geological Hazards as a Whole

By taking the average value of the susceptibility to landslides and rockfalls and grading it with the natural breaks method, we obtained the map of susceptibility to geological hazards as a whole across the study region (Fig. 6a). Compared with the separate maps of susceptibility to rockfall and landslide, this map shows the characteristics of susceptibility to multiple hazards more comprehensively and displays a distinctly different spatial pattern. In addition, nine types of geological hazard susceptibility areas are delineated according to the susceptibility to landslides and rockfalls (Fig. 6b). Comparing these two maps supports in-depth analysis of the characteristics of susceptibility to geological hazards. Areas with very high susceptibility to geological hazards account for 4.34% of the study region (Table 4) and are mainly distributed in the Three Parallel Rivers region and the central and northeastern parts of the study area. Areas with high geological hazard susceptibility account for 8.67% of the study region (Table 4) and are distributed in the central parts of the study region and the river valleys in its northeastern parts. Areas with moderate susceptibility to geological hazards account for 11.96% of the study region (Table 4) and are found mainly in the central parts of the study region and the Three Parallel Rivers region. The map also shows that the spatial patterns of very high and high susceptibility to geological hazards are similar, indicating that large-scale landslides are more likely to occur in these areas. In addition, areas with high susceptibility to geological hazards coincide with those with moderate susceptibility to landslides and rockfalls, indicating a higher frequency of moderate-scale landslides and rockfalls in these areas.

On the whole, the central part of the HMR and the Three Parallel Rivers region contain massive areas with high landslide susceptibility and large areas with moderate susceptibility to rockfall, resulting in extremely high susceptibility and a high degree of exposure to geological hazards. These areas are also home to large, dense populations and a relatively high level of socioeconomic development. Considerable attention should therefore be paid to the mitigation of geological disasters and to efforts to achieve geological disaster risk reduction.
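
The averaging and cross-classification steps can be sketched as follows. Several assumptions are made loudly here: susceptibility is held as small grids of scores between 0 and 1, the class breaks are fixed placeholder values standing in for the natural breaks method (which is not implemented), and the nine multi-hazard types are taken to be the 3 × 3 pairings of three landslide classes with three rockfall classes. All names and numbers are hypothetical.

```python
def combine_susceptibility(landslide, rockfall):
    """Cell-by-cell mean of two equally shaped susceptibility grids."""
    return [[(a + b) / 2.0 for a, b in zip(row_l, row_r)]
            for row_l, row_r in zip(landslide, rockfall)]


def classify(value, breaks, names):
    """Return the name of the first class whose upper break is >= value."""
    for upper, name in zip(breaks, names):
        if value <= upper:
            return name
    return names[-1]


def multi_hazard_type(l_value, r_value, breaks, names):
    """Pair the landslide class with the rockfall class, e.g. 'high-medium'."""
    return f"{classify(l_value, breaks, names)}-{classify(r_value, breaks, names)}"


# Hypothetical 2 x 2 grids and placeholder class breaks.
landslide = [[0.2, 0.8], [0.5, 0.9]]
rockfall = [[0.4, 0.6], [0.1, 0.7]]
breaks = [0.33, 0.66]
names = ["low", "medium", "high"]

overall = combine_susceptibility(landslide, rockfall)
types = [[multi_hazard_type(a, b, breaks, names) for a, b in zip(row_l, row_r)]
         for row_l, row_r in zip(landslide, rockfall)]
```

The averaged grid answers "how susceptible overall", while the paired labels preserve which hazard drives that susceptibility, which is why the two maps are useful side by side.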

Fig. 6 Assessment of geological hazard susceptibility. a Susceptibility to geological hazards as a whole. b Susceptibility to multi-hazards

5 Discussions

To further investigate the characteristics of geological hazard susceptibility, we analyzed the driving factors behind it. Based on the mean decrease Gini and mean decrease accuracy indicators of the RF model, we calculated the contribution rate of each factor to geological hazard susceptibility (Fig. 7) in order to identify the major driving factors. The mean decrease Gini measures the influence of each variable on the heterogeneity of the observed susceptibility at each node of the classification tree. The mean decrease accuracy is obtained by replacing the variable with random values and measuring the resulting decline in model accuracy. In the landslide susceptibility assessment model, the results obtained by the two indicators are highly consistent. From the perspective of the mean decrease accuracy, the main driving factors behind landslide susceptibility are elevation, curvature, precipitation, and distance from a fault. The cumulative contribution rate (CCR) of these four leading indicators reaches 75.16%, which means that topography, geological structure, and climatic conditions have an important impact on the occurrence of landslide hazards. According to the mean decrease Gini, elevation has the highest contribution rate to landslide hazards, at 23.49%. The four indicators with the highest contribution rates are elevation, precipitation, curvature, and slope, with a CCR of 66.28%. Among these factors, precipitation provides sufficient hydrodynamic conditions to trigger the occurrence of landslides and rockfalls. In the rockfall susceptibility model, the four indicators with the highest contribution rates are precipitation, elevation, curvature, and slope. The CCR reaches 76.04% from the mean decrease accuracy and 71.2% from the mean decrease Gini.

To sum up, elevation, slope, and precipitation are crucial factors for the susceptibility assessment of geological hazards such as landslides and rockfalls, which can provide a reference for the selection of indicators in geological hazard susceptibility evaluation.
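
The mean decrease accuracy and the CCR can be illustrated with a short sketch. It uses column shuffling (the common permutation variant of "replacing the variable with random values") and works with any prediction function. The toy classifier, the toy data, and the importance scores fed to `ccr` are hypothetical and are not values from this study.

```python
import random


def accuracy(predict, X, y):
    return sum(predict(row) == lab for row, lab in zip(X, y)) / len(y)


def mean_decrease_accuracy(predict, X, y, n_features, seed=0):
    """Accuracy lost when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    base = accuracy(predict, X, y)
    drops = []
    for f in range(n_features):
        column = [row[f] for row in X]
        rng.shuffle(column)
        shuffled = [row[:f] + [v] + row[f + 1:] for row, v in zip(X, column)]
        drops.append(max(0.0, base - accuracy(predict, shuffled, y)))
    return drops


def contribution_rates(importances):
    """Scale importance scores so that they sum to 100%."""
    total = sum(importances)
    if total == 0:
        raise ValueError("all importance scores are zero")
    return [100.0 * v / total for v in importances]


def ccr(rates, top=4):
    """Cumulative contribution rate of the `top` leading indicators."""
    return sum(sorted(rates, reverse=True)[:top])


# Toy classifier that only looks at feature 0; feature 1 is irrelevant.
def predict(row):
    return int(row[0] > 0.5)


X = [[0.1, 0.9], [0.2, 0.1], [0.3, 0.7], [0.7, 0.2], [0.8, 0.8], [0.9, 0.4]]
y = [0, 0, 0, 1, 1, 1]
drops = mean_decrease_accuracy(predict, X, y, n_features=2)

# Hypothetical importance scores for eight factors.
rates = contribution_rates([30, 25, 15, 10, 8, 6, 4, 2])
top_four = ccr(rates)
```

Shuffling a column the model never uses leaves accuracy unchanged, so its importance is zero, whereas shuffling an informative column can only lower accuracy or leave it the same.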

Fig. 7 Contribution rate of impact factors on landslide and rockfall susceptibility. a Mean decrease accuracy and b mean decrease Gini of landslide susceptibility model. c Mean decrease accuracy and d mean decrease Gini of rockfall susceptibility model. CCR is cumulative contribution rate of the foremost four indicators.

In this geological hazard susceptibility assessment, collecting hazard point data is a basic and important task. The hazard points used in this study are sourced from statistical data provided by the local administrative units of each county and city, which record the occurrence of hazards within their jurisdictions. Some geological hazard events may nevertheless have been omitted. The China Institute for Geological Environmental Monitoring (CIGEM) has compiled a map of historical hazard points of landslide and rockfall, which shows a hazard distribution that is similar to our hazard susceptibility map (CIGEM 2011). Since geological hazard susceptibility is closely related to the historical record of hazard points, the results of the machine learning methods for hazard prediction are reasonably accurate.

Beyond providing post-disaster relief, how to reduce exposure to landslides and rockfalls, especially in mountainous areas, has become a growing concern in recent years. With climate change, the problem is likely to become even more challenging. Implementing hazard susceptibility assessment and risk assessment before planning, so as to avoid ill-considered construction and development, is the key to tackling this challenge (Qiu 2014). For example, Sidle and Ziegler (2012) claimed that land management may pose potential problems without adequate road network planning and landslide hazard assessment. In the developing regions of Southeast Asia, there is an urgent demand to make full preparations to avoid potential disasters (Sidle and Ziegler 2012).

Chen et al. (2019) mapped the landslide-prone areas according to hazard frequency, which shows a distribution similar to that of our landslide susceptibility map in the HMR, but we further mapped the propensity for hazards of different scales. The multi-hazard map shows that the areas with high susceptibility to both landslides and rockfalls are distributed in the northwest of Yunnan Province. This distribution can be partly explained by the huge elevation difference and intensive road construction in the area. Landslide rates in Southeast Asia sharply increased when development that used improper road design took place in steep-slope mountainous areas, especially in Yunnan, during the decade from 1970 to 1980. The landslides associated with roads were hundreds of times more frequent than those in the mountainous Pacific Northwest of the United States (Sidle and Ochiai 2006; Sidle et al. 2011). This also provides indirect evidence in support of our results.

Susceptibility assessment is also an essential part of risk assessment. Lin et al. (2017) assessed the empirical dependencies between the volume of landslides and their resulting casualties, and found that when the landslide volume is 0.1–50 × 10⁴ m³, the average number of casualties rises to approximately 10–20. Casualties increase further as larger-volume landslides occur, indicating that the hazard scale is closely associated with loss of lives and economic losses. Susceptibility assessment is therefore of great importance, and further risk assessment studies need to be carried out in the future.

6 Conclusion and Findings

In this study, we adopted 10 factors in the assessment of susceptibility to the geological hazards of the Hengduan Mountains Region. Areas with different multi-hazard susceptibilities are identified, which can help to provide reference information for the risk assessment of geological hazards in the study region. Improved understanding of susceptibility to geological hazards can be conducive to the mitigation of geological disasters and sustainable development. We obtained the following findings:

  (1) Based on the RF model, the SVM model, and the BP neural network model, we developed and compared six machine learning models to select the optimal model for the evaluation of susceptibility to geological hazards. By comparing the prediction results of these models, we found that the RF model has the highest modeling and prediction performance among the models considered in this study. The RF model also has the advantage of strong resistance to over-fitting (with an AUC value of 0.83) compared to the other models considered in this study. This finding can help to provide reference information for the selection of models in the evaluation of susceptibility to geological hazards in other regions of the globe.

  (2) We identified that areas with medium and high susceptibility to landslides and rockfalls are distributed in the central and southern parts of the Hengduan Mountains Region, and along the mountains in the northern parts of the HMR. The areas with very high susceptibility to landslides are found mainly in the eastern parts of the Tibet Autonomous Region, and along the Lancang River and the Jinsha River. The areas with very high susceptibility to rockfall have a sporadic spatial distribution. The predicted susceptibility to geological hazards is similar to the actual spatial distribution of the geological hazard points. This demonstrates the reliable prediction performance of the machine learning model developed in this study.

  (3) The Three Parallel Rivers region in the eastern part of Tibet is highly susceptible to geological hazards. Meanwhile, the areas with medium and high susceptibility to geological hazards are widespread in the middle part of the Hengduan Mountains Region. These areas should be given considerable attention since the HMR is dominated by a relatively high population density and a developed economy. Mitigation of geological hazard risks and scientific planning of human activities should be based on the evaluation results of the susceptibility to geological hazards in the HMR. Moreover, the susceptibility assessment shows that topographic and climatic factors play a critical role in the occurrence of geological hazards.