Machine Learning-Based Evaluation of Susceptibility to Geological Hazards in the Hengduan Mountains Region, China

The Hengduan Mountains Region (HMR) is one of the areas in China that experience the most frequent geological hazards. However, few reports address the geological hazard susceptibility of the region. This study developed six machine learning models to assess geological hazard susceptibility. The results show that areas with medium and high susceptibility to geological hazards as a whole account for almost 21% of the total area, while areas with medium and high susceptibility to landslides alone and to rockfalls alone each account for about 18%. Medium and high geological hazard susceptibility is found in three parts of the HMR with different characteristics: (1) the central and southern parts, where the population of the region is concentrated; (2) the northern part, where higher geological hazard susceptibility is found along the mountain ranges; and (3) the junction of Tibet, Yunnan, and Sichuan in the eastern part, which is prone to larger-scale geological hazards. Of all the potential influencing factors, topographic features and climatic variables are the major driving factors behind geological hazards, and elevation, slope, and precipitation are crucial indicators for geological hazard susceptibility assessment. This study produced geological hazard susceptibility maps of the HMR and provides information for multi-hazard risk assessment and management in the region.


Introduction
Geological hazards refer to geodynamic activities or phenomena, such as landslides and rockfalls, that are formed under the interaction of natural or human factors and cause losses of human life and property and damage to the environment (Huang et al. 2020; Lin et al. 2021). More than 90,000 landslides occurred in southern and northwestern China in 1949-2011, posing risks to tens of millions of people and causing billions in economic losses (Liu et al. 2013). Besides, warming temperatures, and particularly rapid winter warming episodes, accelerate the melting of the permafrost (Zhang et al. 2019), which may also intensify geological hazards such as landslides. Therefore, geological hazard susceptibility assessment has become both a challenging task and an urgent demand.
Geological hazard susceptibility is recognized as the likelihood of hazard occurrence in a given location (Fell et al. 2008; Saha et al. 2020; Samia et al. 2020) and is closely related to the characteristics of the geographical elements (Dahal et al. 2008a, 2008b; Pourghasemi et al. 2012). Because of the complicated interactions between the various driving factors of susceptibility, quantifying the susceptibility to geological hazards is also a challenging task (Fell et al. 2008). Although building an indicator system and assigning weights to calculate an evaluation index has become the most widely used method for risk and susceptibility assessment, it is often criticized for its subjectivity (Hadmoko et al. 2010; Hadji et al. 2013; Niu et al. 2015; Huang et al. 2020). The development of machine learning technologies provides alternative solutions to this issue. By analyzing the inherent geographic characteristics of potential hazard-affected sites (for example, topography, elevation, lithology, and precipitation), machine learning models can detect hazards within a certain range and predict their distribution. Among the machine learning algorithms, the random forest (RF), support vector machine (SVM), and backpropagation (BP) neural network models have been widely used in the field of disaster modeling (Lee et al. 2007; Chen et al. 2009; Yilmaz 2009; Bednarik et al. 2012; Catani et al. 2013; Huang et al. 2020) because of their high prediction performance (Cutler et al. 2007). Thus, by comparing the results of the three algorithms using the receiver operating characteristic (ROC) curve, we can select the best regional model for a susceptibility assessment.
The Hengduan Mountains Region (HMR) covers an area of more than 600,000 km² and is characterized overall by a relatively dry climate, which varies spatially and seasonally across zones of temperate monsoon, subtropical monsoon, and alpine plateau climates. The complex topographic features and climate conditions in the HMR render this region extremely susceptible to large-scale geological hazards (Huang 2009; Li et al. 2011; Liu et al. 2013; Huang et al. 2014; Xu et al. 2019). Although analyzing single hazards in isolation increasingly fails to meet the demands of comprehensive risk evaluation and natural hazard risk reduction (Peduzzi et al. 2009; Koks et al. 2019; Pourghasemi et al. 2020; Pouyan et al. 2021), few studies are available that address this issue. This is especially true in the HMR, and the gap hinders hazard-by-hazard land use planning and disaster risk management in this complex, multi-hazard region. Therefore, evaluation of the susceptibility of the HMR to geological hazards is challenging but also constitutes the most significant potential contribution of this study.
The main objective of this study is to evaluate the susceptibility of the HMR in China (Fig. 1) to geological hazards, which in this study include only two types of hazards: landslides and rockfalls. They have been selected as representative hazards because they have high occurrence frequencies, large disaster-affected areas, complex driving factors, and cause massive damage (Huang 2009). Six models were developed from three machine learning algorithms. We then compared the modeling performance of these models and selected the most effective one to identify the regions with high susceptibility to geological hazards and to map the susceptibility of the HMR to geological hazards. This study provides a theoretical framework for the evaluation of the susceptibility of mountainous regions to multiple geological hazards.

Data
The data we analyzed in this study include geological hazard data with the longitude and latitude of the hazard sites ("hazard points" in the subsequent sections) and the driving factors of geological hazards. The hazard points were used for training the machine learning models, while the driving factors were used to implement the susceptibility assessment.

Geological Hazard Inventory
Through on-the-spot visits, we obtained the hazard inventory from the Natural Resources Bureaus of 11 cities in Sichuan Province, Tibet Autonomous Region, and Yunnan Province. This inventory records the location and scale of geological hazards, covering 9980 hazard points, of which 7159 are landslides and 2821 are rockfalls. Not all of these hazard points are located within the HMR; some lie in its immediately surrounding areas, but all are related to similar driving factors. Therefore, these records were incorporated to form our hazard inventory (Fig. 2).
To improve the training of the machine learning models, 2080 non-rockfall points and 5500 non-landslide points within the HMR were randomly selected while avoiding a 500-meter buffer zone along the rivers and roads and around existing hazard points in the inventory. After adding them to the hazard inventory, the numbers of non-hazard points and hazard points are of the same order.
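The buffer-constrained random selection described above can be sketched as simple rejection sampling. This is a minimal illustration, not the procedure used in the study: it assumes projected coordinates in metres, approximates rivers and roads by their vertices rather than by true line buffers, and all names (`sample_non_hazard_points`, `hazards`, `river_vertices`) and coordinates are hypothetical.

```python
import math
import random

def sample_non_hazard_points(n, bounds, excluded_points, buffer_m=500.0,
                             seed=0, max_tries=100000):
    """Draw up to n random points inside bounds that lie farther than
    buffer_m from every excluded point (hazard sites, river/road vertices)."""
    rng = random.Random(seed)
    xmin, ymin, xmax, ymax = bounds
    accepted = []
    tries = 0
    while len(accepted) < n and tries < max_tries:
        tries += 1
        x = rng.uniform(xmin, xmax)
        y = rng.uniform(ymin, ymax)
        # Reject the candidate if it falls inside any buffer circle.
        if all(math.hypot(x - ex, y - ey) > buffer_m
               for ex, ey in excluded_points):
            accepted.append((x, y))
    return accepted

# Hypothetical projected coordinates in metres (not from the study's data).
hazards = [(1000.0, 1000.0), (3000.0, 2500.0)]
river_vertices = [(0.0, 2000.0), (2000.0, 2000.0), (4000.0, 2000.0)]
excluded = hazards + river_vertices
non_hazard = sample_non_hazard_points(
    n=20, bounds=(0.0, 0.0, 5000.0, 5000.0), excluded_points=excluded)
```

In a GIS workflow the same idea would normally be applied with true polyline buffers; the vertex approximation here only keeps the sketch self-contained.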

Geological Hazard Driving Factors
Examples of existing research on the driving factors used in geological hazard assessment, which we consulted, include Lee and Min (2001), Süzen and Doyuran (2004), Floris et al. (2011), Erener and Düzgün (2013), Mandal and Maiti (2015), Shahabi and Hashim (2015), Strehmel et al. (2015), and Hou et al. (2016). In this study, we adopted 10 factors for geological hazard susceptibility evaluation: elevation, slope, aspect, curvature, lithology, distance from geological fault, precipitation, Normalized Difference Vegetation Index (NDVI), distance from river, and distance from road (Table 1). The first eight were used for model training and assessment of susceptibility to geological hazards, while the latter two were mainly used to identify non-hazard points. Our digital elevation model (DEM) is sourced from the ASTER GDEM data set jointly developed by the National Aeronautics and Space Administration (NASA) of the United States, the Japanese Ministry of Economy, Trade and Industry (METI), and the Japan Space Agency, with a spatial resolution of 30 m. The slope, aspect, and curvature data were all derived from the DEM using the spatial analysis toolbox of ArcGIS. Lithology and the distance from geological faults represent the geological structure. Hydrometeorological indicators include precipitation, NDVI, and distance from rivers, and the human activity indicator refers to the distance from roads. The precipitation data are based on a reanalysis data set released by the National Qinghai-Tibet Plateau Science Data Center; the data set covers the period 1979-2015 with a spatial resolution of 0.1°. We calculated the average precipitation using the R software package and interpolated the data to 500 m. The distribution of rivers and roads is sourced from OpenStreetMap. We removed artificial irrigation canals and country roads but kept the main roads for the analysis.

Methods
The scale of the geological hazards, such as the size and/or volume of the rockfalls and landslides, was taken as the basis for the susceptibility assessment, and the susceptibility to the geological hazards was subdivided into 4 levels (Table 2) as suggested by the Specification of Comprehensive Survey for Landslide, Rockfall and Debris Flow (1:50,000) issued by China's Ministry of Land and Resources (China 2014). About 70% of the hazard points were randomly selected from each level to form the training data set, while the rest form the validation data set. Six models were developed from three types of machine learning methods, that is, the random forest (RF) model, support vector machine (SVM) models with four different kernel functions, and the backpropagation (BP) neural network model.
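The per-level 70/30 split described above amounts to a stratified random split. The sketch below shows one way to do it; the record layout, the function name `stratified_split`, and the synthetic points are assumptions made for illustration only.

```python
import random
from collections import defaultdict

def stratified_split(records, level_of, train_fraction=0.7, seed=0):
    """Randomly assign about train_fraction of the records of each
    susceptibility level to training and the rest to validation."""
    rng = random.Random(seed)
    by_level = defaultdict(list)
    for rec in records:
        by_level[level_of(rec)].append(rec)
    train, validation = [], []
    for level in sorted(by_level):
        group = by_level[level][:]
        rng.shuffle(group)                      # random selection within the level
        cut = round(train_fraction * len(group))
        train.extend(group[:cut])
        validation.extend(group[cut:])
    return train, validation

# Hypothetical records: (point_id, susceptibility_level in 1..4).
points = [(i, 1 + i % 4) for i in range(200)]
train, validation = stratified_split(points, level_of=lambda rec: rec[1])
```

Splitting within each level, rather than over the pooled inventory, keeps rare levels (for example, very large events) represented in both data sets.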

Random Forest (RF) Model
The random forest model is an ensemble machine learning model developed from decision trees. Modeling based on the RF model involves four steps (Huang et al. 2020):
(1) Take k samples from the training sample D to build a random tree.
(2) Randomly extract m features from each sample, and select the best feature as the growth node.
(3) Let each tree grow as much as possible without pruning.
(4) Repeat the above steps continuously to form a random forest.
In addition, in the modeling by the RF model, the mean decrease Gini and the mean decrease accuracy index can be used to quantify the importance of each influencing factor.
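The four steps can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the implementation used in the study: trees split on the weighted Gini index, a node becomes a leaf when none of the m sampled features improves the split, and the function names and the synthetic (slope, elevation, noise) data are hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, feature_ids):
    """Return the (feature, threshold) with the lowest weighted Gini, or None."""
    best, best_score = None, gini(labels)
    for f in feature_ids:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def grow_tree(rows, labels, m, rng):
    # Step 2: consider only m randomly chosen features at this node.
    split = best_split(rows, labels, rng.sample(range(len(rows[0])), m))
    if split is None:  # pure node or no improving split: make a leaf
        return Counter(labels).most_common(1)[0][0]
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    # Step 3: keep splitting without pruning.
    return (f, t,
            grow_tree([r for r, _ in left], [y for _, y in left], m, rng),
            grow_tree([r for r, _ in right], [y for _, y in right], m, rng))

def predict_tree(tree, row):
    while isinstance(tree, tuple):  # internal nodes are tuples, leaves are labels
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

def random_forest(rows, labels, n_trees=25, m=2, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):  # Step 4: repeat to form the forest
        # Step 1: bootstrap sample drawn with replacement from the training set.
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(grow_tree([rows[i] for i in idx],
                                [labels[i] for i in idx], m, rng))
    return forest

def predict_forest(forest, row):
    """Majority vote over all trees."""
    return Counter(predict_tree(t, row) for t in forest).most_common(1)[0][0]

# Hypothetical data: (slope, elevation, noise); hazard if slope > 25 degrees.
data_rng = random.Random(1)
rows = [(data_rng.uniform(0, 50), data_rng.uniform(500, 5000), data_rng.random())
        for _ in range(120)]
labels = [1 if slope > 25 else 0 for slope, _, _ in rows]
forest = random_forest(rows, labels)
```

Calling `predict_forest(forest, (45.0, 2000.0, 0.5))` then returns the majority vote of the trees for a steep hypothetical site.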

Support Vector Machine (SVM) Model
The SVM is a linear classifier that maximizes the margin between categories by projecting the feature points into a multi-dimensional feature space and finding the optimal hyperplane that separates the data there, thereby transforming the classification problem into a convex quadratic programming problem (Eqs. 1-2) (Yao et al. 2008; Xu et al. 2012; Pourghasemi et al. 2013). In essence, the SVM model is a binary classifier, but it can also be extended to the multiclass classification of data.
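For orientation, the convex quadratic programming problem is commonly written in the soft-margin form below. This is the standard textbook formulation and may differ in notation from the equations of the original article; the symbols $w$, $b$, $\xi_i$, $C$, $\phi$, and $y_i \in \{-1, +1\}$ are assumptions introduced here.

```latex
\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \xi_i
\qquad \text{subject to} \qquad
y_i \left( w^{\top} \phi(X_i) + b \right) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, n
```

Here $\phi$ maps a sample $X_i$ into the feature space, $\xi_i$ are slack variables that tolerate misclassified points, and $C$ controls the trade-off between a wide margin and training errors.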
One of the important parameters in the SVM model is the kernel function, which mainly includes: linear, polynomial, radial (RBF), and sigmoid (Eqs. 3-6). We developed four kinds of SVM models according to different kernel functions for the evaluation of the susceptibility to geological hazards.
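Linear kernel: $K(X_i, X_j) = X_i^{\top} X_j$ (3)
Polynomial kernel: $K(X_i, X_j) = (\gamma X_i^{\top} X_j + r)^{d}$ (4)
Radial basis function (RBF) kernel: $K(X_i, X_j) = \exp(-\gamma \lVert X_i - X_j \rVert^{2})$ (5)
Sigmoid kernel: $K(X_i, X_j) = \tanh(\gamma X_i^{\top} X_j + r)$ (6)
where $\gamma$, $d$, and $r$ are the relevant parameters in each kernel function.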
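The four kernel types can be sketched as plain functions, assuming the commonly used forms in which the kernels are parameterized by a scale gamma, an offset r, and a polynomial degree d (the same parameterization as in LIBSVM). The function names, default values, and sample vectors are illustrative assumptions.

```python
import math

def dot(x, z):
    return sum(a * b for a, b in zip(x, z))

def linear_kernel(x, z):
    return dot(x, z)

def polynomial_kernel(x, z, gamma=1.0, r=0.0, d=3):
    return (gamma * dot(x, z) + r) ** d

def rbf_kernel(x, z, gamma=1.0):
    # Depends only on the squared Euclidean distance between the samples.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def sigmoid_kernel(x, z, gamma=1.0, r=0.0):
    return math.tanh(gamma * dot(x, z) + r)

# Two hypothetical feature vectors.
x_i, x_j = (1.0, 2.0), (3.0, 0.5)
k_linear = linear_kernel(x_i, x_j)
k_poly = polynomial_kernel(x_i, x_j, gamma=0.5, r=1.0, d=2)
k_rbf = rbf_kernel(x_i, x_j, gamma=0.1)
k_sigmoid = sigmoid_kernel(x_i, x_j, gamma=0.5)
```

Unlike the other three, the sigmoid kernel is not positive semi-definite for all parameter choices, which is one commonly cited reason it can behave less reliably in practice.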

Back Propagation (BP) Neural Network Model
The BP neural network model is a multilayer feedforward network trained with an error backpropagation algorithm. The model has a three-layer structure: an input layer, a hidden layer, and an output layer. Each layer contains multiple nodes, and the hidden layer can itself consist of multiple layers of nodes. Nodes in adjacent layers are connected, and each connection is assigned a weight. Compared with ordinary neural network algorithms, the BP neural network model is characterized by the backpropagation of errors. The calculation process mainly includes forward calculation and error adjustment. The main process (Wang et al. 2005) computes, in turn, the output of the hidden layer nodes, the output of the output layer nodes, the error function, and the adjustment of the weights, where $\theta_h$ and $\gamma_j$ are the thresholds of the hidden layer node and the output layer node, respectively; $f$ is the transfer function; $\eta$ is the learning step; and $P$ is the number of samples, $k = 1, …, P$.
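The forward calculation and error adjustment can be sketched for a network with a single hidden layer. This is a minimal illustration assuming a sigmoid transfer function, a squared-error function, and per-sample (online) updates; the weight names `w` and `v`, the function `train_bp`, and the toy OR data are assumptions and are not taken from the study.

```python
import math
import random

def f(x):
    """Sigmoid transfer function."""
    return 1.0 / (1.0 + math.exp(-x))

def train_bp(samples, n_hidden=3, eta=0.5, epochs=2000, seed=0):
    rng = random.Random(seed)
    n_in, n_out = len(samples[0][0]), len(samples[0][1])
    w = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_in)]   # input -> hidden
    v = [[rng.uniform(-1, 1) for _ in range(n_out)] for _ in range(n_hidden)]  # hidden -> output
    theta = [rng.uniform(-1, 1) for _ in range(n_hidden)]  # hidden thresholds
    gamma = [rng.uniform(-1, 1) for _ in range(n_out)]     # output thresholds

    def forward(x):
        b = [f(sum(x[i] * w[i][h] for i in range(n_in)) - theta[h])
             for h in range(n_hidden)]
        y = [f(sum(b[h] * v[h][j] for h in range(n_hidden)) - gamma[j])
             for j in range(n_out)]
        return b, y

    errors = []
    for _ in range(epochs):
        total = 0.0
        for x, target in samples:
            b, y = forward(x)                       # forward calculation
            total += 0.5 * sum((target[j] - y[j]) ** 2 for j in range(n_out))
            # Error terms propagated backwards (sigmoid derivative is y * (1 - y)).
            d_out = [(target[j] - y[j]) * y[j] * (1 - y[j]) for j in range(n_out)]
            d_hid = [b[h] * (1 - b[h]) * sum(v[h][j] * d_out[j] for j in range(n_out))
                     for h in range(n_hidden)]
            # Gradient-descent adjustment of weights and thresholds with step eta.
            for h in range(n_hidden):
                for j in range(n_out):
                    v[h][j] += eta * d_out[j] * b[h]
            for j in range(n_out):
                gamma[j] -= eta * d_out[j]
            for i in range(n_in):
                for h in range(n_hidden):
                    w[i][h] += eta * d_hid[h] * x[i]
            for h in range(n_hidden):
                theta[h] -= eta * d_hid[h]
        errors.append(total)
    return (lambda x: forward(x)[1]), errors

# Toy data: logical OR of two binary inputs.
samples = [((0, 0), (0,)), ((0, 1), (1,)), ((1, 0), (1,)), ((1, 1), (1,))]
predict, errors = train_bp(samples)
```

The list `errors` holds the summed squared error per epoch, so plotting it gives a quick check that the error adjustment is converging.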

Model Validation
The predictive accuracy of the machine learning models is evaluated by the receiver operating characteristic (ROC) curves and area under the curve (AUC) values (Gorsevski et al. 2006; Trigila et al. 2015). The outcome of model classification falls into four types: true positive (TP), false positive (FP), false negative (FN), and true negative (TN) (Table 3). The true positive class and the true negative class represent correct classifications.
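How an ROC curve and its AUC follow from the TP, FP, FN, and TN counts can be sketched as below. The scores and labels are made-up illustrative values, and the function names are assumptions; the sketch sweeps the decision threshold and integrates with the trapezoidal rule.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs obtained by sweeping the decision threshold."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        # TPR = TP / (TP + FN), FPR = FP / (FP + TN)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical predicted susceptibility scores and observed labels (1 = hazard).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]
curve = roc_points(scores, labels)
area = auc(curve)
```

For this toy data, eight of the nine hazard/non-hazard pairs are ranked correctly, so the AUC is 8/9, about 0.89; an AUC of 0.5 would correspond to random ranking.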

Results
In this section, we first present the ROC curves of the six models to show their accuracy and choose the most suitable model for further application; then the susceptibility to landslides, rockfalls, and geological hazards as a whole is assessed and mapped. Figure 3 illustrates the ROC curves and AUC values for the rockfall and landslide hazards respectively. For the susceptibility to the rockfall hazard, the SVM model with the sigmoid kernel function performs poorly, while the training results of the other models are satisfactory, with AUC values above 0.8. Among them, the RF model obtains the highest AUC of 0.88. As for the susceptibility to landslides, most models have AUC values between 0.7 and 0.8; the SVM model with the sigmoid kernel function performs the worst, with an AUC value of 0.59, while the RF model performs the best, with an AUC value of 0.83. In general, the RF model is the best choice for assessing susceptibility to the geological hazards.

Evaluation of Susceptibility to Rockfall and Landslide
Figures 4 and 5 demonstrate the spatial pattern of the susceptibility to rockfall and landslide respectively. To show the connections between elevation, slope, and geological hazards, we overlaid the geological hazard points on our DEM. It is evident that the hazard points are mainly distributed along the valleys, especially the Jinsha River valley, where we observe a concentration of multiple large-scale landslide points. River valleys are usually characterized by steep slopes vulnerable to erosion, and erosion by river flow also causes instability of riverbeds; all of these factors can trigger higher occurrences of geological hazards such as rockfalls and landslides. We can observe a more dispersed distribution of large-scale landslides in the southern and central areas of Sichuan Province. We also mapped the susceptibility to rockfall and landslide respectively (Figs. 4a, 5a). In general, medium and high susceptibility to landslide and rockfall is identified mainly across the central and southern HMR and is specifically distributed along the mountains.
With regard to susceptibility to landslides, areas with a very high susceptibility account for 0.18% of the study region (Huang 2009). The areas with high and medium susceptibility to landslides account for 3.27% and 14.28% respectively of the study region as a whole (Table 4). These areas are identified mainly in the central parts of the study region.
With regard to the susceptibility to rockfall, areas with very high susceptibility account for 0.02% of the study region and show dispersed spatial distribution. Areas with high susceptibility to rockfall account for 3.73% of the study region; these areas are identified mainly in the northeastern and central parts of the study region. Regions with medium susceptibility to rockfall account for 14.29% of the study region and these areas are found mainly in the central parts of the HMR. In order to further verify the reliability of the results, we compared the location of hazard points with that of the susceptibility to rockfall (Figs. 4c, 5c). We found a high degree of consistency between the location of hazard points and that of high susceptibility to rockfall, indicating the reliability of the susceptibility evaluation results.

Evaluation of Susceptibility to Geological Hazards as a Whole
By taking the average value of the susceptibility to landslides and rockfalls and grading it with the natural breaks method, we obtained the map of susceptibility to geological hazards as a whole across the study region (Fig. 6a). Compared with the maps of susceptibility to rockfall or landslide alone, the map of susceptibility to geological hazards as a whole shows the characteristics of susceptibility to multiple hazards more comprehensively and displays a distinctly different spatial pattern. In addition, nine types of geological hazard susceptibility areas are delineated according to the susceptibility to landslides and rockfalls (Fig. 6b). Comparing these two maps supports in-depth analysis of the characteristics of susceptibility to geological hazards. Areas with very high susceptibility to geological hazards account for 4.34% of the study region (Table 4) and are mainly distributed in the Three Parallel Rivers region and in the central and northeastern parts of the study area. Areas with high geological hazard susceptibility account for 8.67% of the study region (Table 4) and are distributed in the central parts of the study region and the river valleys in its northeastern parts. Areas with moderate susceptibility to the geological hazards account for 11.96% of the study region (Table 4) and are found mainly in the central parts of the study region and the Three Parallel Rivers region. The map also demonstrates that the spatial patterns of very high and high susceptibility to geological hazards are similar, indicating that large-scale landslides are more likely to occur in these areas. In addition, areas with high susceptibility to geological hazards coincide with those with moderate susceptibility to landslides and rockfalls, indicating the higher frequency of moderate-scale landslides and rockfalls in these areas.
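The averaging and grading step can be sketched as follows, assuming that the grading refers to Jenks natural breaks, implemented here as an exact dynamic program that minimizes the within-class sum of squared deviations. The cell values, the number of classes, and the function name `natural_breaks` are illustrative assumptions, not values from the study.

```python
from bisect import bisect_left

def natural_breaks(values, k):
    """Return the upper bounds of k classes that minimise the total
    within-class sum of squared deviations (Fisher-Jenks, exact DP)."""
    data = sorted(values)
    n = len(data)
    s1, s2 = [0.0], [0.0]           # prefix sums of values and squared values
    for val in data:
        s1.append(s1[-1] + val)
        s2.append(s2[-1] + val * val)

    def cost(i, j):                 # squared deviation of data[i:j], with j > i
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # best[c][j]: minimal cost of splitting data[:j] into c classes
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                cand = best[c - 1][i] + cost(i, j)
                if cand < best[c][j]:
                    best[c][j], cut[c][j] = cand, i
    bounds, j = [], n
    for c in range(k, 0, -1):       # walk back through the stored cut positions
        bounds.append(data[j - 1])
        j = cut[c][j]
    return bounds[::-1]

# Hypothetical per-cell susceptibility scores for the two single hazards.
landslide = [0.10, 0.12, 0.50, 0.55, 0.90, 0.95]
rockfall = [0.14, 0.12, 0.46, 0.55, 0.86, 0.91]
overall = [(a + b) / 2.0 for a, b in zip(landslide, rockfall)]
bounds = natural_breaks(overall, k=3)
classes = [bisect_left(bounds, value) for value in overall]
```

Each entry of `classes` is the index of the susceptibility class of the corresponding cell, with class 0 the lowest.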
On the whole, the central part of the HMR and the Three Parallel Rivers region contain extensive areas with high landslide susceptibility and large areas with moderate susceptibility to rockfall, resulting in extremely high susceptibility to geological hazards and a high degree of risk exposure. These areas are also home to a large, dense population and have a relatively high level of socioeconomic development. Considerable attention should therefore be paid to the mitigation of geological disasters and to efforts to reduce geological disaster risk in these areas.

Discussions
In order to further investigate the characteristics of geological hazard susceptibility, we calculated and analyzed the driving factors behind susceptibility. Based on the mean decrease Gini and mean decrease accuracy indicators of the RF model, we calculated the contribution rate of each factor to geological hazard susceptibility (Fig. 7) in order to differentiate the major driving factors. The mean decrease Gini is determined by calculating the influence of each variable on the heterogeneity of the observed susceptibility to geological hazards at each node of the classification tree; the mean decrease accuracy is determined by replacing the values of a variable with random values and measuring the resulting decline in the accuracy of the model, which indicates the importance of the factor to the susceptibility to geological hazards. In the landslide susceptibility assessment model, the results obtained by the two indicators are highly consistent. From the perspective of the mean decrease accuracy, the major driving factors behind landslide susceptibility are elevation, curvature, precipitation, and distance from a fault. The CCR (cumulative contribution rate of the foremost four indicators) reaches as high as 75.16%, which means that topography, geological aspects, and climatic conditions have an important impact on the occurrence of landslide hazards. According to the mean decrease Gini, elevation has the highest contribution rate to landslide hazards, reaching as high as 23.49%. The four indicators with the highest contribution rate are elevation, precipitation, curvature, and slope, with a CCR of 66.28%. Among these factors, precipitation provides sufficient hydrodynamic conditions to trigger the occurrence of landslides and rockfalls.
Fig. 6 Assessment of geological hazard susceptibility. a Susceptibility to geological hazards as a whole. b Susceptibility to multi-hazards
In the rockfall susceptibility model, the four indicators with the highest contribution rate are precipitation, elevation, curvature, and slope. From the mean decrease accuracy, the CCR reaches 76.04%, while from the mean decrease Gini it is 71.2%. To sum up, elevation, slope, and precipitation are crucial factors for the susceptibility assessment of geological hazards such as landslides and rockfalls, which can provide a reference for the selection of indicators in geological hazard susceptibility evaluation.
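The mean decrease accuracy and the cumulative contribution rate (CCR) can be illustrated with a small sketch. It assumes that the contribution rate of a factor is its share of the summed importance scores, uses a fixed rule-based stand-in instead of a trained RF model, and permutes values on the full data set rather than on out-of-bag samples; all names and data are hypothetical.

```python
import random

def accuracy(model, rows, labels):
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(labels)

def mean_decrease_accuracy(model, rows, labels, feature, n_repeats=20, seed=0):
    """Average drop in accuracy after randomly permuting one feature."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    drops = []
    for _ in range(n_repeats):
        column = [r[feature] for r in rows]
        rng.shuffle(column)
        permuted = [r[:feature] + (val,) + r[feature + 1:]
                    for r, val in zip(rows, column)]
        drops.append(baseline - accuracy(model, permuted, labels))
    return sum(drops) / n_repeats

def contribution_rates(importance):
    """Share of each factor in the summed importance, in percent."""
    total = sum(importance.values())
    return {name: 100.0 * value / total for name, value in importance.items()}

def cumulative_contribution_rate(rates, top_n=4):
    return sum(sorted(rates.values(), reverse=True)[:top_n])

# Hypothetical data: (slope, noise); the stand-in model flags slope > 25 degrees.
data_rng = random.Random(2)
rows = [(float(s), data_rng.random()) for s in range(50)]
labels = [1 if slope > 25 else 0 for slope, _ in rows]
model = lambda row: 1 if row[0] > 25 else 0

importance = {
    "slope": mean_decrease_accuracy(model, rows, labels, feature=0),
    "noise": mean_decrease_accuracy(model, rows, labels, feature=1),
}
rates = contribution_rates(importance)
ccr_top1 = cumulative_contribution_rate(rates, top_n=1)
```

Because the stand-in model ignores the noise column, permuting it leaves the accuracy unchanged, whereas permuting slope lowers it; the contribution rates reflect exactly this contrast.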
In this geological hazard susceptibility assessment, collecting the hazard point data is a basic and important task. The hazard points used in this study are sourced from statistical data provided by the local administrative units of each county and city, which record the occurrence of hazards within their jurisdiction. However, some geological hazard events may have been omitted. The China Institute for Geological Environmental Monitoring (CIGEM) has compiled a map of historical hazard points of landslide and rockfall, which shows a hazard distribution that is similar to our hazard susceptibility map (CIGEM 2011). Since geological hazard susceptibility is closely related to the historical record of hazard points, this agreement suggests that the results of the machine learning methods for hazard prediction are reasonably accurate.
Beyond providing post-disaster relief, how to reduce exposure to landslides and rockfalls, especially in mountainous areas, has become a growing concern in recent years. With climate change, the problem will become even more challenging. The key to tackling this challenge is to carry out hazard susceptibility assessment and risk assessment before planning, so as to effectively reduce irrational construction and development (Qiu 2014). For example, Sidle and Ziegler (2012) claimed that land management may pose potential problems without adequate road network planning and landslide hazard assessment. In the developing regions of Southeast Asia, there is an urgent need to make full preparations to avoid potential disasters (Sidle and Ziegler 2012). Chen et al. (2019) mapped the landslide-prone areas according to hazard frequency, and their map shows a distribution similar to that of our landslide susceptibility map of the HMR, but we further mapped the propensity for hazards of different scales. The multi-hazard map shows that the high landslide-high rockfall susceptibility areas are distributed in the northwest of Yunnan Province. This distribution can be partly explained by the huge elevation difference and intensive road construction in the area. Landslide rates in Southeast Asia sharply increased when development that used improper road design took place in steep-slope mountainous areas, especially in Yunnan, during the decade from 1970 to 1980. The landslides associated with roads were hundreds of times more frequent than those in the mountainous Pacific Northwest of the United States (Sidle and Ochiai 2006; Sidle et al. 2011). This also provides indirect evidence that supports our results.
Susceptibility assessment is also an essential part of risk assessment. Lin et al. (2017) assessed the empirical dependencies between the volume of landslides and their resulting casualties, and found that when the landslide volume is 0.1-50 × 10⁴ m³, the average number of casualties rises to approximately 10-20. Casualties increase further as larger-volume landslides occur, indicating that the hazard scale is closely associated with loss of lives and economic losses. Susceptibility assessment is therefore of great importance, and further risk assessment studies need to be carried out in the future.

Conclusion and Findings
In this study, we adopted 10 factors in the assessment of susceptibility to the geological hazards of the Hengduan Mountains Region. Areas with different multi-hazard susceptibilities are identified, which can provide reference information for the risk assessment of geological hazards in the study region. Improved understanding of susceptibility to geological hazards can be conducive to the mitigation of geological disasters and to sustainable development. We obtained the following findings:
(1) Based on the RF model, the SVM model, and the BP neural network model, we developed and compared six machine learning models to select the optimal model for the evaluation of susceptibility to geological hazards. By comparing the prediction performance of these models, we found that the RF model performs best among the models considered in this study (AUC values of 0.88 for rockfall and 0.83 for landslide). The RF model also has the advantage of strong resistance to over-fitting. This finding can provide reference information for the selection of models in the evaluation of susceptibility to geological hazards in other regions of the globe.
(2) We identified that areas with medium and high susceptibility to landslides and rockfalls are distributed in the central and southern parts of the Hengduan Mountains Region and along the mountains in the northern parts of the HMR. The areas with very high susceptibility to landslides are found mainly in the eastern parts of the Tibet Autonomous Region, and along the Lancang River and the Jinsha River. The areas with very high susceptibility to rockfall have a sporadic spatial distribution. The predicted susceptibility to geological hazards is similar to the actual spatial distribution of the geological hazard points. This demonstrates the reliable prediction performance of the machine learning model developed in this study.
(3) The Three Parallel Rivers region in the eastern part of Tibet is highly susceptible to geological hazards. Meanwhile, the areas with medium and high susceptibility to geological hazards are widespread in the middle part of the Hengduan Mountains Region. These areas should be given considerable attention since they have a relatively high population density and a developed economy. Mitigation of geological hazard risks and scientific planning of human activities should be based on the evaluation results of the susceptibility to geological hazards in the HMR. Moreover, the susceptibility assessment shows that topographic and climatic factors play a critical role in the occurrence of geological hazards.
Acknowledgements This research has been financially supported by the China National Key R&D Program, Grant No. 2019YFA0606900 and by the Scientific Investigation of the Qinghai-Tibet Plateau, Grant No. 2019QZKK0900. We would like to extend our cordial gratitude to the editors, Dr. Ying Li, Dr. Juan Du and the anonymous reviewers for their pertinent and professional comments and suggestions that were greatly helpful for further quality improvement of this manuscript.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.