Regional landslide hazard assessment through integrating susceptibility index and rainfall process

Because rainfall varies in space and time and the disaster-prone environment (topography, geology, faults, and lithology) is complex and diverse, it is difficult to assess landslide hazard quantitatively at the regional scale from rainfall conditions alone. Based on a detailed landslide inventory and rainfall data for the hilly area of Sichuan province, this study analyzed the effects of both the rainfall process and environmental factors on the occurrence of landslides. From the environmental factors, a landslide susceptibility index (LSI) was calculated using a multiple layer perceptron (MLP) model to reflect regional landslide susceptibility. The characteristics of the rainfall process and of the landslides were then examined quantitatively by statistical analysis. Finally, a probability model integrating the LSI and the rainfall process was constructed using logistic regression analysis to assess landslide hazard. Validation showed satisfactory results, and including the LSI improved the accuracy of the landslide hazard assessment: compared with a model that considers only the rainfall process factors, the model that considers both the rainfall process and landslide susceptibility raised the AUC from 0.83 to 0.86, an improvement of 3%. These results indicate that integrating the susceptibility index with the rainfall process is essential for improving the timeliness and accuracy of regional landslide early warning.


Introduction
Landslides, defined as the downslope movement of rock, debris, or earth under the influence of gravity, have caused significant property damage, injuries, and deaths worldwide (Abdallah and Faour 2017; Bui et al. 2017; Canli et al. 2018; Dowling and Santi 2014; Petley 2012; Segoni et al. 2018a, b). The annual economic loss caused by landslide disasters in China is estimated at about $50 million (Sang 2013). Given this impact on property and human lives, assessing the causal factors of landslide hazard is essential. Landslides can be triggered by various external events such as intense rainfall, earthquakes, water-level changes, storm waves, or human activities, of which rainfall is the most common and important triggering factor (Brunetti et al. 2018; Olyazadeh et al. 2017; Palenzuela et al. 2016; Rossi et al. 2017). Most rainfall-induced landslides occur in the middle and late stages of the rainfall process or with a lag of a few days (Hong et al. 2018; Lee et al. 2014; Mathew et al. 2014). Landslides are also related to antecedent precipitation conditions, whose influence on rainfall-induced landslides varies (Bogaard and Greco 2018; Cardinali et al. 2006; Gabet et al. 2004; Segoni et al. 2018a, b; Yin 2004). Two main perspectives have been adopted to understand the effects of rainfall on landslides. The first examines the rainfall thresholds (e.g., duration, intensity) that cause landslides. The second explores the relationship between landslides and the rainfall process by constructing statistical models such as multivariate regression, neural networks, and support vector machines (De Luca and Versace 2016; Guzzetti et al. 2007; Jiusheng et al. 2004; Marra 2019; Naidu et al. 2018). In the first perspective, two major types of rainfall threshold have been identified: (1) intensity-duration (ID) relationships and (2) antecedent precipitation (AP) schemes.
ID relationships provide critical values of rainfall intensity at or above which slope failure can occur (Brunetti et al. 2010; Caine 1980; Guzzetti et al. 2008; Rosi et al. 2016). AP schemes define the critical value of rainfall accumulated over a short period (the hours or days before the occurrence of a landslide), depending on the long-term accumulated antecedent precipitation (Aleotti 2004; Chleborad 2003; Crozier 1999; Glade et al. 2000; Kanungo and Sharma 2014; Lagomarsino et al. 2015). In the second perspective, probability models have been constructed to assess regional landslide hazards. The relationships between landslides and the rainfall process have been examined using logistic regression, artificial neural networks, multivariate linear regression, etc. (Gorsevski et al. 2001; Li et al. 2008; Yin 2004).
Until now, most studies have considered only the relationship between landslides and rainfall processes, such as daily rainfall and antecedent rainfall. Only a few studies have considered the coupling effect of rainfall processes and environmental factors (such as topography, landform, lithology, and faults) on the prediction of regional landslides (Hong et al. 2007; Segoni et al. 2018a, b). However, these environmental factors may be crucial to the occurrence of landslides. If they are not taken into account, the accuracy of the resulting model is questionable. Moreover, the structural relationship between rainfall processes and landslides is sensitive to the environment in which they occur. To address this problem, we analyze the coupled effect of landslide susceptibility and the rainfall process on landslide occurrence in order to improve the accuracy of regional landslide hazard assessment. In this paper, we first develop a landslide susceptibility index (LSI) using the multiple layer perceptron (MLP) model to comprehensively analyze environmental factors. Then, the impact of rainfall on landslides is analyzed by calculating the effective rainfall based on the detailed landslide inventory and rainfall data. Finally, the integrated impact of the LSI and rainfall processes on landslides is examined using a logistic regression model to evaluate landslide hazard in the hilly area of Sichuan province.
To examine the impact of environmental factors and rainfall processes on the occurrence of landslides, we have developed a procedure that includes (1) using a multiple layer perceptron (MLP) to assess landslide susceptibility, (2) using statistical analysis to examine the relationship between landslides and rainfall processes, and (3) using a logistic regression model to evaluate rainfall-induced landslide hazards. The workflow of this study is shown in Fig. 1.

Study area
In this study, the hilly area in the eastern part of Sichuan province, China, is selected (Fig. 2). The area lies between latitudes 27.9°N and 32.4°N and longitudes 102.9°E and 108.1°E, with a minimum elevation of 197 m and a maximum elevation of 2234 m. It covers 112,000 km², approximately 23% of Sichuan province. The study area belongs to the subtropical humid monsoon climate zone, with annual precipitation exceeding 1000 mm; rainfall (mainly rainstorms) occurs mostly in summer.
During the rainy season, landslides are very frequent. Meanwhile, this is also the area with high population density and economic development. The dense landslides lead to huge losses (Tang 1994). Affected by special topography, geology and meteorological

Landslides data
The rainfall-induced landslide data for the study area were obtained from a field investigation conducted by the China Geological Environmental Monitoring Institute in 2007. The investigation identified a total of 648 landslides and recorded the date of occurrence, inducing factors, and location of each. A portion of the landslide data is shown in Table 1.
Analysis of the 648 landslide records shows that the landslides in the study area are mainly shallow and medium-sized, and that the predominant triggering factor is continuous rainfall.

Rainfall data
The daily rainfall data used in this paper are provided by the Sichuan Meteorological Data Service Center (https://www.climate.sc.cn/gx/home.do). There are 62 meteorological stations in the study area, and their spatial distribution is shown in Fig. 3. Following the nearest distance principle, we associate each landslide with its nearest meteorological station and retrieve the precipitation data for the day on which the landslide occurred.
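The nearest distance principle can be illustrated with a minimal Python sketch. The station identifiers and coordinates below are hypothetical, and the use of great-circle (haversine) distance is an assumption, since the distance measure is not specified in the text.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def nearest_station(landslide, stations):
    """Return the id of the station closest to a (lat, lon) landslide location."""
    lat, lon = landslide
    return min(stations, key=lambda sid: haversine_km(lat, lon, *stations[sid]))

# Hypothetical station coordinates (lat, lon) inside the study-area bounding box.
stations = {"S01": (30.6, 104.1), "S02": (29.4, 106.5), "S03": (31.5, 105.0)}
station_id = nearest_station((30.0, 106.0), stations)
print(station_id)
```

Once each landslide carries a station identifier, the rainfall series for the day of occurrence and the preceding days can be looked up by that identifier and the landslide date.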

Methods
As shown in Fig. 1, two key methods are adopted in this study: the multiple layer perceptron (MLP) and the logistic regression model. Landslide formation is a complex physical process that can be regarded as a nonlinear system affected by both predisposing environmental conditions (landform, geologic structure, lithology, etc.) and triggering factors (rainfall, earthquake, etc.). The MLP model is based on the back-propagation learning algorithm and can model nonlinear systems, which matches the complicated characteristics of landslide disaster systems. It is therefore suitable for regional landslide susceptibility assessment (Cong 1998; Moayedi et al. 2019; ThaiPham et al. 2017; Zare et al. 2012). The logistic regression model is used to build the probability model of landslide occurrence in the study area.

• MLP
The multiple layer perceptron (MLP) model was proposed to handle multi-category classification of data that are not linearly separable. The MLP is a multilayer feedforward network trained by back-propagation of errors. As one of the most widely used and thoroughly studied network models, the MLP has a strong nonlinear mapping ability and has played an important role in image processing, pattern recognition, algorithm optimization, function approximation, and self-adaptation (Ling 2004).
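A multilayer feedforward network of this kind can be sketched in a few lines of Python. This is a forward pass only, under stated assumptions: the layer sizes, weights, biases, and the sigmoid activation are hypothetical and are not taken from the model trained in this study.

```python
import math

def sigmoid(z):
    """Logistic activation, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(inputs, weights, biases):
    """One fully connected layer: each node outputs sigmoid(w . x + b)."""
    return [sigmoid(sum(w * x for w, x in zip(node_w, inputs)) + b)
            for node_w, b in zip(weights, biases)]

def mlp_forward(x, layers):
    """Propagate x through a list of (weights, biases) layers, input to output."""
    for weights, biases in layers:
        x = layer_forward(x, weights, biases)
    return x

# Hypothetical network: 3 normalized input factors -> 2 hidden nodes -> 1 output.
layers = [
    ([[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]], [0.0, -0.1]),
    ([[1.2, -0.7]], [0.05]),
]
lsi = mlp_forward([0.6, 0.3, 0.9], layers)[0]
print(lsi)
```

In training, back-propagation would adjust the weights and biases so that the output approaches 1 for landslide samples and 0 for stable slopes; that step is omitted here.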
The MLP is composed of nodes arranged in layers. The nodes in each layer are connected to the nodes in the next layer and pass their outputs to it. Each output is amplified, attenuated, or suppressed by the connection weights. Except in the input layer, the activation output of each node is determined by the node's inputs, its activation function, and its bias value. The trained network forms a nonlinear mapping between pattern space and classification space.
• Logistic regression model
Logistic regression is a statistical method often used for regression analysis of binary categorical variables. Unlike linear ordinary least squares (OLS) regression, logistic regression is a nonlinear regression model whose parameters are estimated by the maximum likelihood method. Under random sampling, the maximum likelihood estimator of the logistic regression model is consistent, asymptotically efficient, and asymptotically normal. Among the statistical methods proposed for landslide hazard assessment, logistic regression analysis is a widely used and well-established technique (Ayalew and Yamagishi 2005; Chen and Wang 2007; Hemasinghe et al. 2018; Lee and Pradhan 2007).
When building a regional rainfall-induced landslide probability model, the dependent variable is usually binary, that is, the landslide "occurs" or "does not occur," which matches the requirements of the logistic regression model. In addition, the logistic regression model has a complete set of criteria for testing model structure and parameters, and it provides the probability of landslide occurrence as a regression value between 0 and 1.
Logistic regression analysis relates the probability of landslide occurrence (values from 0 to 1) to the logit $u$ (where $u < 0$ indicates a higher probability of non-occurrence and $u > 0$ indicates a higher probability of occurrence) (Hemasinghe et al. 2018). In logistic regression analysis, $u$ is assumed to be a linear combination of the independent variables, and formulas (1) and (2) are given as follows:

$$P = \frac{1}{1 + e^{-u}} \quad (1)$$

$$u = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n \quad (2)$$

where the output $P$ is the probability of a landslide occurrence and $u$ is a linear combination of the influencing factors $x_1, \ldots, x_n$ (slope, geology, rainfall, and land cover). In formula (2), $\beta_0$ is the intercept and $\beta_1, \ldots, \beta_n$ are the coefficients of the factors, indicating their contribution to the occurrence of landslides.
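As a numerical illustration of formulas (1) and (2), the following sketch assumes the standard logistic form $P = 1/(1 + e^{-u})$. The coefficients and factor values are hypothetical and are not the fitted values of this study.

```python
import math

def linear_predictor(x, beta0, betas):
    """Formula (2): u = beta0 + sum(beta_i * x_i)."""
    return beta0 + sum(b * xi for b, xi in zip(betas, x))

def probability(u):
    """Formula (1): map the linear predictor u to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-u))

def logit(p):
    """Inverse of formula (1): the log-odds of a probability p."""
    return math.log(p / (1.0 - p))

# Hypothetical intercept, coefficients, and factor values (illustration only).
beta0, betas = -2.0, [0.04, 1.5]
x = [30.0, 0.6]

u = linear_predictor(x, beta0, betas)
p = probability(u)
print(u, p)
```

A value of u = 0 corresponds to P = 0.5, so the sign of u indicates whether occurrence or non-occurrence is the more probable outcome.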
The distribution of factors selected for landslide susceptibility assessment is shown in Fig. 4.
In order to build the model, the factors should be preprocessed, and the results are shown in Table 3.
In this study, historical landslides (Fig. 5) used for landslide susceptibility assessment are obtained through field investigations and interpretations from high-resolution remote sensing images.
Based on the distribution of historical landslides and the factors selected for susceptibility assessment, we built an MLP model to assess landslide susceptibility in the study area (Wang et al. 2015). To obtain reliable results, and drawing on previous studies and the actual conditions of the study area, both positive samples (landslides) and negative samples (stable slopes) are selected.
Of the 375 landslides, 80% (300 landslides, positive samples) are randomly selected and combined with an equal number of stable slopes (300 negative samples) to form the training set. The remaining 20% of the landslides and the same number of stable slopes, namely 75 landslides (positive samples) and 75 non-landslides (negative samples), form the validation set. The resulting rainfall-induced landslide susceptibility index is shown in Fig. 6.
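The balanced 80/20 sampling scheme can be sketched as follows. The sample identifiers, the pool of candidate stable slopes, and the random seed are hypothetical; how stable slopes were actually chosen is not described here.

```python
import random

def split_balanced(positives, negatives, train_fraction=0.8, seed=0):
    """Split landslides and an equal number of stable slopes into train/validation sets."""
    rng = random.Random(seed)
    pos, neg = positives[:], negatives[:]
    rng.shuffle(pos)
    rng.shuffle(neg)
    n_train = round(train_fraction * len(pos))
    # Label 1 marks a landslide (positive), label 0 a stable slope (negative).
    train = [(s, 1) for s in pos[:n_train]] + [(s, 0) for s in neg[:n_train]]
    valid = [(s, 1) for s in pos[n_train:]] + [(s, 0) for s in neg[n_train:len(pos)]]
    return train, valid

# Hypothetical identifiers for 375 landslides and 375 candidate stable slopes.
landslide_ids = [f"L{i}" for i in range(375)]
stable_ids = [f"N{i}" for i in range(375)]

train, valid = split_balanced(landslide_ids, stable_ids)
print(len(train), len(valid))
```

Keeping the two classes the same size in both sets means that the overall accuracy of a trivial classifier is 50%, which makes validation scores easier to interpret.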
Based on the landslide susceptibility index, the natural breaks method is used to divide the study area into three susceptibility zones (high, moderate, and low), as shown in Fig. 7. Statistical analysis shows that the distribution of landslides in the study area is closely related to the susceptibility zoning. As shown in Fig. 8, 45.1% of the landslides are located in the high-susceptibility zone, which covers 28,754 km², only 25.7% of the total area; 26.4% of the landslides are in the moderate-susceptibility zone, which covers 34,917 km², or 31.2% of the total area; and only 28.6% of the landslides are in the low-susceptibility zone, which covers 48,312 km², or 43.1% of the total area.
Fig. 8 also indicates that the landslide susceptibility assessment comprehensively captures the landform, lithology, geological structure, and other factors that predispose slopes to failure. It reflects the impact of disaster environment factors on landslides in the study area. Therefore, the landslide susceptibility index needs to be considered when building the landslide probability model.
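The link between susceptibility level and landslide density can be made explicit by dividing each zone's share of landslides by its share of area. The sketch below uses only the percentages quoted above; the ratio itself is an illustrative measure and is not reported in the original analysis.

```python
# Share of landslides and share of area per susceptibility zone (percent), from Fig. 8.
zones = {
    "high": {"landslide_pct": 45.1, "area_pct": 25.7},
    "moderate": {"landslide_pct": 26.4, "area_pct": 31.2},
    "low": {"landslide_pct": 28.6, "area_pct": 43.1},
}

def density_ratio(zone):
    """Relative landslide density: values above 1 mean more landslides than the area share implies."""
    return zone["landslide_pct"] / zone["area_pct"]

ratios = {name: round(density_ratio(z), 2) for name, z in zones.items()}
print(ratios)
```

The ratio falls from about 1.75 in the high zone to about 0.85 and 0.66 in the moderate and low zones, so landslide density decreases monotonically with susceptibility level.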

Examination of the relationship between landslides and rainfall process
To determine the relationship between landslides and the rainfall process (daily rainfall, antecedent rainfall, etc.) in the study area, a detailed statistical analysis of the landslide inventory was carried out. The analysis shows that 72% of landslides occur on days when precipitation is greater than 10 mm, and 94% of landslides occur within 3 days after heavy rainfall. As time goes on, the number of landslides decreases rapidly.
Fig. 8 The relationship between landslides and susceptibility level
Fig. 9 Relationship between rainfall and landslides occurrence
It can be considered that the changes reflected in Fig. 9 represent the contribution of the antecedent rainfall to the occurrence of landslide. The contribution can also be called the weight coefficient of rainfall to landslides. The univariate negative exponential function is used to fit the relationship between the days before the occurrence of landslides and the probability of landslides (Eq. 3 and Fig. 10).
Based on the curve shown in Fig. 10, Eq. (3) is obtained by fitting the function:
where $x$ represents the number of days before the landslides occurred in the study area and $f(x)$ represents the probability of landslides. The curve fit of Eq. (3) is close, with an $R^2$ of 0.997. Equation (3) reflects the relationship between landslide occurrence and the time elapsed since rainfall, as rainfall of different intensities infiltrates the slope. It helps to deepen the understanding of the formation mechanism of rainfall-induced landslides in the study area. The statistical analysis shows that rainfall-induced landslides in the hilly area of Sichuan province are closely related to the rainfall on the day of occurrence and over the preceding 5 days. Therefore, the number of days since a major rainfall can be used as a factor in constructing the probability model for evaluating landslide hazard in the study area. Based on Eq. (3), the corresponding attenuation coefficients are calculated as shown in Table 4.
The effective rainfall in the study area is calculated with Eq. (4), using the attenuation coefficients in Table 4:
where $f(x)$ is the attenuation function, $R_0$ is the daily rainfall, and $R_n$ is the daily precipitation $n$ days earlier ($1 \le n \le 5$).
Based on Eq. (4), the effective rainfall for the day of occurrence and for each of the preceding 1, 2, 3, 4, and 5 days is calculated as shown in Table 5.
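A univariate negative exponential fit can be sketched as below. The two-parameter form $f(x) = a e^{-bx}$ is an assumption, the data are synthetic values generated from known parameters, and the result is not the fitted Eq. (3) of this study. The fit uses ordinary least squares on the logarithm of the data.

```python
import math

def fit_negative_exponential(xs, ys):
    """Fit y = a * exp(-b * x) by least squares on ln(y) = ln(a) - b * x; return (a, b)."""
    n = len(xs)
    logs = [math.log(y) for y in ys]
    mean_x = sum(xs) / n
    mean_l = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (l - mean_l) for x, l in zip(xs, logs))
    slope = sxy / sxx
    intercept = mean_l - slope * mean_x
    return math.exp(intercept), -slope

# Synthetic illustration: values generated from a = 0.6, b = 0.9 (hypothetical).
days = [0, 1, 2, 3, 4, 5]
share = [0.6 * math.exp(-0.9 * d) for d in days]

a, b = fit_negative_exponential(days, share)
print(a, b)
```

Because the fit is done in log space, it weights relative rather than absolute errors; a nonlinear least-squares fit on the raw values may give slightly different coefficients on noisy data.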
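The effective rainfall calculation can be sketched as follows, assuming that each day's rainfall $R_n$ is weighted by its attenuation coefficient $f(n)$. Both the coefficients and the rainfall amounts are hypothetical and do not reproduce Table 4 or Table 5.

```python
def effective_rainfall(daily_rain, attenuation):
    """Weight each day's rainfall R_n by its attenuation coefficient f(n), n = 0..5."""
    return [f_n * r_n for f_n, r_n in zip(attenuation, daily_rain)]

# Hypothetical attenuation coefficients f(0)..f(5); not the values of Table 4.
attenuation = [1.0, 0.5, 0.25, 0.125, 0.0625, 0.03125]
# Hypothetical rainfall in mm: R_0 is the day of the landslide, R_5 is 5 days earlier.
rain_mm = [40.0, 12.0, 0.0, 8.0, 0.0, 16.0]

per_day = effective_rainfall(rain_mm, attenuation)
total = sum(per_day)
print(per_day, total)
```

The per-day values correspond to separate predictors such as the effective rainfall on the day and 3 or 5 days earlier; the sum is shown only as one possible aggregate.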
The effective rainfall, as the representative factor of the rainfall process affecting landslides, is used to build the probability model of landslides occurrence.

Model construction
Based on the rainfall process and the landslide susceptibility index, a logistic regression algorithm is adopted to establish the rainfall-induced landslide probability model. Table 6 shows the stepwise regression process. At the standard significance level of 0.05, the effective rainfall 2 and 4 days before (Day2 and Day4) has no significant relationship with landslide occurrence, while Day0, Day3, Day5, and LSI do. The insignificance of Day2 and Day4 may be due to their correlation with other variables. As shown in Table 6, Day0, Day3, Day5, and LSI are selected to construct the logistic regression model. The rainfall-induced landslide probability model for the hilly area of Sichuan province is shown in Eq. (5).
where $P$ represents the probability of landslide occurrence, LSI represents the landslide susceptibility index, $r_0$ represents the effective rainfall on the day, and $r_3$ and $r_5$ represent the effective rainfall 3 and 5 days before, respectively.

Model validation
To assess the accuracy of the probability model, a classification table is used to quantify the simulation results of the model (Table 7). Table 7 also shows that the prediction results depend strongly on the LSI threshold used to classify a pixel as stable or unstable.
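How a classification table depends on the chosen cut-off can be sketched as below. The predicted probabilities and observations are hypothetical, and applying the cut-off to the predicted probability is an assumption made for illustration.

```python
def classification_table(probabilities, observed, threshold=0.5):
    """Count true/false positives and negatives for a given probability cut-off."""
    table = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for p, y in zip(probabilities, observed):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 1:
            table["tp"] += 1
        elif predicted == 1 and y == 0:
            table["fp"] += 1
        elif predicted == 0 and y == 0:
            table["tn"] += 1
        else:
            table["fn"] += 1
    return table

def overall_accuracy(table):
    """Share of samples classified correctly."""
    return (table["tp"] + table["tn"]) / sum(table.values())

# Hypothetical predicted probabilities and observations (1 = landslide, 0 = none).
probs = [0.91, 0.72, 0.40, 0.65, 0.20, 0.55, 0.10, 0.35]
obs = [1, 1, 1, 1, 0, 0, 0, 0]

table_05 = classification_table(probs, obs, threshold=0.5)
table_07 = classification_table(probs, obs, threshold=0.7)
print(table_05, table_07)
```

Raising the cut-off from 0.5 to 0.7 removes the false alarm but doubles the missed landslides, which is the trade-off that a threshold choice has to balance in early warning.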

Results
Based on the above analysis, this article has achieved the following results.
(1) In the study area, the landslide susceptibility index comprehensively reflects the impact of various disaster environment factors (landform, lithology, geological structure, etc.) on the distribution of landslides. Landslide density increases markedly with susceptibility level.
(2) The occurrence of landslides in the study area is closely related to the rainfall on the day and over the preceding five days. The relationship can be fitted by a univariate negative exponential function. This function approximately reflects the relationship between elapsed time and landslide occurrence as rainfall of different intensities infiltrates the slope, and it helps to deepen the understanding of the mechanism of rainfall-induced landslides in the study area.
(3) Based on the effective rainfall factor and the landslide susceptibility index, the logistic regression model is used to build the probability model of rainfall-induced landslides. Backward elimination with the conditional likelihood ratio shows that the effective rainfall on the day (Day0), 3 days before (Day3), and 5 days before (Day5), together with LSI, has a significant relationship with landslides. The AUC value is 0.86, indicating good prediction accuracy.

Discussion
To analyze the influence of the landslide susceptibility index on the accuracy of the landslide probability model, the Cox and Snell R² and the Nagelkerke R² are used to test the goodness of fit of the logistic regression equations. As shown in Table 8, when the susceptibility index is considered, the Cox and Snell R² is 0.351 and the Nagelkerke R² is 0.467, both higher than the corresponding values without the susceptibility index. Including the landslide susceptibility index therefore improves the model's goodness of fit. In addition, the ROC curve method is used to compare the accuracy of the landslide probability model with and without the landslide susceptibility index.
As shown in Fig. 11, the rainfall-induced landslide probability model has a high diagnostic value: the area under the ROC curve (AUC) of the model that considers the landslide susceptibility index is 0.86, whereas the AUC of the model that does not consider it is 0.83. Considering the landslide susceptibility index therefore improves the AUC of the rainfall-induced landslide probability model by 0.03, that is, by 3 percentage points.
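The two pseudo-R² measures are linked through the log-likelihood of the intercept-only model. The sketch below uses the standard definitions and assumes a balanced sample (equal numbers of landslide and non-landslide cases), which is not stated for the regression data; the sample size is hypothetical and does not affect the result.

```python
import math

def cox_snell_r2(ll_null, ll_model, n):
    """Cox and Snell R2 from the null and fitted log-likelihoods of n samples."""
    return 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)

def nagelkerke_r2(ll_null, ll_model, n):
    """Nagelkerke R2: Cox and Snell R2 rescaled by its maximum attainable value."""
    max_r2 = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell_r2(ll_null, ll_model, n) / max_r2

# Assumption: balanced sample, so the intercept-only model predicts 0.5 everywhere.
n = 600  # hypothetical sample size
ll_null = n * math.log(0.5)
# Fitted log-likelihood chosen so that the Cox and Snell R2 equals the reported 0.351.
ll_model = ll_null - (n / 2.0) * math.log(1.0 - 0.351)

cs = cox_snell_r2(ll_null, ll_model, n)
nk = nagelkerke_r2(ll_null, ll_model, n)
print(cs, nk)
```

Under this assumption the maximum Cox and Snell R² is 0.75, and 0.351 / 0.75 is approximately 0.468, in line with the reported Nagelkerke R² of 0.467.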
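The area under the ROC curve can be computed directly as the probability that a randomly chosen landslide case receives a higher predicted probability than a randomly chosen non-landslide case. The sketch below uses this rank-based definition with hypothetical scores; it does not reproduce the values of Fig. 11.

```python
def auc(scores_pos, scores_neg):
    """Share of (landslide, non-landslide) pairs ranked correctly; ties count one half."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical predicted probabilities for landslide and non-landslide cases.
pos = [0.91, 0.72, 0.40, 0.65]
neg = [0.20, 0.55, 0.10, 0.35]

value = auc(pos, neg)
print(value)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so a change from 0.83 to 0.86 means that 3 more pairs in every 100 are ranked correctly.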
In addition, quantifying the mechanism by which causative factors (rainfall, earthquakes, human activities, etc.) affect slope stability has long been a difficulty in regional landslide temporal prediction and will remain an important direction for future research. The innovation of this paper is that the weight coefficient function reflecting the contribution of rainfall to landslides was obtained through statistical analysis of the relationship between landslide occurrences and rainfall processes. This function approximately reflects the relationship between landslide occurrence and the infiltration of rainfall of different intensities into the slope body, which helps to deepen the understanding of the formation mechanism of rainfall-induced landslides in the hilly area of Sichuan province. Based on this function, a more accurate probability model can be built to evaluate landslide hazard in the study area.
Disaster risk monitoring plays an increasingly important role in emergency management and is an important measure for strengthening the sustainable development of society, the economy, and the environment. This paper establishes a probability model of landslide occurrence in the hilly area of Sichuan province and uses it, with map algebra, to dynamically assess regional landslide hazard in the coming days from rainfall forecast data and real-time rainfall monitoring data. Combined with vulnerability assessment results for the elements at risk in the study area, the model makes it possible to assess regional landslide risk over the next few days. The research results can be applied directly to landslide risk monitoring in the study area, and the ideas and methods provide a useful reference for landslide hazard assessment and risk monitoring in other areas prone to geological disasters.
However, this study has some limitations. Detailed records of landslide occurrence (location and time) and the corresponding rainfall data are essential for regional landslide hazard assessment. The rainfall weight coefficient function for the hilly area of Sichuan province was obtained by statistical analysis of the 648 landslide records in the study area and the corresponding rainfall data for 2007. Because the available sample is relatively small, the generality of this weight coefficient function needs further verification.