1 Introduction

Assessing the occurrence of shallow rainfall-induced landslides is crucial for implementing effective short-term and long-term risk mitigation actions.

Landslide early warning systems (LEWSs) are non-structural, cost-effective tools aimed at mitigating landslide risk that can be designed and used at different scales or resolutions: local systems deal with a single landslide at slope scale, while territorial systems deal with multiple landslides at regional scale, i.e., over a basin, a municipality, a region, or a nation [1]. Various LEWSs operating at different spatial scales are currently operational worldwide [2,3,4].

Alert models for shallow rainfall-induced landslides at regional scale are typically based on rainfall thresholds expressed in terms of cumulative rainfall or average intensity with respect to the duration of the rainfall event, completely neglecting the antecedent conditions [5]. This means that, for each expected precipitation event of a certain intensity, the system releases a certain level of alert regardless of the state of the soil, increasing the probability of false alarms. In this regard, recent studies have shown that the reliability of an alert model could be increased by including other variables, such as water content [6]. However, testing numerous atmospheric variables using traditional techniques can be quite complex and costly.

Machine learning (ML) techniques are becoming popular for modeling complex problems in landslide analyses [7], as they are starting to demonstrate good predictive performance compared to conventional methods. Since the introduction of ML methods to the landslide community, many studies have been carried out to explore the usefulness of ML in landslide research and to look at some classic landslide problems from an ML point of view. The main areas of investigation include landslide detection and mapping [8,9,10], landslide spatial analyses [11,12,13], and temporal forecasting of landslides [14,15].

With reference to the temporal forecasting of rainfall-induced landslides, ML methods have recently been explored to determine rainfall thresholds, i.e., the rainfall conditions that, when reached or exceeded, are likely to trigger a landslide [16,17,18].

In [19], rainfall indices based on 60-min cumulative rainfall and a calculated soil–water index were used to set up a critical line for a new early-warning system, employing a Radial Basis Function Network.

ML methods have also been employed to explore the relationship between the amount of precipitation and the groundwater level with reference to conditions of instability, especially for deep-seated landslides. In particular, in [20], Artificial Neural Networks (ANN) and Support Vector Machines (SVM) were used to build two nonlinear time-series models able to predict groundwater level fluctuations based on groundwater level, precipitation, and tide level data. In [21], Random Forest (RF) was adopted to build a predictive model of the fluctuation of the groundwater level for the Kostanjek landslide. In [22], SVM, combined with chaos theory, was adopted to predict the daily groundwater levels of the Huayuan landslide and the weekly and monthly groundwater levels of the Baijiabao landslide in the Three Gorges Reservoir Area (TGRA) of China. Finally, in [23], two ML methods based on the combination of genetic algorithms with an ANN and an SVM, respectively, were studied to predict the groundwater level fluctuation of the Duxiantou landslide, located in Zhejiang Province, China.

In this context, more extensively and deeply investigated in [24], this preliminary study proposes the use of ML techniques to select the most relevant variables for the triggering of rainfall-induced landslides at regional scale. The ML method used for predicting the occurrence of landslides is the Likelihood-Fuzzy Analysis (LFA) [25], which, differently from other state-of-the-art ML methods, pursues two objectives: good performance and the production of explainable models. In particular, it is mainly based on the hybridization of multiple approaches (feature selection, fuzzy logic, statistics, and rule-based models) to produce predictive models that achieve, at the same time, good performance, in terms of both recall and precision, and explainable results, in terms of linguistically-ranged input variables and explicit correlations detected between them and the final outcomes.

The predictive models were tested in one of the alert zones defined by civil protection for the management of geo-hydrological risk in Campania region, Italy. Two data sources were used in the analysis. The atmospheric variables are derived from the ERA5-Land atmospheric reanalysis [26]. The data on landslide events are retrieved from “FraneItalia”, a georeferenced catalog of landslides that occurred in Italy, developed by consulting online sources from 2010 onwards [27]. The models were calibrated and validated in the Camp-3 alert zone over a period spanning from 2010 to 2019 [28], in order to define combinations of rainfall variables and soil water content for the prediction of the occurrence of landslides.

This paper is organized as follows. Section 2 explains the theoretical concepts of the Likelihood-Fuzzy Analysis technique used herein. Section 3 describes the datasets and the experimental procedure. Section 4 evaluates the performance of the proposed approach in the case study, also in comparison with other state-of-the-art ML methods. Finally, Sect. 5 summarizes the conclusions drawn from this work.

2 Background

2.1 Likelihood fuzzy analysis

LFA is a method proposed in [25, 29], aimed at mining, from a supervised training dataset made of different features/variables, one or more multivariate fuzzy models able to predict a class y related to a new incoming feature vector \(x = \{x^{(1)},..., x^{(n)}\}\).

This method was chosen for this study since it is able to generate models exhibiting the following characteristics: (i) good performance with respect to state-of-the-art methods, and robustness to uncertain data; (ii) interpretability, describing features through terms of linguistic variables (low/medium/high) and correlating them with the outcomes via if-then rules, to show the dependence of predictions on features; (iii) a confidence measure for each prediction, expressed as the probability of the class of interest for each occurrence of input data.

A multivariate fuzzy model is made of two main parts, namely the fuzzy sets associated with the features of interest and the rule base, and is used for classifying objects through the process of fuzzy inference.

In more detail, the range of each j-th feature \(X^{(j)}\) is partitioned into \(M_j\) fuzzy sets, described by membership functions \(\mu _F^{(j)}\) with specific positions in the admissible range of values. The fuzzy sets pertaining to each feature represent the terms of the associated linguistic variable (e.g., low, medium, high).

The model is made of a combinatorial set of R rules, where the \(\sigma\)-th rule, with \(\sigma \in [1,...,R]\), is of the following type:

$$\begin{aligned} if \; x^{(1)} \; is \; F^{(1)}_\sigma \; and \; ...\; and\; x^{(n)}\; is\; F^{(n)}_\sigma \; then\; P_\sigma (c_1)\; ....\; P_\sigma (c_K) \end{aligned}$$
(1)

where \(c_1,c_2 ... c_K\) are the different K output classes.

The inference process is performed as follows. Each data sample \(x=\{x^{(1)},...,x^{(n)}\}\) fires the \(\sigma\)-th rule with a strength:

$$\begin{aligned} FS_{\sigma }(x)=\prod _{j=1}^{n}\mu _{F^{(j)}_{\sigma }}(x^{(j)}) \end{aligned}$$
(2)

The implication of each consequence class is modelled as:

$$\begin{aligned} IMP_{\sigma }(x)= FS_{\sigma }(x)\cdot P_\sigma (c_k) \end{aligned}$$
(3)

Finally, different implications are aggregated as:

$$\begin{aligned} AGG(x)= \sum _{\sigma =1}^{R}IMP_{\sigma }(x) \end{aligned}$$
(4)

and the aggregations of all the classes are normalized. In general, weights can be associated with the rules, which turns (4) into a weighted sum; the weights are omitted for single-feature models, and for multivariate models when they do not improve classification significantly.

In case of two classes \(c_1\) and \(c_2\) as output, (4) gives a number in [0,1] that approximates the probability of \(c_1\) class. Once a threshold T is chosen, the final inference result is:

$$\begin{aligned} y = \bigg \{ \begin{array}{rl} c_1 &{} if \,AGG(x) > T\\ c_2 &{} otherwise \\ \end{array} \end{aligned}$$
(5)
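As an illustration, the inference chain (2)–(5) can be sketched in a few lines of Python. The triangular membership functions, the rules, and the threshold below are illustrative placeholders, not the ones calibrated in this work:

```python
import numpy as np

def triangular(x, a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def infer(x, rules, threshold=0.5):
    """Fuzzy inference following Eqs. (2)-(5) for two classes c1 and c2.

    Each rule is a pair (memberships, p_c1): one membership function per
    feature and the consequent probability of class c1.
    """
    agg_c1 = agg_c2 = 0.0
    for memberships, p_c1 in rules:
        # Eq. (2): firing strength as the product of the memberships
        fs = np.prod([mu(xj) for mu, xj in zip(memberships, x)])
        # Eqs. (3)-(4): implication of each class, aggregated by summation
        agg_c1 += fs * p_c1
        agg_c2 += fs * (1.0 - p_c1)
    total = agg_c1 + agg_c2
    p_c1 = agg_c1 / total if total > 0 else 0.0  # normalization over classes
    return ("c1" if p_c1 > threshold else "c2"), p_c1  # Eq. (5)
```

For instance, a hypothetical univariate model with a "low" and a "high" rainfall term could be instantiated as `rules = [([lambda r: triangular(r, -10, 0, 40)], 0.01), ([lambda r: triangular(r, 0, 40, 80)], 0.37)]` and queried with `infer([50.0], rules, threshold=0.2)`.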

LFA aims to determine a fuzzy model by optimizing a chosen performance measure on a given dataset, in particular through the optimization of the fuzzy sets representing the linguistic terms of each variable, of the number of terms for each variable, of the set of variables making up the model, and of the fuzzy rules. The fundamental steps of LFA are as follows.

Firstly, the likelihood functions are calculated, which describe the posterior probabilities of classes \(P(c_k\mid x^{(j)})\) as functions of each of the input features \(x^{(j)}\). Then, each of these functions is approximated with a linear combination of membership functions of fuzzy sets, which constitute an interpretable partition of the variable range. Finally, rule weights (if foreseen) and consequents of a complete multivariate rule base are calculated to get the fuzzy model. More details are given in [25, 29].
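As a minimal illustration of the first step, the empirical posterior \(P(c_k\mid x^{(j)})\) of a single feature can be estimated by simple binning before any fuzzy-set approximation. The function below is a didactic sketch under that assumption, not the actual LFA estimator of [25, 29]:

```python
import numpy as np

def posterior_curve(x, y, bins=10):
    """Empirical posterior P(c1 | x) per bin of a single feature.

    x : (N,) feature values; y : (N,) binary labels (1 = class c1).
    Returns bin centers and the fraction of c1 samples in each bin,
    i.e. the curve that LFA then approximates with fuzzy sets.
    """
    edges = np.linspace(x.min(), x.max(), bins + 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    # Assign each sample to a bin (clipping the right edge into the last bin)
    which = np.clip(np.digitize(x, edges) - 1, 0, bins - 1)
    post = np.full(bins, np.nan)
    for b in range(bins):
        mask = which == b
        if mask.any():
            post[b] = y[mask].mean()  # empirical P(c1 | x in bin b)
    return centers, post
```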

3 Materials and methods

3.1 Study area and data used

The study area is Camp-3, one of the eight alert zones defined by the Civil Protection for the management of hydro-meteorological risk in Campania (Italian DPGR 299/2005). This area, with an extension of approximately 1619 km², includes 109 municipalities and the Lattari, Picentini and Partenio mountains (Fig. 1).

Fig. 1
figure 1

Map of the Camp-3 alert area depicting the 120 landslides considered in this study and the centroids of the ERA5-Land cells. The insets show the position of the study area in Campania and Italy

The orographic conditions and the proximity of the sea favor the formation of convective storms [30, 31]. Moreover, the presence of pyroclastic deposits of volcanic origin on carbonate substrates makes these areas highly susceptible to the triggering of fast-moving landslides, such as shallow landslides, debris flows, debris avalanches, and hyperconcentrated flows [32]. Some of the most catastrophic landslides in Europe were recorded in the area, including the tragic events that occurred on the Pizzo d’Alvano massif between 4 and 5 May 1998, when about 2 million m³ of material was mobilized, causing at least 160 victims [33].

The information on landslides that occurred in the study area was retrieved from FraneItalia, a georeferenced catalog of recent Italian landslides developed by consulting online sources from 2010 onwards [27]. Landslides are classified into two categories: single landslides (SLE), for records that report a single landslide; areal landslides (ALE), for records that refer to multiple landslides caused by a single trigger in the same Weather Alert Zone. In Camp-3, 120 rainfall-induced landslide events (72 SLE and 48 ALE) were recorded from 2010 to 2019, most of which (96 out of 120) occurred between October and March.

The rainfall and soil water content data are derived from the ERA5-Land atmospheric reanalysis [34], developed by the European Center for Medium-Range Weather Forecasts (ECMWF). Atmospheric reanalysis provides a consistent and complete picture of the atmosphere by combining observational data from satellites and ground sensors with physically-based meteorological models. ERA5-Land provides about 50 atmospheric variables at a spatial resolution of 9 km and an hourly temporal resolution. Given the importance of an adequate parameterization of soil processes, ERA5-Land is, strictly speaking, a “replay” of the land component alone, forced by atmospheric variables statistically interpolated from the ERA5 atmospheric reanalysis (which has a spatial resolution of about 31 km).

3.2 Methodology

This study starts from the assumption that, at this scale, i.e., considering the entire study area of almost 2000 km² as a whole, rainfall-induced landslides can be correlated with a combination of measures linked to two factors: (i) a predisposing condition represented by the water content in the surface layers of the soil, and (ii) a triggering condition represented by the rainfall variables [6].

The hourly data of the ERA5-Land dataset were pre-processed in order to obtain 13 input features, calculated with a daily temporal discretization consistent with the information contained in the landslide catalog used as the outcome variable, that is to say:

  • daily maximums of the geographical averages of rainfall values, cumulated over time intervals of 1 h, 3 h, 6 h, 9 h, 12 h, 18 h, 24 h, 36 h, 48 h and 72 h (from F1 to F10);

  • daily maximum of the geographic standard deviation of the hourly rainfall values (F11);

  • daily maximum of the geographical average of the hourly values of the soil water content (F12);

  • daily maximum of the geographical standard deviation of the hourly values of the soil water content (F13).
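A hedged sketch of this pre-processing is given below, assuming two pandas DataFrames `hourly_rain` and `hourly_swc` indexed by hourly timestamps, with one column per ERA5-Land grid cell (all names are illustrative):

```python
import numpy as np
import pandas as pd

def daily_features(hourly_rain, hourly_swc,
                   windows=(1, 3, 6, 9, 12, 18, 24, 36, 48, 72)):
    """Sketch of the 13 daily features (F1-F13) described above.

    hourly_rain, hourly_swc : DataFrames indexed by hourly timestamps,
    one column per grid cell of the study area.
    """
    feats = {}
    # F1-F10: geographical average of hourly rainfall, cumulated over each
    # window, then reduced to the daily maximum
    mean_rain = hourly_rain.mean(axis=1)
    for i, w in enumerate(windows, start=1):
        cum = mean_rain.rolling(window=w, min_periods=w).sum()
        feats[f"F{i}"] = cum.resample("D").max()
    # F11: daily max of the geographic std of hourly rainfall
    feats["F11"] = hourly_rain.std(axis=1).resample("D").max()
    # F12: daily max of the geographic mean of hourly soil water content
    feats["F12"] = hourly_swc.mean(axis=1).resample("D").max()
    # F13: daily max of the geographic std of hourly soil water content
    feats["F13"] = hourly_swc.std(axis=1).resample("D").max()
    return pd.DataFrame(feats)
```

Note that, since the spatial average is linear, averaging over cells and then cumulating in time is equivalent to the reverse order.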

To find a model that associates an outcome (landslide/no landslide, or landslide probability) with the known rainfall and soil water content data, the LFA method described in Sect. 2 was applied. The Mathematica 8 software was employed for the implementation.

The performance measure chosen in this study for the optimization of the fuzzy sets, the number of terms for each variable, the number of variables to be used, and the final fuzzy rules is the Squared Classification Error (SCE):

$$\begin{aligned} SCE = \frac{1}{KN} \sum \limits _{k=1}^{K} \sum \limits _{i=1}^{N} (P_i(k)- \delta ^k_i)^2 \end{aligned}$$
(6)

where \(P_i(k)\) is the probability of the k-th class calculated by the model for the i-th sample, and \(\delta ^k_i\) is 1 if the i-th sample is associated with the k-th class, and 0 otherwise.
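Eq. (6) translates directly into a few lines of NumPy; here `P` is an N×K array of model-predicted class probabilities and `y` holds the true class index of each sample (names are illustrative):

```python
import numpy as np

def squared_classification_error(P, y):
    """Squared Classification Error, Eq. (6).

    P : (N, K) array of predicted class probabilities per sample.
    y : (N,) array of true class indices in [0, K).
    """
    N, K = P.shape
    delta = np.zeros_like(P)
    delta[np.arange(N), y] = 1.0  # one-hot encoding of the true classes
    return np.sum((P - delta) ** 2) / (K * N)
```

A perfect probabilistic classifier yields an SCE of exactly zero.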

The SCE was also used for the choice of the model, giving precedence to performance over interpretability, which is in any case largely guaranteed by construction, as shown in the example of Fig. 2 and in the following results.

Fig. 2
figure 2

Examples of fuzzy logic applied by defining two (a) and three (b) linguistic terms for a generic variable (feature) of the ML model

4 Results

The 13 independent variables defined by reprocessing hourly precipitation and soil water content were correlated, through univariate models, with the positive class of the 120 days with landslides in Camp-3 and the negative class of the remaining days from 2010 to 2019.

Figure 3 shows the values of the objective function (SCE, to be minimized) for each variable. In spite of a rather limited range of error variation, a monotonically decreasing trend emerges for precipitation intervals between 1 h and 18 h (the duration characterized by the minimum error), with a slight increase up to 72 h and significantly higher values for the precipitation standard deviation and the soil water content.

Fig. 3
figure 3

Trend of the objective function for models defined using, individually, the 13 variables calculated with daily time discretization

Among the univariate models, the one with the best predictive capability is shown in Fig. 4.

Fig. 4
figure 4

Probability of landslide and non-landslide associated with the model based on cumulative precipitation at 18 h

By dividing the 18 h cumulative rainfall into three fuzzy sets (low/medium/high), it can be seen that the maximum probability of landslide (37%) is obtained for high values (greater than about 40 mm). It should be remembered that this value refers to precipitation averaged over the entire study area. The graph also shows two further ranges: an intermediate range in which the probability of a landslide is around 23%, and a range of low precipitation values in which the probability of a landslide is minimal (1%).

In addition, a multivariate model combining the 18 h cumulative rainfall and the standard deviation of the soil water content was developed (Fig. 5). In particular, the model combining the two variables associates a null probability of landslides with low values of both independent variables, and makes it possible to identify ranges in which the probability of landslides rises to 46%. Finally, the results highlight the additional contribution of the soil water content.

Fig. 5
figure 5

Probability of landslide and non-landslide associated with the model that combines the maximum of the 18 h cumulated rainfall and the standard deviation of the water content

4.1 Comparison with other ML methods

There is no consensus on an “optimal” ML method for landslide studies, even when looking at the results of the most recent comparative studies in landslide detection or spatial and temporal forecasting [24]. Therefore, although the objective of this work is not to demonstrate that the LFA method is the best possible for this case study, other established ML methods from the literature were tested to assess its effectiveness in terms of performance and interpretability.

In particular, all the methods chosen are available in the Waikato Environment for Knowledge Analysis (WEKA 3.8) [35], and can be summarized in terms of the category they belong to and their configuration parameters, as follows:

  • Logical/symbolic classification

    • RIPPER rule-based classifier [36] (with batch size 100, 3 folds for pruning, minimum total weight of the instances in a rule 2.0, 2 optimization runs);

    • C4.5 decision tree [37] (with batch size 100, 3 folds for pruning, confidence factor 0.25, minimum 2 instances per leaf, subtree raising and MDL correction);

  • Statistical learning

    • Naïve Bayes (NB) [38] (with batch size 100);

    • Bayesian Network (BN) [39] (with batch size 100, Simple Estimator algorithm with alpha = 0.5 for finding the conditional probability tables, K2 learning algorithm with max 1 parent and Bayes score type);

  • Instance-based learning

    • K-Nearest Neighbours (K-NN) [40] (with K equal to 1, batch size 100, no distance weighting, no limit to the number of training instances, brute force search algorithm for nearest neighbour search, Euclidean distance);

  • Function-based classification

    • Logistic Regression (LR) [41] (with batch size 100, ridge value in the log-likelihood 10⁻⁸, unlimited iterations, BFGS updates);

    • Support Vector Machine (SVM) [42] (with batch size 100, complexity parameter c=1.0, epsilon for round-off error 10⁻¹², tolerance parameter 0.001, multinomial logistic regression model with a ridge estimator as calibration method with batch size 100, ridge value in the log-likelihood 10⁻⁸, unlimited iterations, BFGS updates, and polynomial kernel with cache size 250,007 and exponent 1.0);

  • Perceptron-based learning

    • Multi-layer perceptron network (MLP) [43] (with batch size 100, one hidden layer with number of neurons equal to (attributes + classes)/2, learning rate 0.3, momentum 0.2, no decay of learning rate, 500 epochs);

  • Ensemble learning

    • Random Forest (RF) [44] (with batch size 100, and 100 trees in the random forest).

The performance of these ML models was calculated through a 10-fold cross-validation, in terms of the F1 score, Precision, and Recall metrics.
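The evaluation protocol can be sketched with NumPy alone (the actual runs used WEKA's built-in cross-validation); `fit_predict` stands in for any of the classifiers above, and all names are illustrative:

```python
import numpy as np

def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for the positive (landslide) class."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

def cross_validate(fit_predict, X, y, folds=10, seed=0):
    """k-fold cross-validation; fit_predict(X_tr, y_tr, X_te) -> labels."""
    idx = np.random.default_rng(seed).permutation(len(y))
    metrics = []
    for test_idx in np.array_split(idx, folds):
        train_idx = np.setdiff1d(idx, test_idx)
        y_pred = fit_predict(X[train_idx], y[train_idx], X[test_idx])
        metrics.append(precision_recall_f1(y[test_idx], y_pred))
    return np.mean(metrics, axis=0)  # mean precision, recall, F1
```

Averaging the three metrics over the folds mirrors the per-model figures reported in Table 1.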

As far as performance is concerned, the results reported in Table 1 reveal that the models characterized by the best performance are those obtained by the LFA and NB methods, as both reach an F1 score of 0.27. Moreover, the models obtained with the LFA method were superior in terms of F1 score to those obtained with the other state-of-the-art (interpretable and not) ML methods tested in this work. Therefore, this confirms both the validity of the results achieved and the model’s applicability to support the prediction of the occurrence of rainfall-induced landslides.

Table 1 Performance indicators associated with the tested machine learning models (the models obtained through the selection of variables are indicated with an asterisk)

Moreover, the interpretability of the models obtained by LFA is not comparable with that of the other tested ML methods. Indeed, LFA produces models able to give a clear explanation of the inference process, based on a rule base built on top of interpretable linguistic terms. For example, with regard to the two-dimensional model described above, the functions represented in Fig. 5 make it possible to clearly distinguish the linguistic labels of both the maximum of the 18 h cumulated rainfall and the standard deviation of the water content. Given these labels, the two-dimensional rule base clearly states its logical consequence in terms of probability of rainfall-induced landslides. Indeed, it is possible to distinguish the case when “the maximum of 18 h cumulated rainfall is low”, which implies a probability of landslides equal to zero (or almost zero), from the remaining cases when “the maximum of 18 h cumulated rainfall is medium or high” and, depending on whether “the standard deviation of the water content is low or high”, the probability of landslides increases until it reaches the value of 46%.

Furthermore, the use of fuzzy logic behind the LFA method naturally offers the possibility of creating models robust to uncertain data, i.e., models that produce small changes in the output in response to small changes in the input features. In fact, the LFA method (as well as LR and the statistical methods) provides an output probability that varies continuously in the feature space. Therefore, it is robust with respect to the uncertainty of both the input features and the output.

Finally, by means of these output probabilities, the models found by LFA make it possible to associate a confidence with the prediction of new cases, which reflects, more than sensitivity or specificity, the real uncertainty associated with each potential rainfall-induced event. It is worth noticing that some other ML methods (like LR and the statistical methods) also produce confidence grades linked to their responses; however, to the best of our knowledge, the LFA method is the only one that, in addition to giving a fully interpretable model, approximates the probabilities associated with each outcome.

With respect to all the other ML methods mentioned in Table 1, it is worth noting that none of them simultaneously presents all the described characteristics. Indeed: (i) MLP, the instance-based methods, and SVM generate models that are not interpretable at all; (ii) the RIPPER rule-based classifier is not able to assess a classification confidence in terms of outcome probabilities; (iii) C4.5 decision trees and RF are not robust to data uncertainty, since they define sharp boundaries in the feature space; (iv) the statistical methods (NB and BN) and LR generate models with a level of interpretability not comparable to LFA, since they are not based on logical rules and linguistic terms able to more clearly highlight the relationships between the input features and the final prediction.

Summarizing, the LFA method has proven to be a valid support for identifying the most relevant variables for the triggering of shallow rainfall-induced landslides and for clearly representing their relations with the predicted outcome, thanks to the model’s interpretability. Moreover, the good performance of the models found in the present work and the possibility of producing robust and confidence-based results confirm that the LFA method can be proficiently applied, in place of more classical ML approaches, for building rainfall-induced landslide alert models.

5 Conclusions

The use of machine learning techniques for the definition of models that combine, at regional scale, rainfall and soil water content variables for the prediction of rainfall-induced landslides was tested in this preliminary study, in the Camp-3 alert zone, over the period 2010–2019.

The developed models made it possible to identify some variables significantly correlated with the considered landslides, and to calculate the probability of occurrence of the events rather than simple dichotomous relationships.

Possible future developments of the study include: (i) comparisons with other alert models (e.g., the model used by the Campania region, empirical rainfall thresholds from the literature); (ii) analyses taking into consideration only the most numerous areal events; (iii) calibration and validation of models that use other potentially relevant variables.