Definition of 3D rainfall thresholds to increase operative landslide early warning system performances

Intensity–duration rainfall thresholds are commonly used in regional-scale landslide warning systems. In this manuscript, 3D thresholds are defined also considering the mean rainfall amount fallen in each alert zone (MeAR, mean areal rainfall) in Emilia Romagna region (Northern Italy). In the proposed 3D approach, thresholds are represented by a plane instead of a line, and the third dimension allows to indirectly account for the influence of complex rainfall patterns. MeAR values are calculated according to different time periods ranging from 7 to 30 days, and all threshold parameters are calibrated independently for the 8 alert zones in which the region is divided. The approach was validated and compared with classical intensity–duration thresholds, finding that the 3D threshold may be used to get better performances, especially in terms of a consistent reduction of false alarms:− 20 to − 86%, depending on the alert zone and the selected MeAR duration. These results open new encouraging perspectives for the development of the regional warning system that is operated in the study area.


Introduction
Landslides are one of the most widespread natural hazards in the word, responsible every year for casualties and huge economic losses (Froude and Petley, 2018). As a consequence, landslides are intensely studied natural phenomena: in the scientific literature, several papers addressing landslide hazard or risk management and prevention can be found (Aleotti and Chowdhury, 1999;Guzzetti et al., 1999;Fell et al. 2005;Hungr et al., 2005;Chae et al., 2017;Maes et al., 2017, Devoli et al., 2018Hungr, 2018, Rosi et al. 2018Salvatici et al. 2018;Dikshit et al., 2020). Several global-scale works highlighted that China, Japan, India, and Italy are among the countries most exposed to landslide risk (Petley, 2012;Dowling and Santi, 2014;Haque et al., 2016;Kirschbaum et al., 2015). In Italy, in particular, landslides along with floods are a very recurring phenomenon: over 15,000 landslide events news can be found in online newspapers from 2011 according to results of the semantic engine developed by Battistini et al. (2013); from 2000 to 2018, landslides have been responsible for over 250 fatalities and for about 5.6 billion € of damages . As a consequence, several attempts of reducing their impact have been made, both by the public authorities and the scientific community, sometimes also working in strict collaboration (Melillo et al., 2016;Segoni et al. 2018a;Tiranti et al., 2019). For instance, in Italy, the establishment of landslide warning systems for all the 20 regions is compulsory by law, and today, all the regional public authorities are expending efforts to define reliable and functional early warning system.
Traditional studies over the behavior of single landslides, as geotechnical characterization and modeling (e.g., Agostini et al., 2014), allow us to enhance knowledge over them and to plan the best strategies for their stabilization or, in general, for risk reduction but are less applicable for large numbers of landslides involving widespread areas (hundreds or thousands of km 2 ). In large areas, the strategies most commonly adopted include the development of hazard or risk maps to be used for risk management and territorial planning (Fell et al., 2008;Hungr, 2018) and the definition of early warning systems (EWS) based on deterministic or statistical approaches (Piciullo et al., 2018;Guzzetti et al., 2019).
The development of EWS has highly increased in recent years as demonstrated by the increasing number of scientific papers published on this topic (a review can be found in Piciullo et al., 2018or in Segoni et al., 2018b; the majority of these papers is based on traditional statistical approaches, like the widespread intensity-duration rainfall thresholds model (Caine, 1980;Guzzetti et al., 2008)-since they can be defined with little input data (usually only rainfall data and landslide dates) and can be easily understood and implemented into automated systems. Segoni et al. (2018b) pointed out that before the implementation into operational EWS, rainfall thresholds should be carefully validated and evaluated, but this good practice is not fully consolidated. Indeed, some studies conclude that a complete validation shows that the main drawback of the EWS at hand is that a good hit rate is usually achieved at the cost of a high number of false positives (false alarms) (Rosi et al., 2016(Rosi et al., , 2019Abraham et al., 2020;Gariano et al., 2020). Even if the great success of rainfall threshold lies in their simplicity, some authors tried to get a more robust connection between rainfall and landslide triggers (and, possibly, a reduction of false positives) by including some hydrological perspectives into the modeling and interpretation of the thresholds (Bogaard and Greco, 2018). One of the earliest and simplest approaches is the definition of weighted indexes accounting for the effect of evapotranspiration on the antecedent precipitations (Glade et al., 2000;Ponziani et al., 2012;Chen et al., 2017). However, antecedent rainfall and rainfall intensity have been usually considered mutually exclusive, and one of the two approaches is usually selected depending of the characteristics of the test site and the typology of the studied landslides. Other researchers developed complex decisional algorithms based on long-term rainfall anomalies (Martelloni et al., 2012) and observed that in some particular settings (e.g., Emilia Romagna, Italy, and Indian Himalaya) they could outperform simpler approaches based on rainfall intensity, rainfall duration, or event rainfall Abraham et al., 2020). Another series of works, starting from the rationale that landslides are triggered by the increase of water pore pressure and not by rainfall itself, tried to include soil moisture conditions in the threshold modeling (Terlien, 1998;Ponziani et al., 2012;Valenzuela et al., 2018;Wicki et al., 2020). Several researches demonstrated the promising potential of the soil moisture approach, which in general allows an improvement of the results in comparison with the classical intensity duration thresholds, but at the same time highlighted also some limitations. Indeed, in situ soil moisture measures can be difficult to retrieve over large areas and would require an extensive and costly network of instruments (Wicki et al., 2020). To bypass this problem in regional-scale studies, some authors resorted to estimated soil moisture values (Segoni et al., 2018c), which have the drawback of being inherently affected by approximations, or integrated remotely sensed soil moisture data (Ponziani et al., 2012;Thomas et al., 2019;Zhao et al., 2019;Zhuo et al., 2019), which provide indirect estimations that usually have the drawback of a lower time resolution with respect to rainfall measures. Furthermore, in an operational use, a model must be as simple as possible to be understood from the stakeholders, decision-makers, and operators, and it is important that all the required dynamic input data can be easily retrieved and implemented in the EWS. Therefore, the approach of using rainfall variables retrieved from automated rain gauge networks is still a standard that proved reliability and effectiveness all over the world (Segoni et al., 2018b).
In this framework, this work has the objective to balance complexity and simplicity in a novel approach that starts from an intensity-duration threshold model but expand it further by adding other parameters accounting for more complex hydrologic conditions but still using only rainfall data. This objective was achieved by defining a set of 3D rainfall thresholds, in which the landslide triggering condition is defined as a portion of a three dimensional space, delimited by two independent surfaces, where the three axes are represented by rainfall intensity, rainfall duration, and mean accumulated rainfall over a given area (MeARmean areal rainfall).
This improved system has been tested to serve in the regional landslide early warning system of Emilia Romagna region (Italy), and was found that it allowed for a dramatic reduction of false positive alarms, hence leading to a better performing EWS.

Study area
The study area is Emilia Romagna, a 22,446-km 2 wide region located in Northern Italy. The northern and eastern parts of the region correspond to the alluvial deposits of Po River (the main Italian river) and are flat. The southern and western parts are occupied by the hills and mountains of the Apennines, which is a fold and thrust mountain belt that in the study area is mainly composed of turbidites (sandstone and calcarenites) and pelithic layers (Vai and Martini, 2001). Some reliefs are also made up of argillaceous formations, which were extensively affected by large intermittent landslides during the Holocene.
Landslides take place mainly in the hilly and mountainous part of the region (Fig. 1), where landslide density is about 0.12 landslide/km 2 . According to the Italian landslide Inventory (Trigila et al. 2010), Emilia Romagna is affected by landslides of different typologies: rotational-translational slides (45.9 %, affecting mainly flysch), slow earth flows (24.9 %, occurring in clayey lithologies), and complex movements (22.9%, typically rotational failures at the head progressively changing into translational movements throughout the body and toe). Rapid shallow landslides represent 0.5% of the total and were less frequent in the past, but their triggering became a recurring phenomenon in recent years (Montrasio et al., 2012).
Seasonal distribution of landslides is affected by the Mediterranean climate of the region, which is characterized by dry and warm summers (approximately from May to October) and cold and wet winters (approximately from November to April). Since the landslides of the region are rainfall induced, they mainly occur in autumn and winter, whereas a lower number is reported in summer, during short and intense thunderstorms.
For civil protection and early warning purposes, the region is divided into eight areas, called alert zones (AZ), which are defined on the basis of the morphology of the territory, the principal natural hazards, and the administrative subdivisions of the region (e.g., to ensure a more straightforward response, municipalities cannot be split in two or more AZs. The relief affects the spatial distribution of both landslides and rainfall; the mean annual precipitation (MAP, calculated in the period 2005-2015) has the highest value (about 2000 mm/year) in the southwestern part of the region, where the main relief is present; and it progressively decreases in northeastern direction, toward the plain area, where its value is close to 600 mm/year ( Fig.  2).

Landslide and rainfall data
To develop this work, landslide and rainfall data from 2005 to May 2019 have been used. These datasets have been split into two different datasets: calibration dataset, using data from 2005 to 2015, and validation dataset, using data from 2016 to 2019.
Landslide data have been provided by the Geological, Seismic and Soil Service (SGSS) of Emilia Romagna, which supports data about landslides triggered in the whole region from the nineteenth century to the present. Hourly rainfall data have been provided by the Regional Agency for Environmental Protection of Emilia Romagna (ARPAE), which handles the regional rain gauge network, comprising 323 automatic rain gauges.
Calibration dataset was made up of 1108 landslides selected from those archived by the SGSS, while validation dataset comprised 71 landslide events (one event can contain 1 or more landslides), collected from the database of SGSS, but also provided by the Civil protection Agency of Emilia Romagna and retrieved from the data mining tools described in Battistini et al. (2013Battistini et al. ( , 2017.
To perform both calibration and validation, not all available landslide data were used but only those with a good accuracy in temporal and spatial location; i.e., only landslides for which triggering dates were certain or at least within a time span of 2-3 days; similarly, only landslides whit a verified triggering location or with a certainty of few kilometers have been considered.
This selection was necessary to properly couple each landslide with a "reference" rain gauge, as described in the following sections and to properly identify the rainfall event responsible for the triggering of the landslide.

Methodology
Since the territory of the region is quite heterogeneous, with the relief extending from east to west, only in the southern part of the region and the northern part mainly flat, the whole area was divided in 8 more homogeneous zones, and a different threshold has been defined for each one. These zones are called alert zones and are defined by public authorities based on their morphology,

Original paper
Landslides 18 & (2021) the main natural hazard affecting them, and the administrative boundaries.

2D threshold
To initially define the rainfall thresholds, the procedure fully described in Segoni et al. (2014 a, b) was used. This procedure allowed the definition of thresholds according to the popular power law firstly proposed by Caine (1980): where I and D are rainfall intensity (mm/h) and duration (hours), respectively, while α and β are empirical parameters defined from the data distribution. The adopted procedure is based on a semi-automatic approach, which makes use of a software named MaCumBA (Segoni et al. 2014 a); this software requires as input data the date and location of landslide events and the hourly rainfall recorded by the rain gauges from 2 months before to 2 days after each landslide report. Furthermore, two internal parameters have to be set: the definition of the search radius around each landslide (which is used by the software to select the more representative rainfall) and the definition of the NRG (no rain gap), i.e., the number of hours without rain needed to consider two rainfall events as separated. The software then pairs landslides and rainfalls to define the triggering intensity and duration for each landslide and use these data to define statistical or probabilistic ID rainfall thresholds. Since the results are sensitive to the search radius and the NRG, several and different thresholds are defined, a calibration process verifies their performances out, and the threshold with the highest skill scores is    ; these values are then used to define three statistical indicators, the positive predictive power (PPP), the negative predictive power (NPP), and the wfficiency (E), which allowed to verify the quality of the thresholds .
Once the best threshold of each AZ was defined, it was validated with an independent dataset as described in the previous section.
This procedure allowed to define the classical two-dimensional I-D rainfall thresholds.

3D threshold
Intensity and duration are useful to characterize the triggering rainfall occurring shortly before a landslide, but a growing number of researches highlighted that antecedent rainfall may play a relevant role in predisposing (or even triggering) slope instabilities, and antecedent rainfall indexes have been used as an indicator of the hydrologic conditions of the hillslope to define triggering thresholds (Glade et al., 2000;Lee et al., 2014). In order to account for the effect of the antecedent rainfall and to achieve the third dimension, another rainfall parameter was introduced: the mean areal rainfall (MeAR), which is defined as the mean accumulated rainfall recorded by all the rain gauges of a given AZ in a given time period (namely, we tested 7, 10, 15, and 30 days before a given landslide event). To use MeAR as an additional threshold parameter, MeAR was defined for every rainfall event overcoming the I-D threshold during the calibration period; then, all the MeAR values were plotted as a z-axis with the I-D threshold, to verify if differences between the MeAR of TP and FP could be identified.
A sensitivity analysis was carried out as well, to verify how TP and FP reduction varies with the MeAR values.
First analyses showed that the MeAR values related to the FP were significantly lower than those related to TP (Fig. 3), and this encouraged to identify a MeAR threshold (Fig. 4), which allowed a substantial reduction of the number of FP, with a small reduction of TP (i.e., some new FN are created).
To identify the best MeAR threshold ( Fig. 5 and Fig. 6), an iterative procedure was used to get the desired balancing level between the maximization of FP reduction and the minimization of TP reduction, according to operational needs of the end users, so as the performances of an EWS based on this approach can be improved. In the application test, the best threshold was selected as the one which allows the lowest reduction of TP, with the highest reduction of FP.
To quantify and evaluate the improvement obtained introducing the third dimension with the MeAR parameter, the I-D-MeAR thresholds of the AZs have been validated with an independent dataset.

2D threshold identification
The procedure used to define the 2D thresholds allowed to identify a rainfall threshold for 6 AZs, since 2 AZs (D and F) are mainly plain areas and they are not exposed to a significant landslide hazard (Fig. 1).
In the following table (Table 1), the equation and the performances of each threshold are reported, along with the results of the calibration process.
As shown in the previous table, the general efficiency of the thresholds is usually high (from 84 to 96%), but the number of FP is generally higher than the number of TP, and the percentage of correct alarms (expressed as PPP) ranges from 40 to 80%; these scores cannot be considered satisfactory, since these threshold have to be used in on operative EWS and a TP rate of 40% is not tolerable. To increase the performances of the system, there were 2 possibilities: (i) increase the number of TP and (ii) reduce the number of FP. The number of TP was slightly improvable, since the number of FN (missed alarms) was generally low in respect of the TP and FP, so the work was focused on the FP reduction by means of the introduction of a third parameter (MeAR).

3D rainfall threshold definition
The iterative procedure presented in the 3D threshold definition methodology allowed to define a MeAR threshold for each time interval and for each alert zone. The results are reported in the following table.
This approach allowed a substantial reduction of FPs, ranging from 26 (in the AZ C, where the number of FP was already low) up to 69% (in AZ H).
As can be seen from Table 2 and Fig. 7, generally the best results in terms of FP reduction can be obtained considering a time interval of 10 days but with drawback of a reduction of TP (thus an increase of FN). Conversely, the lowest TP reduction is obtained with the 7-day rainfall; more in general, the 15-and the 30-day

(6%)
The absolute number of FP and TP encountered in each AZ is reported in Table 1 Landslides 18 & (2021) 1051 mean rainfall give more balanced results and seems to be the most suitable for an operative purpose. In Fig. 8, the sensitivity analysis of the proposed approach is presented: this analysis has been performed for each AZ and for each time interval considered to define the MeAR thresholds; it shows how the reduction of TP and FP vary changing the MeAR threshold values.
It resulted that in some AZs (A, B, E), the FP reduction (solid lines in the figure) increases rapidly even with low values of MeAR, while in AZs G and H, the reduction is more gradual, and in AZ C it is quite stable at the beginning and then increases rapidly.
It is also clear that the FP reduction is higher for the 7-, 10-, and 15-day intervals, whereas the 30-day interval shows a generally lower reduction of FP. To the other hand, the 7-, 10-, and 15-day intervals lead to a general higher reduction of TP, whereas for 30 days, this reduction is sensibly lower.

Validation of the results
A good practice to verify the quality of the results is the validation against an independent dataset of landslide and rainfall data. Consequently, the results obtained as described above have been validated using the landslides reported from 2016 to May 2019.
The validation has been done by the comparison among the thresholds and the hourly rain data of the validation periods; then, the results have been checked and compared with landslide triggering dates, to identify TP, FP, TN, and FN events.
The validation work was set in two phases: at first the I-D thresholds are validated (Table 3), to verify if the results of the validation were consistent with those of the calibration phase; secondly, the MeAR parameter is added, and the I-D-MeAR thresholds are validated as well (Table 4), to verify if the reduction of the FP was confirmed for all the AZs.
The first step of the validation confirmed that the thresholds defined with the classical I-D approach are able to identify almost all the landslide events, with few or without FN (missed alarms), but it also confirmed that the system has some issues with the high number of FP (false alarms), which is usually higher than TP. The second step also confirmed the results obtained during the 3D threshold definition, with a consistent reduction of the FP, up to 86% in AZ H, with a low reduction of TP. The most contained reduction of FPs was obtained for AZ C, where the number of FP was already low, if compared with the other AZ, as it resulted also in the calibration phase.
In Fig. 9, a graphical comparison between the FP reduction obtained with the calibration and validation datasets is reported; from this comparison, it resulted that the FP reduction is comparable with the two datasets, except for AZ C, where the result of the validation procedure showed worse result respect to the validation. In any case, since the validation dataset is quite small, more data needs to be collected for a better comparison.

Discussion
The proposed procedure can be considered a step forward in the definition of rainfall thresholds representing the conditions that can lead to landslide triggering. With respect to the conventional I-D approach, the domain associated to landslide condition is reduced: instead of a semi-plane defined by a line in the 2D plot (D on the abscissa and I on the ordinate), now it is represented by a portion of space delimited by a plane defined by the MeAR threshold (in the zaxis) and by another plane defined by the I-D threshold (Fig. 6).
From an operational point of view, an important point of strength of this procedure is that the reduction of the space associated to landslide conditions was obtained keeping the number of missed alarms within acceptable limits. Therefore, the false alarm rate can be considerably reduced (observed reduction ranges from − 20 to − 80%, depending on the alert zone and on the MeAR value selected as a threshold). It should be noted that the sharp reduction of FPs is very helpful for the reliability of the warning system, as a high number of false alarms (FPs) could rapidly lead to a generalized and hard to recover distrust in the system, both in citizens and authorities.
The good results obtained by the use of the MeAR parameters indicate that the analyzed landslides (mainly surficial landslide) are associated not only to short and intense events but that these short events are associated to significant rainfall amounts cumulated in the antecedent days; i.e., two rainfall events with the same intensity and duration can trigger or not a landslide if they occur after a rainy or a dry period, respectively. This is an interesting outcome of this work, since many works in the traditional literature (Crosta, 1998;Bonnard and Noverraz 2001;Martelloni et al., 2012) associate surficial landslide to short and intense rain events, whereas long rainfalls are usually considered responsible mainly for deep landslides. On the other side, the results are in line with another series of more recent works that try to propose a more reasoned and complex parameterization of the thresholds models by using weighted antecedent precipitation indexes (Glade et al., 2000;Chen et al., 2017) or aiming at reducing the FP rate using soil moisture data (Segoni et al., 2018c;Zhao et al., 2019). The outcomes of the present studies strengthen the hypothesis that also shallow landslides may have a complex hydrological response to rainfall: the introduction of a third variable allows to better reproduce the cause-effect relationship always remaining inside the field of a simple empirical modeling, which compared with more complex approaches, has the advantage of using widely and easily available data (only rainfall measures) that can be collected, integrated in a EWS, and interpreted in a very straightforward way.
Concerning the MeAR values to be used as a threshold, the main objective of this work is to test the feasibility of the approach. In this work, the values were chosen empirically trying to optimize the response of the warning model to the objectives of the end users. It should be stressed that in other applications, different approaches could be used to select the MeAR threshold value. For instance, instead of finding a balance among missed alarms and false alarms, one could decide to lower the MeAR threshold until no missed alarm is obtained (of course, at the cost of producing a very lower reduction of false alarms). Ultimately, an ad-hoc trade-off between missed alarms and false alarms should be identified according to the policies of the local decision makers. Furthermore, the time intervals used to define the MeAR have been defined according the expert judgment of Emilia Romagna Civil Protection Agency, so, in different study areas, other time periods may need to be explored to get good results. The proposed procedure is therefore explained in general terms, relying  to the specific needs of further applicants the decision and definition of the optimal threshold level of MeAR and the time intervals to be considered.
Looking closer at the distribution of the MeAR threshold values, they do not seem relevantly influenced by the rainfall regime of the region (Fig. 10) or by the morphology (which often affects rain patterns): for instance, the values of the 30-day MeAR are very similar (ranging from 93 to 111 mm) in all AZs, except for AZ B, where this value is 64 mm and where the average rainfall values are lower than elsewhere. Even the MeAR values for the other time intervals do not show any particular distribution or correlation, since the highest MeAR for 7 and 15 days have been defined for AZ G, which is the most rainy area of the region (MAP ca. 1300 mm/year), but the lowest values for the same time interval have been defined for AZs E and B, which show a very different pluviometric behavior and morphology, as AZ E is mainly mountainous and hilly, with a MAP over 1000 mm/year, whereas AZ B is mainly plain with few hills and steep cliffs and a MAP of ca. 700 mm/years. A more accurate statistical analysis between the MeAR thresholds, the MAP and the mean elevation and slope angle of each alert zone, made by the Pearson correlation coefficient, showed that in general there is not a clear correlation between the considered parameters, except for the 30-day MeAR that shows values ranging from 0.61 (MAP) to 0. 72 for slope angle (Table 5).
Nevertheless, it should be noted that the MeAR threshold value may be influenced also by other complex and interplaying factors such as lithology, anthropic activity, and land use. Linking the MeAR threshold values to specific features of each AZ is difficult but, in accordance to the philosophy of "black box" approaches like empirical rainfall thresholds, it can be observed that the most effective time interval to calculate MeAR is not equal in all AZs (Table 2 and  Table 3). This suggests that in the future, further refinements of the procedure (supported by more data) could benefit from researching and setting different time intervals to calculate MeAR in different alert zones.  Furthermore, in this work, the validation dataset was quite limited, since in the analyzed period (2016 May 2019) few landslides have been reported, and a comparison between TP reductions in the two observed periods was not made, as it could be not completely significant, leading to the misunderstanding of the results; in the future, a larger validation dataset can help to better validate the results and contribute to improve (along with the calibration dataset) the thresholds leading to more meaningful results.

Conclusion
In this paper, the setting of 3D rainfall thresholds for landslide initiation has been presented; these thresholds have defined coupling classical intensity-duration rainfall thresholds (defined in a Cartesian plane) with a new rainfall parameter (considered in the z-axis), defined using the mean rainfall of each alert zone (MeAR), calculated considering time intervals ranging from 7 to 30 days. This approach was tested in the Emilia Romagna region (over 22,000 km 2 ), with the objective of improving the existing regional-scale landslide warning system, in particular for the reduction of false alarms committed by the traditional I-D model. After a specific calibration procedure, carried out separately for each one of the 8 alert zones in which the region is subdivided, the resulting 3D thresholds have been validated and compared, in terms of predictive effectiveness, with classical ID thresholds.
We concluded that a classical ID approach could be improved adding MeAR as a third rainfall parameter to define 3D thresholds. This improvement led, in our case of study, to better validation statistics, especially in terms of a consistent reduction of false alarms, thus opening promising perspectives for an implementation in the regional landslide warning systems currently operated in the region.

Acknowledgments
The presented work has been developed in collaboration with the Emilia Romagna Civil Protection Agency, which also funded this work. Authors would thank the staff of SGSS (Geological Service), of ARPAE (Regional Environmental Agency), and of the Multirisk Centre for providing data and useful feedbacks.
Authors' contributions A.R. conceived the methodology and calculated the thresholds, V.C. validated the ID thresholds, A.G. gave support to the analyses, A.R. and S. S wrote the paper, and A.M. and N. C supervised the work.

Funding Information
Open access funding provided by Università degli Studi di Firenze within the CRUI-CARE Agreement.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/ 4.0/. Original paper