Introduction

Landslides are among the most frequent natural hazards in the world, causing casualties and massive economic damage every year (Froude and Petley 2018; Haque et al. 2019). In Italy, landslides are a highly recurrent geomorphological process (Herrera et al. 2018): Franceschini et al. (2022a, b) identified over 30,000 landslide events from 2011 to 2021 by mining Italian online newspapers; according to Bianchi and Salvati (2022) and Rossi et al. (2019), landslides in Italy were responsible for 1071 fatalities from 1972 to 2021 and for about 5.6 billion € of damage from 2000 to 2018; moreover, according to Italian regulations and governmental-level inventories, about 20% of the Italian territory is officially mapped as exposed to landslide hazard (Iadanza et al. 2021).

A cost-effective approach for landslide risk mitigation is the use of forecasting models for early warning purposes. Several methods exist in the literature and can be broadly divided into two main groups: (i) physically based and (ii) empirical/statistical models. Physically based models use complex mathematical relationships with the aim of faithfully reproducing the physical processes that trigger slope instability (Montgomery and Dietrich 1994; Baum et al. 2008; Rossi et al. 2013; Medina et al. 2021; Reid et al. 2015; Bout et al. 2018). They are generally applied at the slope or catchment scale, because they require several hydrological and geotechnical input parameters that are difficult to spatialize over large areas due to their extreme heterogeneity (Tofani et al. 2017; Vannocci et al. 2022), which mainly limits such models to prototypal applications (Alvioli and Baum 2018; Tofani et al. 2017; Canli et al. 2018; Salvatici et al. 2018; Schmaltz et al. 2019; Schilirò et al. 2021). Therefore, regional landslide early warning systems (LEWS) are typically based on rainfall thresholds (Piciullo et al. 2018; Gariano et al. 2020), defined as a rainfall condition beyond which slope instability occurs (Caine 1980; Guzzetti et al. 2008; Segoni et al. 2018a, b; Piciullo et al. 2018). Rainfall thresholds are typically derived using only two input data: rainfall records, used to characterize each triggering event, and an inventory of past landslide events, to be used for back analysis and for which the time of occurrence is known. These data are usually divided into two subsets, one for the threshold definition (calibration phase) and the other for testing its predictive capabilities (validation phase). Although physically based approaches are more accurate, rainfall thresholds are fast and sufficiently accurate for regional-scale predictions, and can easily be understood and implemented for operational warning purposes.

However, at present, two important issues strongly limit the implementation of statistical rainfall thresholds for operational applications: (i) the high number of false alarms (FAs, or errors of commission: alerts issued because rainfall exceeds the threshold, but without the occurrence of landslides) and (ii) the lack of a robust validation phase (Piciullo et al. 2017, 2020).

Some studies show that for rainfall thresholds, a good hit rate, that is, a high proportion of correct alarms (CAs: threshold exceedances with landslides reported) relative to missed alarms (MAs: no threshold exceedances, but landslides reported), is generally achieved at the cost of a high number of FAs, restricting their potential use in an operational LEWS (Staley et al. 2013; Corsini and Mulas 2017; Abraham et al. 2020, 2021, 2022; Rosi et al. 2016, 2019, 2021; Gariano et al. 2020; Segoni et al. 2023). Indeed, considering a landslide risk scenario articulated on three criticality levels, namely low, moderate, and high (Segoni et al. 2018a, b; Piciullo et al. 2018; Segoni et al. 2022; Sala et al. 2021), an FA can be considered negligible for precautionary purposes only at the low criticality level. Repeated FAs at moderate or high criticality levels have a significant economic impact on the civil protection system, and the countermeasures adopted to counter the critical situation, such as the recurrent closure of roads or evacuation of buildings, can cause disruptions that are difficult to justify (Sala et al. 2021). Moreover, an excessive number of FAs may undermine the population's confidence in the warning system itself: after observing repeated FAs, people tend not to give potentially critical situations due attention (Amato 2014).

For implementation in a LEWS, a comprehensive validation phase evaluated using an independent dataset is mandatory, as it allows ascertaining whether the predictive capabilities of a threshold can be considered valid in the future and not only for the calibration period. However, this procedure is rarely conducted over extensive datasets, due to the scarcity of data, which makes it challenging to partition the database into two robust subsets (Von Ruette et al. 2011; Martelloni et al. 2012; Leonarduzzi et al. 2017; Segoni et al. 2018a, b; Martinengo et al. 2023).

To overcome these issues, this paper defines a set of multiple rainfall thresholds specifically conceived for operational applications and including original approaches to decrease the number of FAs and to include a robust multi-source validation process.

First, MaCumBA (Massive Cumulate Brisk Analyzer) software, developed by Segoni et al. (2014a, b), was used to define a set of three intensity-duration (I-D) thresholds (corresponding to increasing criticality levels) for the five alert zones (AZs) in which the Liguria region (North-West Italy) is partitioned. In order to optimize the threshold system for operational warning applications, a landslide database collected by a data mining technique called SECaGN (Semantic Engine to Classify and Geotagging News) and developed by Battistini et al. (2013) was used in this study. This technique prioritizes landslide events with a significant impact on the society while neglecting non-impactful occurrences, resulting in a database focused on events of actual interest for civil protection.

Afterward, the number of FAs exceeding the moderate and high criticality level thresholds was limited by an innovative calibration method, to further minimize their impact on the civil protection system. Building on the methodology proposed by Rosi et al. (2021), a third rainfall parameter was considered to account for the antecedent rainfall and its spatial pervasiveness: OMAR (optimized mean areal rainfall) is calculated as the mean rainfall amount fallen in an AZ (averaging all available measurements) during a given time period, whose specific duration is identified as the one that minimizes false alarms without negatively affecting the hit rate of the basic intensity-duration threshold system. This procedure allowed a massive filtering of FAs, improving the predictive performance of the thresholds.

Finally, a robust validation procedure was carried out. A standard validation process was performed using a subsample of the landslide dataset (2010 to 2019 data used for calibration, 2020–2021 data used for validation). Afterward, the model results were compared with alternative sources of information, including (i) the alerts issued by the warning system currently in use in the Liguria region, for the period 2015–2021, and (ii) a dataset of national-level emergencies (Gatto et al. 2023), for the period 2013–2021. While the first made it possible to evaluate whether the model outputs are consistent with the alert levels that the civil protection system would have issued, the second focused the validation of the model on high-severity landslide events.

Materials

Test site description

Liguria is located in the north-western portion of the Italian territory and has an elongated, east–west oriented shape approximately 240 km long, for a total area of around 5400 km² (Fig. 1). The Ligurian territory is divided into 4 provinces (La Spezia, Genova, Savona, and Imperia) and 235 municipalities. The morphology of this region is very peculiar, as elevation can change within very few kilometers from the beach to mountains higher than 2000 m (Fig. 1a). Approximately 35% of the Liguria region is hilly territory and only 2.5% consists of limited plains close to the sea, while most of it (around 62.5%) is mountainous, comprising the Ligurian Alps, the Ligurian pre-Alps, and the Marguareis Alps (with the highest peak, Mt. Saccarello, 2200 m a.s.l.).

Fig. 1 Investigated area: a digital elevation model, b mean annual precipitation and localization of rain gauges, c lithology, and d alert zones of the Liguria region

The mean annual precipitation (MAP) of the Liguria region, obtained from available rain gauges data (described in the “Rainfall database” section), shows a close correlation between rainfall and orography, since the lower rainfall values are distributed along the coast, with a minimum of about 600 mm/year registered in the west coast, while the highest precipitation values, with a peak of 2600 mm/year, are concentrated in the Ligurian Alps and pre-Alps (Fig. 1b).

Geologically (Fig. 1c), the territory is mainly composed of metamorphic rocks and metaconglomerates, arenaceous and calcareous flysch, and recent deposits (Giammarino et al. 2002).

Due to these meteorological conditions and the geomorphological setting, Liguria is one of the regions most affected by landslides in Italy (Guzzetti et al. 2004; D’Amato Avanzi et al. 2011; Cevasco et al. 2013, 2015; Giordan et al. 2017; Pepe et al. 2019; Calvello and Pecoraro 2018; Franceschini et al. 2022b).

The DPCM of February 27, 2004, divides the study area into 5 AZs, from Ligu-A to Ligu-E (Fig. 1d) based on the hydrogeological, climatic, and geomorphological characteristics of the territory, as well as on the administrative boundaries of municipalities. This criterion is used by the civil protection authority to issue the alerts, and for this reason, this work provides a specific rainfall threshold system for each AZ.

Landslides inventory

Traditionally, methods for setting up landslide inventories include remote sensing methodologies (Bianchini et al. 2018; Solari et al. 2020), photo interpretation and field surveys (Brunsden 1985), data retrieval from technical reports (Guzzetti et al. 2008; Kirschbaum et al. 2010; Vennari et al. 2014; Rosi et al. 2019; Collini et al. 2022), or a combination of these (Dikau et al. 1996; Rosser et al. 2017). All these traditional approaches are usually quite accurate, but time-consuming, and can be very costly. Landslide detection by deep learning (Catani 2021; Bhuyan et al. 2023; Nava et al. 2022; Meena et al. 2021) is an innovative approach that allows the collection of multi-temporal inventories from satellite images rapidly, automatically, cheaply, and with high accuracy, but it also identifies landslides that occurred in remote areas, with no impact on citizens or buildings.

Data mining is another recently developed technique used to obtain information related to natural hazards from online newspaper articles (Battistini et al. 2013, 2017; Kreuzer and Damm 2020). In fact, several studies have verified that mass media are generally the first and primary source of information about hazards for the public (Fischer 1994). Social media report a natural disaster much faster than observatories (Battistini et al. 2013; Goswami et al. 2018; Franceschini et al. 2022a). Moreover, the use of this technique implicitly discards the landslides that had no impact on society, because they occurred in isolated areas, without threatening buildings, infrastructure, or crops. Even if this characteristic may be a downside in some applications where the completeness of inventories is important, in operational civil protection applications, this feature could represent an advantage. Landslides that occurred without impacting society, if identified by a forecasting model, would actually result in false alarms, since they would not be reported to the authorities. Conversely, using landslide news from social media makes it possible to calibrate the forecasting model specifically to identify landslides associated with a significant level of risk or significant impacts on society, which represent relevant events for civil protection purposes.

SECaGN is a data mining technique developed by Battistini et al. (2013), based on a mechanism for acquiring Internet news related to natural hazards, considered a source of data on landslide events. SECaGN is applied within Google News, as it considers national and local newspapers more comprehensively (Franceschini et al. 2022b). Through publication date and places toponyms, each news item is dated and geolocated with a fully automatic procedure.

The landslide news dataset used to calibrate the thresholds in this study was deeply analyzed by Franceschini et al. (2022a, b) for the period from 2010 to 2019 for the whole Italian territory. A total of 1218 online newspaper articles were collected from Google News, considering only those within the Liguria region and with high spatial and temporal accuracy. The reporting day and/or localization of each landslide has been verified and, if necessary, adjusted manually using the information obtained from the news. In addition, only rainfall-induced events were considered, excluding other types of landslides (such as rockfalls and anthropic-induced landslides). Overall, 315 landslide events were used for rainfall thresholds calibration.

For model validation, an independent dataset has been collected from SECaGN, for the period from 2020 to 2021. It has been analyzed and manually classified through the method adopted by Franceschini et al. (2022b). For the study area, 515 articles have been harvested, referring to rainfall-induced landslides with high spatial and temporal accuracy. A total of 45 landslide events have been identified for the validation phase.

Rainfall database

Italy is equipped with a dense network of rain gauges consisting of more than 4500 stations spatially distributed throughout the entire territory, with a total of 115 rain gauges located in the Liguria region (Del Soldato et al. 2021).

For this study, hourly records were derived from the rainfall database for the period 2010–2021. Some rain gauges present few time gaps, where no data have been recorded, mainly due to instrument malfunctions; however, they do not represent a limitation of the proposed statistical analysis, because the technique used for the realization of the rainfall thresholds (MaCumBA, fully explained in the “2D rainfall thresholds” section) allows multiple rain gauges to be associated with each landslide within a specific search radius, providing the missing records. The rainfall records were analyzed to remove noise and errors and identify gaps of data, for example, negative rainfall values or values higher than 400 mm/h, which are obvious erroneous rainfall measurements.
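The screening of erroneous records described above can be illustrated with a minimal sketch. It assumes that an hourly record is a plain list of rainfall amounts in mm and that rejected readings are turned into gaps (None); the function names and the handling of gaps are illustrative assumptions, not the procedure actually used in this study. Only the two rejection criteria (negative values and values higher than 400 mm/h) come from the text.

```python
import math

# Upper plausibility bound quoted in the text (values above it are rejected).
MAX_PLAUSIBLE_MM_PER_H = 400.0


def clean_hourly_record(values):
    """Return a copy of an hourly rainfall series with bad readings set to None.

    `values` is a list of hourly amounts in mm; None or NaN marks an existing gap.
    Hypothetical helper: the name and the None convention are assumptions.
    """
    cleaned = []
    for v in values:
        if v is None or (isinstance(v, float) and math.isnan(v)):
            cleaned.append(None)            # already a gap
        elif v < 0 or v > MAX_PLAUSIBLE_MM_PER_H:
            cleaned.append(None)            # obvious measurement error
        else:
            cleaned.append(float(v))
    return cleaned


def completeness(values):
    """Fraction of hours with a valid (non-gap) reading."""
    if not values:
        return 0.0
    return sum(v is not None for v in values) / len(values)
```

A completeness score of this kind could then be used to compare candidate rain gauges around a landslide, which is the spirit of the gauge selection performed by MaCumBA.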

Alternative validation datasets

Before implementing rainfall thresholds into operational warning systems, the best practice, widely acknowledged in the literature, involves a thorough validation. However, this practice is not yet fully established due to the limited availability of data (Segoni et al. 2018a). Typically, the primary focus is on calibration, and only a small percentage of data is used for validation purposes. This approach is justified by the need for a strong statistical foundation for defining thresholds, and for research purposes, testing them on a limited dataset may be considered satisfactory in confirming their validity. Nevertheless, for operational early warning applications, it is crucial to demonstrate the thresholds’ reliability with a high level of accuracy for future predictions. The use of a few years of data for validation does not provide a precise assessment of the threshold performance, which makes their implementation within an operational warning system problematic (Gariano et al. 2015).

Due to limited data availability, this study employs rainfall and landslide data from 2020 to 2021 for classical validation purposes. To enhance the robustness of the validation phase, an innovative analysis is proposed, which involves the use of two alternative datasets of different sources.

The first dataset is composed of the alerts issued by the existing early warning system in use in the region. A comparison with these data allows determining whether the proposed thresholds offer improvements or reveal potential limitations. Currently, the Liguria region employs a weather-based LEWS. In practice, the same level of criticality calculated for meteorological hazard is also attributed to cascading hazards like landslides (https://www.regione.liguria.it/homepage-protezione-civile, last accessed on 20 June 2023). The past alerts issued by the region were provided by the Regional Environmental Protection Agency of Liguria (ARPAL) for the period 2015–2017, and retrieved from the GitHub repository of the Italian National Department of Civil Protection for the period 2018–2021 (https://github.com/pcm-dpc/DPC-Bollettini-Criticita-Idrogeologica-Idraulica).

The second dataset is the national-level civil protection emergencies collected by Gatto et al. (2023). Under Italian legislation, national-level emergencies are extremely critical events that require extraordinary measures and nationwide coordination due to high magnitude, severity of impacts, and spatial extension. Therefore, a comparison with this dataset makes it possible to verify whether the model acted correctly during critical meteorological events, in particular how the thresholds responded to large-scale and high-severity phenomena. The ground effects of these emergency events are manifold, including floods, landslides, and huge damage to infrastructure, private property, and the population. These types of events can impact small or large territories in a limited time period or can last up to months, and according to Gatto et al. (2023), sometimes it is not possible to narrow down the start and end dates of these events, resulting in scarce temporal detail. Moreover, usually not every AZ is hit during an event, meaning that some areas may not be damaged or included in a national-level emergency state. This results in events with poor temporal detail, such as the emergency related to the Vaia storm, which was issued for an entire month during which several regions, including Liguria, were impacted at different times, and which also contains days without precipitation. More generally, the longer the reported period and the scarcer the temporal detail, the higher the number of FAs or NAs (non-alarms: rainfall does not exceed the threshold and no landslides are reported) that may not be directly related to the main event. From 2013 to 2022, Liguria was hit 11 times by events that required the declaration of a national-level emergency state, on which this validation was performed.

Rainfall threshold analysis

2D rainfall thresholds

The threshold analysis was performed using MaCumBA, a software developed by Segoni et al. (2014a, b) and tested in several case studies in Italy (Rosi et al. 2015, 2021; Segoni et al. 2014a, b) and in other countries (Rosi et al. 2016, 2019). MaCumBA provides a semi-automated, fast, and objective analysis and allows the establishment of a multiple thresholds system for the identification of three levels of increasing criticality.

The thresholds defined by MaCumBA are based on the general power law first proposed by Caine (1980):

$$I=\alpha {D}^{\beta }$$

where I represents the rainfall intensity (mm/h) and D is the duration of the rainfall (h); α, the intercept on the y-axis (> 0), and β, the angular coefficient of the line (< 0), are empirical parameters characteristic of the rainfall data distribution.
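As an illustration of how such a power law can be evaluated, the following minimal sketch computes the threshold intensity for a given duration and checks whether a rainfall event exceeds it. The function names and the parameter values in the usage note are hypothetical and do not correspond to any threshold reported in this study; treating an event that lies exactly on the line as an exceedance is also an assumption.

```python
def threshold_intensity(duration_h, alpha, beta):
    """Intensity (mm/h) on the line I = alpha * D**beta for a duration in hours."""
    if duration_h <= 0:
        raise ValueError("duration must be positive")
    return alpha * duration_h ** beta


def exceeds_threshold(intensity_mm_h, duration_h, alpha, beta):
    """True if an event (D, I) lies on or above the threshold line.

    Counting equality as an exceedance is an assumption of this sketch.
    """
    return intensity_mm_h >= threshold_intensity(duration_h, alpha, beta)
```

For example, with the hypothetical values alpha = 10 and beta = -0.5, a 4-hour event has a threshold intensity of 5 mm/h, so an event with a mean intensity of 6 mm/h would exceed the threshold and one with 3 mm/h would not. In a log–log graph the same relationship becomes a straight line with slope beta.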

One of the peculiarities of MaCumBA is the definition of an additional parameter to characterize the thresholds, called No-Rain-Gap (NRG). It expresses the number of consecutive hours without rain that is required to consider a rainfall event ended. This parameter plays a key role, ensuring the replicability of the analysis and facilitating the implementation of thresholds in an operational warning system (Segoni et al. 2014a, b).
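The role of the NRG can be sketched as follows. This is a simplified illustration and not the MaCumBA code: it assumes a gap-free hourly series for a single rain gauge, measures the duration from the first to the last wet hour of an event, and takes the intensity as the cumulative rainfall divided by that duration. All names are hypothetical.

```python
def split_rainfall_events(hourly_mm, nrg_hours):
    """Split an hourly series into events separated by at least nrg_hours dry hours.

    Returns a list of (start_index, duration_h, mean_intensity_mm_h) tuples.
    Dry spells shorter than nrg_hours stay inside the event they interrupt.
    """
    events = []
    start = None       # index of the first wet hour of the current event
    last_wet = None    # index of the most recent wet hour
    total = 0.0        # cumulative rainfall of the current event (mm)
    for i, mm in enumerate(hourly_mm):
        if mm <= 0:
            continue
        # A dry spell of nrg_hours or more closes the previous event.
        if start is not None and i - last_wet - 1 >= nrg_hours:
            duration = last_wet - start + 1
            events.append((start, duration, total / duration))
            start = None
        if start is None:
            start, total = i, 0.0
        total += mm
        last_wet = i
    if start is not None:  # close the event still open at the end of the series
        duration = last_wet - start + 1
        events.append((start, duration, total / duration))
    return events
```

With the series `[0, 2, 4, 0, 6, 0, 0, 0, 1, 1]`, an NRG of 3 h yields two events, because the three dry hours in the middle are enough to end the first one, whereas an NRG of 4 h merges everything into a single 9-hour event. This shows why the NRG has to be fixed for the analysis to be replicable.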

MaCumBA procedure can be summarized into three main phases:

  1. Identification of rainfall events and definition of I and D parameters for each event for each rain gauge

  2. Selection of the most appropriate rain gauge for the characterization of each landslide, choosing the rain gauge, within a certain search radius from the landslides, with the most complete time series related to each event

  3. Choice of the most representative I-D threshold using a 95% confidence interval and plotting it in a log–log graph

The identified threshold is used to separate the ordinary level, which represents the absence of criticality, from the low criticality level. Then, it is translated upward to calibrate two higher thresholds representing the limits of the moderate and high criticality warning levels. In this study, a site-specific calibration procedure to define the moderate and high criticality thresholds is proposed, based on the number of FAs that exceed these thresholds. We assume that the social and economic costs of activating the civil protection system for a moderate criticality alert are so onerous that no more than one FA on average per year is justified; therefore, for the 10-year calibration period, the moderate criticality threshold was defined by moving up the low criticality threshold until only about 10 FAs would be committed. Considering that a FA above the high criticality threshold weighs heavily on the civil protection system, both economically and in terms of confidence in the warning system itself by the population, to define this threshold, the shift ended when no FAs were committed (Segoni et al. 2022). In this way, a low number of FAs is guaranteed for the moderate and high criticality thresholds, making them perfectly suitable for operational purposes and much more sustainable by the civil protection system, without affecting the predictive capabilities of the thresholds.
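The upward translation of the threshold can be sketched as a simple search. The example below is a minimal illustration under stated assumptions: the slope β is kept fixed, the intercept α is raised in small multiplicative steps (the step size is an arbitrary choice of this sketch), and non-triggering rainfall events are given as (D, I) pairs. It is not the calibration code used in this study, and all names are hypothetical.

```python
def count_false_alarms(non_triggering_events, alpha, beta):
    """Count non-triggering (D, I) events lying on or above I = alpha * D**beta."""
    return sum(1 for d, i in non_triggering_events if i >= alpha * d ** beta)


def shift_threshold(non_triggering_events, alpha_low, beta, max_false_alarms, step=1.01):
    """Raise alpha (slope unchanged) until at most max_false_alarms remain.

    alpha_low is the intercept of the low criticality threshold; step > 1 is an
    assumed multiplicative increment that controls the resolution of the search.
    """
    if alpha_low <= 0 or step <= 1:
        raise ValueError("alpha_low must be > 0 and step must be > 1")
    alpha = alpha_low
    while count_false_alarms(non_triggering_events, alpha, beta) > max_false_alarms:
        alpha *= step
    return alpha
```

Under the criterion described above, the moderate criticality threshold for a 10-year calibration period would correspond to `max_false_alarms=10` and the high criticality threshold to `max_false_alarms=0`.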

The proposed methodology was applied for the calibration period, from 2010 to 2019, to define a landslide warning system on three levels of criticality for each AZ of the Liguria region. Then, the thresholds have been validated with an independent dataset of 2 years, from 2020 to 2021, to verify if the calibration criterion based on the number of exceedances without a landslide is effective.

3D rainfall thresholds

Although the major strength of rainfall thresholds is their simplicity, many authors have recently attempted to include other parameters to better reproduce the cause-and-effect relationships between rainfall and landslides; among the most investigated parameters are soil moisture and other proxies that indirectly take it into account (Chen et al. 2017; Segoni et al. 2018b; Zhao et al. 2019; Rosi et al. 2021; Kim et al. 2021). In fact, it is widely accepted in the literature that even short-term, low-intensity rainfall can trigger landslides if it occurs during a wet season and therefore on an already partially saturated soil (Nocentini et al. 2023).

To account for both triggering and predisposing rainfall, we built on the methodology proposed by Rosi et al. (2021) to get an optimized 3D threshold system that reduces false alarms without reducing the number of hits (correct predictions of landslides). These 3D thresholds are defined by coupling classical intensity-duration rainfall thresholds (defined in the x- and y-axes of a Cartesian plane) with a new rainfall parameter (considered in the z-axis), called optimized mean areal rainfall (OMAR), which was defined as follows.

For each rainfall event, the cumulative rainfall measured over a reference long-term period was identified for each rain gauge of the same alert zone. The reference periods considered in this study were 7, 15, 30, and 60 days long. For each event, the antecedent rainfall of all the rain gauges was averaged to obtain a single value, characteristic of the whole AZ (OMAR): OMAR7, OMAR15, OMAR30, and OMAR60. OMAR can be considered an indicator of how widespread a rainstorm event is over the territory, which is a key feature of meteorological events in civil protection hazard management, as it affects the capability of local administrations to face it and take adequate countermeasures. For each reference long-term duration, all landslide-triggering events were plotted together with non-triggering ones to identify the OMAR value equal to the minimum cumulative rainfall obtained for the landslide events correctly identified by the 2D thresholds. If this OMAR value is used as the third threshold, only non-landslide events that exceed the 2D threshold (false alarms) but fall below this OMAR value are filtered out. The last step of the procedure is to identify which OMAR (OMAR7, OMAR15, OMAR30, or OMAR60) filters out the highest number of false alarms without reducing the number of correct predictions. That one is selected as the OMAR and is used on the z-axis of the 3D threshold.
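The selection of the optimal reference period can be sketched as follows. This is a minimal illustration, assuming that the OMAR values of the events exceeding the 2D threshold have already been computed for every candidate period; the data structures, the names, and the tie-breaking rule (the shortest period wins) are assumptions of the sketch, not details of the original procedure.

```python
def mean_areal_rainfall(gauge_totals_mm):
    """Average the antecedent rainfall of all rain gauges in an alert zone."""
    return sum(gauge_totals_mm) / len(gauge_totals_mm)


def select_omar(correct_alarms, false_alarms):
    """Pick the antecedent period that filters out the most false alarms.

    Both arguments map a period length in days to the list of OMAR values (mm)
    of the events that exceeded the 2D threshold.
    Returns (period_days, omar_threshold_mm, filtered_false_alarms).
    """
    best = None
    for period in sorted(correct_alarms):
        # The minimum OMAR among correct alarms keeps every correct alarm.
        threshold = min(correct_alarms[period])
        filtered = sum(1 for v in false_alarms[period] if v < threshold)
        # Strict comparison: on a tie the shorter period is kept (assumption).
        if best is None or filtered > best[2]:
            best = (period, threshold, filtered)
    return best
```

Because the OMAR threshold is set to the lowest value observed among the correct alarms of the calibration set, no correct alarm of that set can be removed; only events with less antecedent areal rainfall are discarded.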

Results

Through the procedure summarized in the previous section, the thresholds were obtained for each AZ of the Liguria region. Table 1 shows the I-D threshold parameters for the low criticality level, for each AZ, obtained by applying MaCumBA. The equations and performance for the calibration phase are reported in Table 2.

Table 1 I-D thresholds parameters for the low criticality threshold of each AZ
Table 2 I-D thresholds performance for each criticality level and AZ, for calibration dataset

In order to further improve the effectiveness of the proposed 2D thresholds, a third parameter (OMAR) was introduced to add a third dimension to the thresholds, resulting in a reduction in the number of FAs for each criticality level. Table 3 provides a summary of the results obtained during the calibration phase of the 3D thresholds for each AZ, while Fig. 2 illustrates an example of a 3D threshold obtained for Ligu-C.

Table 3 Summary table of 3D thresholds results obtained for calibration phase
Fig. 2 Example of 3D rainfall thresholds obtained for the Ligu-C Alert Zone for calibration phase: a 3D view and b profile view (x–z plane)

Discussions

The threshold equations in each AZ (Table 1) exhibit significant variations. These differences can be attributed to the varied physical settings of the study area, such as the rainfall patterns and the geomorphological and lithological characteristics, as well as to the number of available landslides and rain gauges (Rosi et al. 2017). For each criticality level, Ligu-A shows the lowest 2D thresholds compared to the rest of the region. This is primarily attributed to the low MAP levels of this AZ (Fig. 1b). In the rest of Liguria, by contrast, the presence of a complex orography in the proximity of the coastline promotes the development of more severe rainfall events (Furcolo et al. 2016). These phenomena are becoming increasingly frequent, particularly in the central and eastern zones of the region, due to climate change (Libertino et al. 2018). Such elevated levels of rainfall result in higher thresholds.

As shown in Table 2, the number of FAs is more than double the total number of CAs for each AZ, except for Ligu-B, where FAs nevertheless still outnumber CAs. These results are not satisfactory, especially if these thresholds are to be used for operational early warning. The proposed calibration method for moderate and high criticality thresholds, which allows for a maximum of 1 FA/year above the moderate threshold and 0 FA/year above the high threshold, is largely respected for each AZ, and effectively mitigates the impact of these errors on the proposed operational warning system.

The same trend shown for the 2D threshold of Ligu-A is also observed for the OMAR thresholds, which exhibit the lowest values of the region. During the calibration of the 3D thresholds, the highest number of filtered FAs is obtained when using OMAR values calculated over 15 days for most AZs (from Ligu-B to Ligu-E, for the 15-day period, the number of FAs has been reduced to the minimum: 71, 100, 28, and 16, respectively), except for Ligu-A, where a 60-day period leads to one additional FA being filtered with respect to the 15-day one (167 remaining FAs instead of 168). In order to achieve greater consistency across all AZs, it was decided to consider the 15-day cumulative threshold as the optimal configuration, including for Ligu-A, despite the one additional FA retained in this case. Table 3 highlights the significant reduction in the number of FAs at the low criticality level, with a peak of 82 filtered FAs in Ligu-C (from Ligu-A to Ligu-E, there is a reduction in the number of FAs at the low criticality level by 10%, 27%, 45%, 67%, and 83%, respectively). Moreover, there is an important reduction in FAs classified at the moderate criticality level, with a peak again for Ligu-C, where 8 FAs were filtered, thereby further mitigating the impact of these errors on the civil protection system (from Ligu-A to Ligu-E, there is a reduction in the number of FAs at the moderate criticality level by 30%, 13%, 80%, 67%, and 60%, respectively). The OMAR thresholds in Ligu-D and Ligu-E are higher compared to the other AZs, which is in accordance with their considerably higher levels of MAP (Fig. 1b).

The excellent results achieved in the calibration phase were further confirmed by validating the 2D and 3D thresholds, utilizing an independent landslide database for the period 2020–2021 obtained by SECaGN. The outcomes of validation phase for 2D rainfall thresholds are presented in Table 4, while the results for 3D thresholds’ validation are shown in Table 5. Figure 3 illustrates an example of the results obtained for the validation of the 3D threshold of Ligu-C.

Table 4 Summary table of 2D thresholds results for validation phase
Table 5 Summary table of 3D thresholds results for validation phase
Fig. 3 Example of 3D rainfall thresholds obtained for the Ligu-C alert zone for validation phase: a 3D view and b profile view (x–z plane)

The validation of the 2D thresholds (Table 4) demonstrated that the chosen criterion for handling the FAs is largely respected, with the exception of one single event. In Ligu-A, the rainfall event of 15/08/2020, which lasted 2 h and was particularly intense (26.8 mm/h), falls above the high criticality threshold, but no landslides were reported following this event. This occurrence represents the only FA above the highest criticality threshold in 12 years of simulation throughout the Liguria region. The number of MAs remains relatively low in each zone, with a maximum of 7 in Ligu-C. Despite the implementation of the third dimension in each threshold, it was not possible to eliminate the FA at the high criticality level observed in 2020 for Ligu-A (Table 5). In addition, in Ligu-D, the validation of the 3D thresholds effectively filtered out 4 FAs at the low criticality level, but also removed a landslide event correctly identified by the 2D thresholds. In this AZ, which experiences heavy precipitation and has a higher OMAR value, the 3D threshold is more likely to inadvertently remove CAs. Overall, with only 2 significant errors observed over a 12-year analysis period, the proposed model demonstrates a promising performance.

The capability of the OMAR threshold to differentiate between ordinary rainfall conditions and those causing slope instability is evident in the removal of many FAs while preserving all correct alarms (except one), as compared to the classic I-D threshold. This outcome can be interpreted as a capability of the proposed system to account simultaneously for triggering rainfall (expressed by rainfall intensity) and for other rainfall prerequisites (expressed by OMAR) that in the study area are not sufficient, alone, to trigger a landslide but can set hazardous hydrological preconditions on the hillslopes. Those preconditions are usually addressed by rainfall thresholds based on antecedent rainfall accumulated over relatively long time periods or by antecedent precipitation indexes (Glade et al. 2000; Vessia et al. 2014; Zhao et al. 2019; Lazzari et al. 2020; Rosi et al. 2021; Kim et al. 2021). The main novelty and strength of the proposed approach is that, while rainfall thresholds are usually based either on antecedent rainfall indexes or on peak intensity rainfall, the 3D thresholds used in this study pursue both strategies at the same time.

The results of the comparison between the proposed thresholds and the alerts issued by the current warning system used in Liguria are shown in Table 6. The proposed 3D thresholds have led to a significant reduction in the number of FAs. The zone with the highest reduction in FAs is Ligu-E, where 133 FAs have been removed. In Ligu-A, there are 23 more FAs at the low criticality level, but 7 fewer FAs at the moderate criticality level and 1 fewer FA at the high criticality level. In total, 345 FAs have been removed, of which 97 are at the moderate or high criticality level. Therefore, the implementation of the proposed methodology can be expected to substantially reduce the economic and social impact of these errors on the civil protection system. Additionally, an increase in CAs was observed. In total, the events correctly predicted by the proposed thresholds are 98 more than those predicted by the current warning system, including 40 events at the moderate or higher criticality level.

Table 6 Results of the comparison between the LEWS currently in use in Liguria and the proposed 3D thresholds

The results of the comparison with the national-level emergency dataset are shown in Table 7. To make it possible to look up each event in the dataset provided by Gatto et al. (2023) and obtain additional information, the field “Emergency state code” was added to the table; it contains the same code assigned to each event by those authors. The empty cells indicate the AZs that are not involved in the state of emergency.

Table 7 Results of the validation of the proposed 3D thresholds with the national-level emergency events dataset collected by Gatto et al. (2023). The empty cells indicate the AZs not involved in the emergency state

Threshold exceedances were recorded during all the national emergency events analyzed. Most of the exceedances result in CAs, namely 40 in total, of which 30 exceeded the moderate criticality threshold and 15 the high criticality threshold. Only 20 FAs have been registered over 8 years across the entire region, of which only 5 at the moderate criticality level and none at the high criticality level. In Ligu-B, the only AZ involved in all 11 analyzed events, no FAs have been recorded, while Ligu-A and Ligu-C have registered only 3 and 4 FAs, respectively, all at the low criticality level.

Only one emergency event registered more FAs than CAs: event 2016_05, with 0 CAs and 1 FA. The report for this event published by ARPAL explains that the ground effects observed were primarily due to strong wind, which caused significant damage to the railway and road network, whereas the associated rainfall was moderate and only a few landslides were reported (https://www.arpal.liguria.it/contenuti_statici/pubblicazioni/rapporti_eventi/2016/Report_speditivo_20161014_vers08nov.pdf, last accessed on 20 June 2023). This explains why there were only a few threshold exceedances in this case.

As previously explained, these national-level emergency events last several days, and in some cases even months (Vaia storm, code 2018_06). In these cases, it is possible that multiple exceedances, including FAs and NAs, were observed during the emergency, even if they are not directly linked to the primary event of the emergency. These exceedances should not be considered missed responses of the model to these highly critical events, because they may be related to situations with no or low precipitation, which the model therefore recognized correctly.

Ligu-D and Ligu-E are frequently excluded from these emergency states because they did not suffer significant damage. In fact, they are the areas most affected by high rainfall but have a low population density (https://ec.europa.eu/eurostat/statistics-explained/index.php?title=Population_and_housing_census_2021_-_population_grids&stable=1, last accessed on 20 June 2023); therefore, a lower impact on society may be observed even in the case of similar rainfall rates. Ligu-A is also often excluded because it is the least rainy area of the region, leading to lower hydrogeological risks.

Conclusion

In this study, a specific set of three-dimensional rainfall thresholds has been developed for each AZ of the Liguria region. The main objective is to optimize these thresholds to make them suitable for operational warning purposes. To achieve this goal, the work focuses on addressing two major challenges that hinder the implementation of thresholds in a LEWS: the high occurrence of FAs and the limited statistical reliability of incomplete validation phases.

A set of intensity-duration thresholds has been calibrated to minimize the number of FAs, allowing for a maximum of one FA per year above the moderate criticality threshold and zero above the high criticality threshold. The number of FAs has been further reduced by introducing an additional parameter (optimized mean areal rainfall, OMAR) through an innovative procedure, resulting in a three-dimensional threshold system optimized to reduce false alarms as much as possible without reducing the number of landslide events correctly identified. In addition, these 3D thresholds make it possible to account for both triggering and predisposing rainfall at the same time, thus providing a more complete representation of the physical mechanism of landslide triggering while keeping the methodology simple and easy to implement. Despite the positive outcomes, it should be noted that, like any other empirical approach, the proposed methodology is very site specific. Therefore, we suggest applying the presented 3D rainfall threshold approach to areas characterized by geomorphological and climatic homogeneity. In the future, the method could also be applied to wider areas (e.g., an entire nation), provided that the area is divided into a mosaic of smaller and relatively homogeneous alert zones to be analyzed and calibrated independently. In this work, the Liguria region was divided into five alert zones; wider and more heterogeneous areas would require a higher number of subzones to obtain an effective threshold system.
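
The FA criterion used for calibration (at most one FA per year above the moderate criticality threshold and none above the high criticality threshold) can be expressed as a simple acceptance test for a candidate threshold. The Python sketch below is an assumption-laden illustration: it reads "one FA per year" as a limit applied to each calendar year separately, it expects FA counts already tallied per year and per level, and the function name, argument names, and sample counts are hypothetical.

```python
def satisfies_fa_criterion(fa_moderate_by_year, fa_high_by_year,
                           max_moderate_per_year=1, max_high_total=0):
    """Check a candidate threshold against the FA budget (hypothetical helper).

    fa_moderate_by_year: dict mapping year -> FAs above the moderate threshold
    fa_high_by_year:     dict mapping year -> FAs above the high threshold
    """
    # Assumed reading: the moderate-level limit applies to every single year.
    moderate_ok = all(n <= max_moderate_per_year
                      for n in fa_moderate_by_year.values())
    # No FA is tolerated above the high criticality threshold.
    high_ok = sum(fa_high_by_year.values()) <= max_high_total
    return moderate_ok and high_ok


# Illustrative counts only; these are not results from the study.
accepted = satisfies_fa_criterion({2019: 1, 2020: 0}, {2019: 0, 2020: 0})
rejected = satisfies_fa_criterion({2019: 1, 2020: 1}, {2019: 0, 2020: 1})
print(accepted, rejected)
```

A calibration loop could evaluate this test for each candidate threshold and keep only those that pass, before comparing the surviving candidates on the number of correctly identified landslide events.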

The high predictive capabilities of the obtained thresholds were demonstrated through a robust multi-source validation process, which involved a classical validation phase with an independent landslide dataset, followed by a second phase that included two different comparisons: one with the alerts issued by the LEWS currently in use in the region and another using a dataset of national-level emergencies. These comparisons made it possible to test the thresholds in an operational context, identify their strengths and weaknesses, and simulate a real situation. This methodology allows researchers and administrative operators to make more objective decisions regarding the implementation of the most reliable thresholds in a LEWS. Moreover, the proposed method can be easily replicated, serving as a valuable alternative in cases where data scarcity hinders reliable validation.