Introduction

Weather-induced shallow landslides affecting superficial loose soil deposits up to 2–3 m deep are a global natural phenomenon, occurring in many regions of Italy (Calvello and Pecoraro 2018), in Norway (Krøgli et al. 2018), in Switzerland (Springman et al. 2013; Fan et al. 2017), in Austria (Dietrich and Krautblatter 2017; Canli et al. 2018), in the USA (Baum and Godt 2010; Staley et al. 2013; Mirus et al. 2018), and in many parts of China (Lin and Wang 2018; Zhang et al. 2018). The frequency and intensity of these weather-induced landslides are anticipated to increase due to climate change and the continuous urbanization of areas highly exposed to landslide risk (Barla and Antolini 2016). The main triggering mechanism for these landslides typically consists of rain or snow infiltration in shallow soil layers, resulting in an increase of the pore water pressure and a decrease of the soil shear strength (Caine 1980), as well as in the reduction of the matric suction in partially saturated soils (Sorbino and Nicotera 2013). This triggering process is characterized by a near-total absence of warning signs. Although the mobilized volume is often limited in the triggering phase, in the propagation phase the landslide mass may incorporate the soil material lying along the slope, significantly increasing its volume; in steep channels, shallow slides may evolve into debris flows, destructive phenomena characterized by extremely high velocities (Hungr et al. 2014). Shallow weather-induced landslides can occur frequently and often simultaneously over large areas; thus, they represent a widespread risk for local communities, structures, and infrastructures in many parts of the world (Petley 2012; Kirschbaum et al. 2015; Haque et al. 2016; Froude and Petley 2018).

Operational landslide early warning systems (LEWS) aim to reduce loss of life by informing exposed individuals, communities, and organizations, so that they can act appropriately and in sufficient time (UNISDR 2006). LEWS have been recognized as important tools for landslide risk reduction and community resilience in many recent international initiatives (e.g., Sendai Framework for Disaster Risk Reduction 2015–2030, UN Agenda 2030 for sustainable development, European Climate Adaptation Platform). LEWS can be designed and managed at two different scales of operation: local systems (Lo-LEWS) address single landslides at slope scale (Pecoraro et al. 2019a); territorial systems (Te-LEWS) deal with the possible occurrence of multiple landslides over wide areas at regional scale (Piciullo et al. 2018). Te-LEWS are used to provide generalized warnings to authorities, civil protection personnel, and the population over appropriately defined homogeneous warning zones. Typically, these systems address weather-induced landslides through the monitoring and prediction of meteorological parameters (Segoni et al. 2018). It should be noted that the term weather is herein preferred to the term rainfall because it covers the broad spectrum of time-dependent atmospheric variables that may trigger landslides, such as intensity, duration, antecedent, and cumulated precipitation; snow cover and snow melting associated with temperature changes; and evapotranspiration and rainfall infiltration rates associated with atmospheric and soil conditions.

Meteorological monitoring alone does not consider critical soil properties controlling the initiation of the triggering process. Depending on these conditions, landslides may be triggered in response to a large variety of weather events. Therefore, although the integration of geotechnical monitoring (e.g., pore water pressure, soil water content, ground deformation) within warning models for weather-induced landslides may be very challenging, it can provide additional information to determine the likelihood of rainfall events actually triggering failure (Staley et al. 2013). On this issue, Stähli et al. (2015) proposed to combine local measurements of landslide precursors with model predictions at catchment scale. Recently, Calvello (2017) and Pecoraro et al. (2019b) suggested that local observations may provide fruitful indications on the local conditions within a wider warning zone. However, a standardized methodology for integrating geotechnical observations collected at local scale in a territorial warning model for weather-induced landslides still does not exist.

This study attempts to improve the prediction accuracy of a territorial warning model for weather-induced landslides by using in-situ pore water pressure data in addition to widespread hydrometeorological data. To this purpose, the proposed procedure is tested on the Norwegian national LEWS between January 2013 and June 2017, considering 30 hydrological basins as test areas.

Methodology

The conceptual framework

The procedure presented herein aims at integrating the widespread meteorological monitoring data already adopted in a territorial warning model (TeWM) with local pore water pressure measurements, within an augmented territorial warning model (ATeWM). It builds upon the strengths of an already operational territorial warning model, which is based on widespread hydrometeorological data (i.e., relative water supply and relative soil saturation), and combines these with data from distributed positive pore water pressure measurements. To this aim, multiple in-situ measurements must be available within any considered warning zone. Indeed, this ensemble of measurements, processed and taken together as described in the “Definition of the augmented territorial warning model” section, should be considered a proxy of significant weather-induced changes in the groundwater regime. The latter relates to possible shear strength changes along potential failure surfaces, and thus to possible variations in the factor of safety of potentially unstable soil volumes. The proposed procedure comprises three successive phases (Fig. 1): identification of the Representative Territorial Units, RTUs (phase I); application of the territorial warning model (phase II); and calibration and validation of the augmented territorial warning model (phase III). In phase I, the area of analysis is divided into territorial units of appropriate extent (i.e., the warning zones adopted in the model) considering the scale of analysis, so that information from widespread monitoring and local observations can be combined. The territorial units are classified using two criteria: the occurrence of landslide events during the period of analysis and the availability of relevant information from monitoring instruments in the proximity of the landslide source areas. Following this classification, RTUs are identified. In phase II, the territorial warning model is applied in the selected areas. For each RTU, the warning events are identified and the warning levels defined. In phase III, the calibration and the validation of the augmented territorial warning model are performed starting from the warning events issued by the territorial model. For calibration purposes, the pore water pressure observations are preliminarily analyzed to determine potential upward or downward trends. Then, the most appropriate parameters are identified, and the model is validated using statistical indicators derived from contingency tables.

Fig. 1 Flowchart of the methodology proposed for the definition of an augmented territorial warning model for weather-induced landslides

Definition of the augmented territorial warning model

Pore water pressure measurements are analyzed to determine significant uptrends (and downtrends) that indicate local conditions which may lead (or not lead) to the triggering of a landslide within a territorial unit during a given weather event. Pore water pressure records are typically characterized by significant short-term variability and consequently, need to be statistically processed in order to smooth the short-term fluctuations and make potential trends identifiable. Moving averages are simple and common smoothing techniques widely used in time series analysis. They determine new time series, whose values are the average of a given number of observations in the original time series. Several types of moving averages—e.g., simple moving average (SMA), cumulative moving average (CMA), weighted moving average (WMA), and exponential moving average (EMA)—can be used, depending on the purpose of the analysis, the types of data, and the time periods.

In this study, the SMA of the recorded pore water pressures at a given day (ui) is calculated for each piezometer, over the number of days of a specified time period (n), as follows:

$$ {u}_i=\sum \limits_{k=i-n+1}^i\frac{p_k}{n} $$
(1)

where pk is the pore water pressure recorded at day k. Using these new time series, two indicators of pore water pressure variations are defined (Fig. 2), as follows:

$$ \Delta u_i = u_i - u_{i-n} $$
(2)
$$ \Delta u_i^{\ast} = \frac{\Delta u_i}{\Delta u_{i\max}} $$
(3)

where Δui is the difference between the SMA calculated considering a length equal to n days and referring to days i and i–n; and Δui* is the same difference normalized by Δuimax, the maximum difference calculated over the time frame where pore water pressure records are available for each piezometer (L1, L2, and L3 for the three piezometric datasets reported in Fig. 2).
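Equations 1–3 can be sketched in a few lines of Python. The snippet below is a minimal illustration, not the implementation used in the study: the function names are hypothetical, days with an incomplete averaging window are returned as None, and Δuimax is assumed here to be the largest absolute difference in the record so that the sign of Δui is preserved (the text specifies only "the maximum difference").

```python
from typing import List, Optional

Series = List[Optional[float]]


def simple_moving_average(p: List[float], n: int) -> Series:
    """Eq. 1: u_i is the mean of p_k for k = i-n+1 .. i (None if the window is incomplete)."""
    if n < 1:
        raise ValueError("n must be a positive number of days")
    u: Series = []
    for i in range(len(p)):
        if i + 1 < n:
            u.append(None)  # fewer than n observations available so far
        else:
            u.append(sum(p[i - n + 1 : i + 1]) / n)
    return u


def sma_difference(u: Series, n: int) -> Series:
    """Eq. 2: du_i = u_i - u_{i-n} (None if either SMA value is undefined)."""
    du: Series = []
    for i in range(len(u)):
        if i - n < 0 or u[i] is None or u[i - n] is None:
            du.append(None)
        else:
            du.append(u[i] - u[i - n])
    return du


def normalized_difference(du: Series) -> Series:
    """Eq. 3: du_i* = du_i / du_imax.

    ASSUMPTION: du_imax is taken as the largest absolute difference in the record.
    """
    defined = [abs(d) for d in du if d is not None]
    du_max = max(defined) if defined else 0.0
    if du_max == 0.0:
        # Flat or empty record: report zero variation instead of dividing by zero.
        return [None if d is None else 0.0 for d in du]
    return [None if d is None else d / du_max for d in du]


# Illustrative daily record (made-up values) with n = 2 days
pressures = [10, 10, 12, 14, 16, 18]
u = simple_moving_average(pressures, 2)
du = sma_difference(u, 2)
du_star = normalized_difference(du)
```

With this made-up record, `u` is `[None, 10.0, 11.0, 13.0, 15.0, 17.0]`, `du` is `[None, None, None, 3.0, 4.0, 4.0]`, and `du_star` is `[None, None, None, 0.75, 1.0, 1.0]`.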

Fig. 2 Example of application of the proposed methodology to a test area: identification of weather-induced landslides and available piezometers (a) and pore water pressure trend analysis before a landslide event (b)

The augmented territorial warning model is based on a 2-step procedure developed using the above-defined indicators (Fig. 3). In the first step, the differences between the simple moving averages referring to days i and i–n are evaluated. In case they do not show a clear trend (i.e., the number of downtrends is equal to the number of uptrends in the datasets), the warning level issued by the territorial model (WLte) remains unchanged. Otherwise, a second step is performed wherein the mean of the normalized simple moving average differences is calculated considering all the available piezometers, and this ensemble indicator is compared with predefined thresholds (i.e., a lower threshold, LT and an upper threshold, UT). Three final outcomes are possible: the warning level remains the same or is increased or decreased. No more than two warning level variations are allowed.

Fig. 3 Scheme of the procedure developed for analyzing pore water pressure observations. The numbers to the right indicate the change from the original warning level (WLte) due to the augmented territorial warning model
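The two-step logic can be illustrated with a short Python sketch. Several details are assumptions made only for this illustration and are not stated in the text: a one-level change is applied when the absolute value of the ensemble mean exceeds LT and a two-level change when it exceeds UT, the direction of the change follows the sign of the mean, and the result is clamped to the range WL1–WL4. The function name and signature are hypothetical.

```python
from typing import Sequence


def augmented_warning_level(
    wl_te: int,
    du: Sequence[float],
    du_star: Sequence[float],
    lt: float,
    ut: float,
) -> int:
    """Return the warning level after the two-step pore water pressure check.

    wl_te   : warning level issued by the territorial model (1..4)
    du      : SMA difference for each available piezometer
    du_star : normalized SMA difference for each available piezometer
    lt, ut  : lower and upper thresholds, in the same units as du_star
    """
    # Step 1: count uptrends and downtrends; a tie means no clear trend.
    ups = sum(1 for d in du if d > 0)
    downs = sum(1 for d in du if d < 0)
    if ups == downs:
        return wl_te

    # Step 2: compare the ensemble mean of the normalized differences
    # with the thresholds (ASSUMED mapping: > LT one level, > UT two levels).
    mean_star = sum(du_star) / len(du_star)
    magnitude = abs(mean_star)
    if magnitude > ut:
        shift = 2
    elif magnitude > lt:
        shift = 1
    else:
        shift = 0
    if mean_star < 0:
        shift = -shift

    # ASSUMPTION: the final level cannot leave the WL1..WL4 range.
    return min(4, max(1, wl_te + shift))
```

For example, with LT = 0.10 and UT = 0.25, three piezometers with normalized differences of 0.3, 0.2, and −0.05 give a mean of 0.15, so a WL2 warning would be raised to WL3 under these assumptions.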

The calibration and the validation of the augmented territorial warning model are performed using statistical indicators and following procedures widely adopted in the literature (e.g., Martelloni et al. 2012; Gariano et al. 2015; Calvello and Piciullo 2016). In a preliminary phase, the warnings issued and the landslides that occurred are reported in a 2 × 4 correlation matrix, where the rows indicate the occurrence of the landslides (yes, no) and the columns the corresponding warning levels (WL1, WL2, WL3, WL4). It should be noted that the relative importance assigned by the system managers to the different types of errors is a key issue to consider in order to properly validate the model. Figure 4 reports a graphical representation of a comprehensive analysis of the correlation matrix based on a set of two performance criteria, both assigning a meaning to all the elements of the matrix.

Fig. 4 Correlation matrix adopted to calibrate and validate the augmented territorial warning model: alert classification (a) and grade of correctness (b) performance criteria used for the analyses (modified from Calvello and Piciullo, 2016)

The “alert classification” criterion (Fig. 4a) employs a classification scheme derived from a standard contingency table, and identifies correct alerts (CA), false alerts (FA), missed alerts (MA), and true negatives (TN). The presence of one of the three highest levels of warning (WL2, WL3, and WL4) concurrently with the occurrence of at least one landslide is assumed as CA. FA and MA are incorrect predictions of the system: the first is related to the issuing of any of the three highest levels of warning (WL2, WL3, and WL4) and the simultaneous absence of a landslide; the second refers to the occurrence of a landslide without any warning (WL1). TN represents the absence of both landslides and warning occurrences. The “grade of accuracy” criterion (Fig. 4b) assigns a color code to the components of the correlation matrix in relation to the agreement between a given warning event and a given landslide event. For instance, if one of the two highest WLs is issued (i.e., WL3 or WL4) and no landslides occur, this could be considered a significant error of the warning model. Using this criterion, the elements are classified in three color-coded classes, as follows: green (Gre) for the elements which are assumed to be representative of the best model response, yellow (Yel) for elements representative of minor model errors, red (Red) for elements representative of significant model errors. Starting from the two performance criteria, several performance indicators can be derived (e.g., Piciullo et al. 2017a). Table 1 lists the indicators considered in this study.
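The alert classification criterion maps each cell of the 2 × 4 correlation matrix to one of four outcomes. The following Python sketch shows that mapping; the function names and the sample events are hypothetical and serve only as an illustration.

```python
from collections import Counter
from typing import Iterable, Tuple


def alert_class(landslide: bool, warning_level: int) -> str:
    """Classify one (landslide, warning level) pair as CA, FA, MA, or TN."""
    if warning_level not in (1, 2, 3, 4):
        raise ValueError("warning_level must be 1 (WL1) to 4 (WL4)")
    warned = warning_level >= 2  # WL2, WL3, and WL4 count as an issued warning
    if landslide and warned:
        return "CA"  # correct alert
    if landslide:
        return "MA"  # missed alert: landslide under WL1
    if warned:
        return "FA"  # false alert: warning without landslide
    return "TN"      # true negative: no warning, no landslide


def tally(events: Iterable[Tuple[bool, int]]) -> Counter:
    """Count the four outcomes over a sequence of (landslide, warning level) pairs."""
    return Counter(alert_class(landslide, wl) for landslide, wl in events)


# Made-up sample: (landslide occurred?, warning level)
sample = [(True, 3), (True, 1), (False, 2), (False, 1), (False, 1), (True, 2)]
counts = tally(sample)
```

For the made-up sample above, `counts` holds two CA, one MA, one FA, and two TN.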

Table 1 Performance indicators used for the analyses

Case study

Norwegian national LEWS

Norway is divided into 18 counties and 422 municipalities and covers an area of ∼ 385,000 km2 in the western and northern parts of the Scandinavian Peninsula. Approximately 30% of the land area consists of mountains (with an average elevation of 460 m a.s.l.) and 6.7% of the country is covered by steep slopes. In geological terms, Norway is situated along the western margin of the Baltic shield, which is covered by Caledonian nappes in the west. The bedrock of the Baltic shield is dominated by Precambrian basement rocks (e.g., granites, gneisses, amphibolites, and meta-sediments) in the southern and south-eastern part of the country (Fredin et al. 2013). Continuous till deposits cover large areas of the valley sides and floors, although fluvial and glaciofluvial deposits as well as marine clays are widespread.

Steep slopes, the presence of weak, loose sediments, and favorable climatic conditions provide a basis for the triggering of a range of deep and shallow weather-induced landslides. These landslides are typically triggered by rainfall and snowmelt, or their combination, resulting in intense or long-duration water supply and high soil water content. Although the steep slopes covered by Quaternary loose sediments are highly susceptible to landsliding, failures can also occur in gentle slopes covered by snow as well as in embankments along roads and railways. In addition, some failures are either triggered from or initiated as rockfalls or slush flows and may develop into debris flows as they propagate downslope (Piciullo et al. 2017b). Comprehensive estimates of human and economic losses related to shallow landslides in Norway are limited (Krøgli et al. 2018). According to Furseth (2006), at least 230 fatalities can be associated with slope failures over the last 500 years. A recent report prepared by Haque et al. (2016) showed that 12 people died in Norway in the period 1995–2016 because of weather-induced landslides (slush flows in 7 cases and debris flows or debris avalanches in the remaining 5 cases). Economic consequences are mainly associated with disruption of road and railway networks. According to the Intergovernmental Panel on Climate Change (Hanssen-Bauer et al. 2017), the number of annual landslide events in Norway is expected to increase as Northern Europe will probably experience higher intensity and frequency of heavy precipitation in the future.

In 2009, the Norwegian Water Resources and Energy Directorate (NVE) started developing a national LEWS as part of a national program for landslide risk management; the system was officially launched in the autumn of 2013. The service is operative year-round, performing daily regional-scale landslide hazard assessments (i.e., for a county and/or group of municipalities). Predictions of the system are based on forecasting hydro-meteorological conditions leading to landslide initiation (Beldring et al. 2003). The model uses rainfall and temperature as input data and simulates many hydro-meteorological parameters, such as runoff, snowmelt, groundwater, soil saturation, and frost depth. In the development phase of the LEWS, hazard threshold levels were calibrated through a statistical analysis of historical landslides and different hydro-meteorological conditions. The thresholds currently adopted in the LEWS were proposed by Colleuille et al. (2010), by combining simulations of water supply (rainfall and snowmelt) and soil water content (both expressed as relative values normalized to annual averages and maxima over a 30-year reference period, respectively). Thresholds were statistically derived from empirical tree classification considering 206 landslide events from different parts of the country. Piciullo et al. (2017b) analyzed the warnings issued in Western Norway in the period 2013–2014, confirming an overall good performance of the adopted thresholds. Finally, it is worth mentioning that decision-making procedures are based upon hazard threshold levels that are supported by hydro-meteorological and real-time landslide observations as well as landslide inventories and susceptibility maps (Devoli et al. 2018).

Data

Data on landslide occurrences were retrieved from the national landslide database (www.skrednett.no), which contains more than 60,000 entries (represented by point locations) covering the whole country over the last 500 years. A relevant number of registrations (16,346) are recorded in the period of analysis, i.e., from January 2013 to June 2017, among which 1481 can be considered weather-induced landslides in loose soils. Six hundred fifty-eight of these records (44.4%) are categorized as landslides in soil, not otherwise specified due to lack of further documentation; 654 (44.2%) are classified as debris flows, debris avalanches, or mudslides; 113 (7.6%) are reported as soil slides in artificial slopes (cuts and fillings along road and railway lines); 43 (2.9%) are slush flows, and the remaining 13 (0.9%) are clay slides. Registrations recorded by road and railway authorities are usually reported as points along transportation networks, thus often far away from the source area. In addition, further uncertainties may result from errors in the classification of the type of landslide, lack of spatial and/or temporal information, and double registrations. As a consequence, each landslide event recorded in the national landslide database is affected by a certain degree of spatial uncertainty, varying from “exact” to “> 1000 m.” Uncertainty on landslide location always affects catalogs used for the calibration and validation of LEWS developed at regional scale (Segoni et al. 2018). Large uncertainties exist especially when assigning geographic coordinates to a landslide event in case multiple landslides or affected areas were referenced in one report (Kirschbaum et al. 2010). However, the location of the large majority of the records used in this study is affected by a low spatial uncertainty (tens of meters); thus, the records can be considered adequate for analyses at catchment scale.

The pore water pressure measurements were collected at local scale, analyzing data from 41,706 boreholes installed by the Norwegian Geotechnical Institute (NGI) for a variety of geotechnical projects throughout Norway not specifically aimed at early warning purposes, such as geotechnical site characterization of soils; slope stability analysis; efficiency evaluation of surface drainage works; and monitoring of road and railway embankments. The piezometers considered representative of conditions which may lead to the triggering of landslides have been selected taking into account their spatial proximity to the landslides that occurred in the period of analysis, and their installation in the shallow soil layers or in areas characterized by the presence of loose sediments, according to a Quaternary deposit map at 1:50,000 scale (www.ngu.no). It should be mentioned that only data from electric piezometers have been considered, as they provide longer and more reliable data series. Pore water pressure measurements used in this study were derived from a total of 240 boreholes installed between January 2013 and June 2017.

Test areas

The Norwegian hydrological basins have been considered as the most appropriate minimum territorial units for applying the augmented territorial warning model. Indeed, the catchment scale is an intermediate scale of analysis between small areas (e.g., single slopes) and very wide areas (e.g., a municipality, a region or a nation). Within these units, both the widespread meteorological monitoring data employed in the national LEWS and local pore water pressure observations may provide meaningful information for the definition of an efficient warning model. In addition, at catchment scale, the whole process area of the landslides is automatically considered. This allows reducing the uncertainty related to the location of the considered landslides. The test areas used in this study have been identified according to two main criteria: the occurrence of shallow landslides in loose soils and the availability of a relevant number of pore water pressure measurements.

According to the selection criteria, 30 Norwegian hydrological basins have been identified as potentially useful for the analyses. Figure 5 shows that the majority of them (16 out of 30) are distributed along the western coast of Norway: three are situated in the southern part (SW in Fig. 5), twelve in the central part (CWa and CWb in Fig. 5), and one in the northern part (NW in Fig. 5). All of them are dominated by narrow fjords and steep mountainsides and are characterized by the presence of shallow marine deposits covering weathered and altered bedrock. The remaining 14 basins are concentrated in the south-eastern part of Norway (SE in Fig. 5), an area highly prone to landslides due to long-term infiltration from large amounts of rain and/or snow in autumn and winter and to the presence of various shallow Quaternary deposits (especially moraine materials). The main characteristics of the test areas are summarized in Table 2. The majority of the basins situated in the south-eastern part of Norway (10 out of 14) cover an area smaller than 50 km2; on the contrary, three out of the four basins with an area larger than 100 km2 are distributed along the western coast. The shallow soil layers are characterized by the presence of Quaternary loose deposits highly prone to landslides, which cover more than half of the catchment surface in 11 cases out of 30. A total of 125 weather-induced landslides in soils occurred in the 30 test areas between January 2013 and June 2017: in the majority of them (25 out of 30), the number of landslides varies between 3 and 5, while four basins were impacted by 6 landslides and the remaining one by 7 landslides. Pore water pressure measurements recorded in the period of analysis were derived from 240 boreholes, whose number within each test area varies between 7 and 14.

Fig. 5 Location of the 30 catchments used in this study as test areas

Table 2 Name, area, presence of loose sediments, landslides, and piezometers available within the 30 test areas

Results

Application of the territorial warning model

The TeWM employed within the Norwegian LEWS has been applied to the 30 test areas identified in the “Data” section. The model has been partially modified to consider the dimension of the test areas, the definition of the territorial warning events, and the meaning of the warning levels. Indeed, the Norwegian LEWS employs variable minimum territorial units, varying from a small group of municipalities (hundreds or thousands of km2) to several administrative regions (tens of thousands of km2). Thus, the extent and position of the warning zones are dynamic and may change day by day, depending on hydrometeorological conditions. Conversely, the minimum territorial units adopted herein are the Norwegian hydrological basins, which are predefined static warning areas ranging from a few km2 to more than 100 km2. In the Norwegian LEWS, relative water supply (rain and snowmelt) and relative soil saturation/groundwater conditions are combined for the definition of a hydro-meteorological index, which is compared with statistically defined thresholds. Then, a daily qualitative assessment of landslide warning levels is performed by an expert on duty supported by susceptibility maps and real-time observations. On the contrary, in this study, the hydrometeorological index is directly compared with the warning thresholds without any further qualitative assessment. In both cases, four warning levels are adopted: green (WL1), yellow (WL2), orange (WL3), and red (WL4), although the classification criteria are rather different. In the Norwegian LEWS, the principle behind the warning criteria is that rare hydrometeorological conditions are expected to cause more landslides and possibly higher damages. In addition, the criteria contain information on the expected number of landslides per area, as well as hazard signs indicating landslide activity. On the other hand, the classification criteria adopted in this study are not meant to be correlated with the expected number of landslides or the extent of the hazardous area in each territorial unit, but to provide an indication of the probability of landslide occurrence, i.e., null (WL1), moderate (WL2), high (WL3), and very high (WL4).

The assessment of the warning events resulting from the application of the territorial warning model within a territorial unit can be schematized into three steps. Firstly, the daily forecasts of water supply and relative soil water content are retrieved from the open-access web portal www.xgeo.no, where they are displayed as raster data at 1-km2 resolution. Subsequently, hydro-meteorological indexes are evaluated by calculating the average values of all the grid cells covering each territorial unit. Finally, the daily average values are compared with the warning thresholds employed by the territorial model, to identify the days with warnings as well as to define, in these cases, the warning level. The results are reported in a 2 × 4 correlation matrix (in Table 3 displayed as a simple 1 × 8 row), which lists the landslide events (i.e., the presence or absence of a landslide) in relation to the warning events (WL1, WL2, WL3, and WL4). Twenty-six of the 125 landslide events in the test areas between January 2013 and June 2017 occurred when the warning model was in level 1, i.e., when no warnings were issued. For the other 99 occurrences, the warning model was in level 2 in 59 cases, in level 3 in 35 cases, and in level 4 in the remaining 5 cases. In the period of analysis, 695 daily warnings have been issued: the majority being level 2 “moderate warnings” (551 events), 139 events being level 3 “severe warnings”, with the rest of them being level 4 “very severe warnings” (5 events).

Table 3 Results obtained applying the TeWM in the 30 test areas from January 2013 to June 2017

The 695 warnings issued by applying the territorial model were divided into two subsets: (i) a calibration set used to define the augmented territorial model, listing 457 warnings issued within 20 territorial units, and (ii) a validation set used to validate the model, listing 238 warnings issued within the remaining 10 territorial units. The two subsets have been defined with similar characteristics in terms of area of the territorial units, number of landslides, and availability of piezometers.

Augmented territorial warning model calibration and performance evaluation

The ATeWM presented in Fig. 3 has been calibrated using the 457 warnings issued by the territorial warning model. To this aim, a parametric analysis has been carried out for identifying the most appropriate time period (n) for calculating the simple moving average indicators. Then, a performance evaluation through statistical indicators has been developed for setting the values of the two thresholds (LT and UT) to be adopted in the second step of the procedure.

The possible trends of Δui have been evaluated over time periods (n) of 1, 2, 3, 4, 5, 6, 7, and 14 days. Only relatively short time periods have been considered in the analyses because pore water pressure variations leading to possible landslide initiation in shallow loose sediments are typically recorded a few days before a landslide event. Besides, a short-term moving average allows identifying possible significant trends without the temporal lag between the original and the averaged data series that longer window lengths would introduce. Table 4 shows the comparison between the moving average trends considering the 69 warning events during which landslides occurred and the 388 warning events that are not associated with known landslides. It is worth highlighting that in the first case, an “uptrend” can be considered as a correct indicator (i.e., a trend suggesting that an increase of the warning level may be appropriate), whereas in the second case, a “downtrend” is a sign of coherence with the data (i.e., a trend suggesting that a decrease of the warning level or a withdrawal of the warning may be appropriate). In both cases, “no trend” indicates that the moving average differences do not show a clear trend. Regarding “severe” (WL3) and “moderate” (WL2) warning events associated with landslides, a number of moving average differences (i.e., Δu4, Δu5, Δu6, Δu7, and Δu14 for WL3; Δu3, Δu4, Δu5, Δu6, and Δu14 for WL2) provide similar results, with a percent difference between correct and incorrect indicators greater than 50%. Conversely, considering both “severe” (WL3) and “moderate” (WL2) warning events not associated with landslides, the best performance is clearly obtained using Δu14, as this time period produces the maximum number of downtrends (41 for WL3 and 206 for WL2) as well as the minimum number of uptrends (17 for WL3 and 80 for WL2). “Very severe” warning events (WL4) cannot be evaluated in this case because only two events were issued in the considered time frame, both of them associated with the occurrence of a landslide.

Table 4 Number of uptrends (up), downtrends (down), and no trends (nt) for SMA differences of various time periods considering warning events that resulted and did not result in landslides. The number of warning events issued for each warning level is reported in round brackets

The trends of each moving average difference have been grouped according to the correctness of the indication provided, regardless of the warning level issued by the territorial model. Correct instances are represented by uptrends associated with warning events issued when landslides occurred, and by downtrends associated with warning events issued when landslides did not occur. Conversely, incorrect instances are represented by uptrends associated with warning events issued when landslides did not occur and by downtrends associated with warning events issued when landslides occurred. “No trend” instances refer, once again, to moving average differences that do not show a clear trend. Figure 6 highlights that the percent difference between correct and incorrect instances increases with the moving average time period. Indeed, Δu14 shows the highest value of this difference (42%), as 301 instances out of 457 are correct (66%), 110 are incorrect (24%), and the remaining 46 do not show any trend (10%). It should be pointed out that the pore water pressure trend analysis is conducted in addition to the evaluation of the territorial warning model, i.e., the Norwegian national LEWS, which already combines hydrometeorological indexes related to landslide triggering rather than relying, as most territorial LEWS do, on rainfall thresholds alone (Boje et al. 2014; Krøgli et al. 2018). This suggests that the triggering of shallow landslides in Norway cannot be adequately predicted by short-term rainfall combinations alone. In this context, the information provided by the ATeWM (i.e., simple moving averages of recorded pore water pressures with a time period of 14 days) is valuable and can be considered as representative of important hydrological pre-conditions increasing (or decreasing), at any given day, the average susceptibility to landslides within the warning zone.
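The percentages quoted for Δu14 follow directly from the three instance counts. As a quick check, the short Python sketch below (with a hypothetical helper name) recomputes them from the figures given in the text.

```python
def correctness_shares(correct: int, incorrect: int, no_trend: int) -> dict:
    """Return each class as a percentage of all instances, plus correct minus incorrect."""
    total = correct + incorrect + no_trend
    if total == 0:
        raise ValueError("at least one instance is required")
    shares = {
        "correct": 100.0 * correct / total,
        "incorrect": 100.0 * incorrect / total,
        "no_trend": 100.0 * no_trend / total,
    }
    shares["difference"] = shares["correct"] - shares["incorrect"]
    return shares


# Counts reported in the text for the 14-day SMA difference (457 warning events)
shares_14 = correctness_shares(correct=301, incorrect=110, no_trend=46)
rounded_14 = {name: round(value) for name, value in shares_14.items()}
print(rounded_14)
# → {'correct': 66, 'incorrect': 24, 'no_trend': 10, 'difference': 42}
```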

Fig. 6 Correctness of the indication provided by the SMA differences calculated using time periods (n) of 1, 2, 3, 4, 5, 6, 7, and 14 days

Subsequently, a performance evaluation has been conducted using Δu14 to identify the two thresholds to be employed in the second step of the procedure. In particular, six combinations have been compared (Table 5), considering three values of the lower threshold, LT (i.e., 5%, 10%, and 15%), and two values of the upper threshold, UT (i.e., 25% and 30%). Table 6 shows the results obtained for the six combinations and the territorial warning model considering the elements of each correlation matrix in terms of alert classification and grade of accuracy. Firstly, the analysis reveals that the model is extremely sensitive to variations of LT, resulting in large differences among the three pairs of combinations Δu14,5,25*–Δu14,5,30*, Δu14,10,25*–Δu14,10,30*, and Δu14,15,25*–Δu14,15,30*. This can be explained considering that the large majority of the level transitions in the period of analysis are 1-level transitions caused by the exceedance of LT. Higher values of TN and Gre and lower values of FA and Yel were obtained considering the pair of combinations Δu14,5,25*–Δu14,5,30*. This behavior is due to the high number of relocations from WL2 to WL1 in the second row of the matrix. However, a low value of the LT resulted in a high number of MA (19 for Δu14,5,25* and 18 for Δu14,5,30*) and Red errors (85 and 84, respectively). On the other hand, a high value of the LT (15%) implied a lower number of positive level transitions from the territorial model when compared to the other two threshold combinations (especially in terms of TN, FA, and Yel). For these reasons, the best-performing pair is Δu14,10,25*–Δu14,10,30*, which significantly increases the number of TN and Gre while limiting the increase in the number of MA (15 for Δu14,10,25* and 14 for Δu14,10,30*, against 13 for the territorial model) and significantly reducing the number of FA, Yel, and Red.

Table 5 Combinations considered to identify the optimal values of LT and UT
Table 6 Number of correlation matrix elements considering the “alert classification” (a) and “grade of accuracy” (b) criteria

To confirm this finding, Table 7 and Fig. 7a, b show the results in terms of performance indicators for the six different threshold combinations and for the territorial warning model. Table 7 also reports the relative variations of the performance indicators from the territorial model for each threshold combination (\( \hat{x} \)), calculated as follows:

$$ \hat{x}=\frac{x_i-{x}_0}{x_0} $$
(4)

where \( x_i \) is the performance indicator associated with a threshold combination, and \( x_0 \) is the same performance indicator associated with the territorial model.
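
As a minimal sketch of Eq. 4, the relative variation can be computed as follows. The function name and the numerical values in the usage example are hypothetical and are not taken from Table 7.

```python
def relative_variation(x_i: float, x_0: float) -> float:
    """Relative variation of an indicator x_i from its territorial-model value x_0 (Eq. 4)."""
    if x_0 == 0:
        # Eq. 4 is undefined when the reference indicator is zero
        raise ValueError("x_0 must be non-zero")
    return (x_i - x_0) / x_0


# Hypothetical example: an indicator dropping from 0.20 to 0.15
# corresponds to a relative variation of about -25%
variation = relative_variation(0.15, 0.20)
print(f"{100.0 * variation:.1f}%")
```

A negative value indicates that the indicator is lower for the threshold combination than for the territorial model. Whether this is an improvement depends on whether the indicator measures errors or successes.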

Table 7 Performance indicators computed for the TeWM and for six thresholds combinations of the ATeWM. Relative variations from the TeWM target value are reported in round brackets
Fig. 7 Bar charts showing the values of error (a) and success (b) indicators for the TeWM and for each combination of thresholds adopted in the ATeWM

Among the error indicators, the error rates (ERa and ERb) are very low (about 1% in both cases) for all the threshold combinations as well as for the territorial warning model, due to the high numbers of TN and Gre. The overall smallest values are associated with the augmented territorial warning model with LT of 5% and 10% (more than 30% lower than the TeWM in both cases). The missed alert rate (RMA) and the false alert rate (RFA) are functions of, respectively, the days of MA and FA. Regarding the RMA, it is worth mentioning that the augmented territorial warning model cannot, by definition, decrease the number of MA recorded in the territorial model. The lowest increase of RMA occurs for the threshold combinations employing LT of 10% and 15%, with values similar to that of the territorial model. The RFA in the augmented territorial warning model is generally lower than in the territorial model, and the lowest values are recorded employing LT of 5% and 10%. The lowest LT (5%) is affected by a non-negligible probability of severe model errors (ISMA&FA is more than 30% in both cases), whereas adopting a LT of 10%, the values are quite similar to that of the territorial model (17.63% and 17.14%, respectively).

Regarding the success indicators, the Hanssen and Kuipers (1965) skill score (HK) shows a moderate but general decrease when the augmented territorial warning model is applied. This can be explained considering that this indicator is strongly influenced by the number of CA. However, the Matthews correlation coefficient (MCC) highlights an overall good performance of the augmented territorial warning model, especially for the pair of combinations Δu14,10,25*–Δu14,10,30* (positive relative variations higher than 16% with respect to the territorial model). MCC can be considered a more balanced indicator for our analyses, as it is not influenced by the different orders of magnitude existing between the number of TN and the other elements of the correlation matrix.
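
The contrasting behavior of the two indicators can be illustrated with a short sketch based on their standard textbook definitions. It assumes that CA, FA, MA, and TN play the roles of true positives, false positives, false negatives, and true negatives, respectively. The counts below are hypothetical and are not taken from Table 6.

```python
import math


def hanssen_kuipers(ca: int, fa: int, ma: int, tn: int) -> float:
    """HK skill score: hit rate minus false positive rate."""
    return ca / (ca + ma) - fa / (fa + tn)


def matthews_cc(ca: int, fa: int, ma: int, tn: int) -> float:
    """Matthews correlation coefficient for a 2x2 contingency table."""
    numerator = ca * tn - fa * ma
    denominator = math.sqrt((ca + fa) * (ca + ma) * (tn + fa) * (tn + ma))
    # By convention, MCC is set to 0 when any marginal sum is zero
    return numerator / denominator if denominator else 0.0


# Hypothetical counts: the augmented model loses a few CA but halves the FA
baseline = dict(ca=20, fa=200, ma=5, tn=5000)
augmented = dict(ca=18, fa=100, ma=7, tn=5100)

for name, counts in (("baseline", baseline), ("augmented", augmented)):
    hk = hanssen_kuipers(**counts)
    mcc = matthews_cc(**counts)
    print(f"{name}: HK = {hk:.3f}, MCC = {mcc:.3f}")
```

With these illustrative counts, HK decreases because a few CA are lost, whereas MCC increases because the reduction in FA outweighs that loss. This is the same qualitative pattern described above.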

Similarly, the Euclidean distance from the perfect ROC classification in the ROC analysis (δROC) is lower in the territorial model (once again, as a consequence of the number of CA), whereas the Euclidean distance from the perfect classification in the Precision-Recall analysis (δP-R) is lower for the augmented territorial warning model regardless of the adopted thresholds. Based on this, Δu14,10,25*–Δu14,10,30* are herein considered the best performing pair of combinations.
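
A minimal sketch of the two distances is given below, assuming the conventional perfect-classification points: (0, 1) in the ROC plane (false positive rate, true positive rate) and (1, 1) in the Precision-Recall plane. The mapping of CA, FA, MA, and TN onto true positives, false positives, false negatives, and true negatives is also an assumption, and the counts are hypothetical.

```python
import math


def delta_roc(ca: int, fa: int, ma: int, tn: int) -> float:
    """Euclidean distance from the perfect point (FPR = 0, TPR = 1) in ROC space."""
    tpr = ca / (ca + ma)
    fpr = fa / (fa + tn)
    return math.hypot(fpr, 1.0 - tpr)


def delta_pr(ca: int, fa: int, ma: int) -> float:
    """Euclidean distance from the perfect point (precision = 1, recall = 1)."""
    precision = ca / (ca + fa)
    recall = ca / (ca + ma)
    return math.hypot(1.0 - precision, 1.0 - recall)


# Hypothetical counts: fewer CA and far fewer FA in the augmented model
baseline = dict(ca=20, fa=200, ma=5, tn=5000)
augmented = dict(ca=18, fa=100, ma=7, tn=5100)

for name, c in (("baseline", baseline), ("augmented", augmented)):
    d_roc = delta_roc(c["ca"], c["fa"], c["ma"], c["tn"])
    d_pr = delta_pr(c["ca"], c["fa"], c["ma"])
    print(f"{name}: dROC = {d_roc:.3f}, dP-R = {d_pr:.3f}")
```

Because TN is very large, the false positive rate is small in both cases and the ROC distance is dominated by the hit rate, which depends on CA. The Precision-Recall distance does not involve TN and therefore responds more directly to a reduction in FA.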

Augmented territorial warning model validation

The results of ATeWM have been tested with a validation dataset (i.e., 238 territorial warning events) in order to adopt the optimal combination of thresholds. Table 8 shows the results obtained for the six combinations and the TeWM considering the elements of each correlation matrix in terms of alert classification and grade of accuracy. The combination Δu14,5,25*–Δu14,5,30* exhibits the best performance for 4 indicators out of 7 (i.e., TN, FA, Gre, and Yel). However, the remaining three indicators (i.e., CA, MA, and Red) produce values far lower than those of the territorial warning model. An increase of the LT to 15% resulted, in terms of days, in an increase of FA, Yel, and Red with respect to the two other pairs of threshold combinations. Therefore, once again, the best-performing pair of threshold combinations is Δu14,10,25*–Δu14,10,30*. Both combinations indeed produce a significant reduction of severe model errors (Red), and they do not increase the number of MA of the territorial model.

Table 8 Number of correlation matrix elements in terms of “alert classification” (a) and “grade of accuracy” (b) criteria

For consistency with the calibration analysis, the results are also reported in terms of performance indicators for the six different threshold combinations and the territorial warning model (Table 9). Looking at the error rates (ERa and ERb), the application of the augmented territorial warning model results in a general decrease of the errors, regardless of the thresholds adopted (ERa and ERb become lower than 1% adopting a LT of 5% or 10%). Regarding the RMA, it could be considered satisfactory that two pairs of combinations (i.e., Δu14,10,25*–Δu14,10,30* and Δu14,15,25*–Δu14,15,30*) show the same value as the territorial model. The RFA and the ISMA&FA follow a general trend similar to the calibration phase: the first indicator is lower than in the territorial model for all six combinations, and the best values are recorded employing a LT of 5%, even though the differences are not substantial when LT is set to 10%. Moreover, Δu14,10,25*–Δu14,10,30* allow minimizing the number of severe model errors, as the ISMA&FA do not show relevant relative variations with respect to the territorial model (4.33% and 1.47%, respectively). Concerning the success indicators, a positive effect can be observed on the HK by adopting a LT of 10% and 15%, as the MA do not show any increase with respect to the territorial model and the FA decrease. The MCC follows a trend similar to the calibration phase, confirming that the pair Δu14,10,25*–Δu14,10,30* represents a good compromise for reducing the FA without increasing the number of MA.

Table 9 Performance indicators computed for the TeWM and for six thresholds combinations of the ATeWM. Relative variations from the TeWM target value are reported in round brackets

Figure 8a reveals that δROC exhibits almost the same values, apart from the pair Δu14,5,25*–Δu14,5,30* because of the increase of the MA. On the other hand, the application of the augmented territorial warning model results in a reduction of the δP-R regardless of the adopted thresholds (Fig. 8b). Final comparisons are also shown for the type of errors within the FA (distinguishing between Red and Yel in Fig. 8c) and for the grade of correctness of the CA (distinguishing between Yel and Gre in Fig. 8d). Although a LT set to 5% minimizes the number of FA, the severe errors remain almost the same, reducing marginally from 43 to 40. On the contrary, a significant reduction can be observed when raising the LT to 10%, as the Red errors are reduced to 23. Moreover, looking at the CA, a LT of 10% leads to a good number of transitions from WL2 to WL3 and WL4 (14), without missing any landslides. From this, it is possible to conclude that the validation analyses confirm that the pair Δu14,10,25*–Δu14,10,30* are the best-performing threshold combinations.

Fig. 8 Performance analysis of the TeWM and the six thresholds combinations of the ATeWM considering: (a) ROC classification, (b) Precision-Recall classification, (c) severity of errors of FA, and (d) grade of correctness of CA

Influence of numerosity and location of piezometric data

The influence of the piezometer network configurations (i.e., numerosity of the piezometers and average distance from the landslides) on the reliability of the predictions has been evaluated in order to assess their correlation with the accuracy of the results produced by the ATeWM.

A first parametric analysis has been conducted to assess the influence of the numerosity of the piezometric locations on the accuracy of the results. The number of piezometers available for each landslide that occurred in the period of analysis varies between 2 and 7. They have been grouped in four numerosity classes (2, 3, 4, and 5 or more piezometers) so that a reasonable number of data is available in each class for supporting the analyses. The moving average trends have been evaluated in order to identify the uptrends (correct indicators) associated with the landslides for each class, following the same procedure adopted for the parametric analysis carried out in the “Application of the territorial warning model” section, which was aimed at identifying the most appropriate time period (n) for calculating the simple moving average indicators. Figure 9a highlights that most of the landslides are related to uptrends, regardless of the numerosity of piezometric data. Yet, the correctness of the predictions increases passing from 2 (66%) to 5 or more piezometers (88%). Moreover, the upper class is characterized by the highest percent difference between correct and incorrect indicators (82%), as in only 2 cases out of 17 is there no coherence between landslides and piezometric data. This seems to suggest a positive correlation between the reliability of the predictions and the numerosity of piezometric data.

Fig. 9 Influence on the correctness of the results of the number of piezometers (a) and of the average distance between landslides and networks of piezometers (b). The total numbers of landslides belonging to each numerosity and average distance class are also reported

Another parametric analysis has been carried out to assess whether and how the reliability of the results depends on the average distance between the networks of piezometers and the landslides. Indeed, the significance of piezometric data collected at a certain distance from the landslide is a crucial issue to investigate before using them for early warning purposes. To this end, the networks of piezometers have been grouped in four rather homogeneous classes depending on their average distance from the landslides. It should be pointed out that “< 500 m” has been considered as the lower class in order to minimize the effects of the spatial uncertainties affecting the records contained in the landslide database (see “Data” section for further details on this issue). Following the same criterion already adopted for the two other parametric analyses, the moving average trends have been evaluated in order to assess the number of correct indicators (i.e., uptrends) for each class (Fig. 9b). As expected, the networks of piezometers installed in the proximity of the landslides (i.e., at a distance shorter than 500 m) provide the best results (83%), and in only 3 cases is the occurrence of a landslide not associated with an uptrend. However, the percent differences between correct and incorrect indicators observed for the four defined classes do not show significant variations, especially for average distances longer than 500 m. Indeed, the performed analyses revealed that the remaining piezometer classes provide correct indications in the majority of the cases: 78% (“500–1000 m” class), 73% (“1000–2000 m” class), and 78% (“> 2000 m” class).
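
The grouping by average distance and the per-class share of uptrends can be sketched as follows. The handling of values falling exactly on a class boundary, the record layout, and the sample records are assumptions made for illustration only.

```python
from bisect import bisect_right
from collections import Counter

# Class boundaries in metres and the class labels used in the text
BOUNDS_M = [500.0, 1000.0, 2000.0]
LABELS = ["< 500 m", "500–1000 m", "1000–2000 m", "> 2000 m"]


def distance_class(avg_distance_m: float) -> str:
    """Return the distance class of a piezometer network.

    Assumption: a value exactly on a boundary falls in the upper class.
    """
    return LABELS[bisect_right(BOUNDS_M, avg_distance_m)]


def share_of_uptrends(records) -> dict:
    """Percentage of landslides associated with an uptrend in each class.

    records: iterable of (average distance in metres, trend label),
    where the trend label is 'up', 'down', or 'nt' (hypothetical labels).
    """
    totals, correct = Counter(), Counter()
    for avg_distance_m, trend in records:
        label = distance_class(avg_distance_m)
        totals[label] += 1
        correct[label] += trend == "up"
    return {label: 100.0 * correct[label] / totals[label] for label in totals}


# Hypothetical records, for illustration only
sample = [(320.0, "up"), (450.0, "down"), (1500.0, "up"), (2600.0, "up")]
print(share_of_uptrends(sample))
```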

These results suggest that the piezometric data available in each territorial unit can be usefully adopted in the ATeWM, regardless of their distance from the landslides. These aspects are crucial for evaluating the applicability of this method to different territorial units.

Conclusions

Landslide early warning systems are used to assess the probability of occurrence of weather-induced landslides over large areas through the prediction and monitoring of meteorological variables. Weather measurements alone, however, are often insufficient to produce reliable predictions of landslide occurrences in complex geomorphological environments. In such circumstances, locally collected geotechnical data can provide additional information to allow warning levels to be dynamically adjusted before they are issued by systems managers. This study presented an innovative augmented territorial warning model aimed at integrating pore water pressure measurements in a territorial warning model for weather-induced landslides in Norway.

In the case study, the ATeWM has been tested and applied considering, as territorial units, the hydrological basins for which an ensemble of shallow local pore water pressure measures may be considered a proxy of significant weather-induced changes in the groundwater regime over significant parts of the basins. The model has been applied considering several types of landslides that are already under surveillance in the Norwegian national LEWS and for which pore water pressure measurements could provide indications. These landslides are generally triggered by a rise of pore water pressure due to rain infiltration or snow melting and usually occur in Quaternary shallow soil layers overlying bedrock. The proposed methodology was successfully applied to 30 Norwegian hydrological basins in the period between January 2013 and June 2017, exploiting a landslide catalog containing 125 weather-induced landslides. Moreover, two parametric analyses, carried out to evaluate the influence of the different network configurations of the piezometers (i.e., numerosity and average distance from the landslides) on the correctness of the predictions, supported the reliability of the pore water pressure trend analysis at catchment scale.

In the model, local pore water pressures have been processed and combined in an ensemble of measures within a procedure aimed at improving the performance of a territorial warning model that is already operational at national scale. Modeling the soil-atmosphere interaction for soil stability purposes at slope scale was outside the scope of this study. Therefore, these measures should be considered proxies of significant weather-induced changes in the groundwater regime at the scale of the territorial units used as warning zones. The specific results of this study are only applicable to the territorial warning model used for Norway, as different (possibly shorter) time periods may be adopted to optimize the performance of a model in other locations. Despite this, the proposed methodology could be applied to other case studies, provided that contextual differences and limitations are adequately considered. In general, the field of application of an augmented territorial warning model is limited by the field of application of the original territorial warning model. Of course, the predictions of the ATeWM are also limited to landslide phenomena for which the geotechnical measures collected at local scale, and considered in the model, can provide indications. In addition, the size of the territorial units used as warning zones needs to be appropriately defined. Indeed, a number of local measurements need to be available in each adopted territorial unit, as they are potentially useful for analyses at territorial scale only if the monitoring data are processed and taken together as an ensemble of measures.

In summary, the major advantages of the proposed methodology are the following:

  • The simplicity and speed of the forecasting allow an easy implementation into operational early warning systems.

  • The number of extra input data needed is limited, and consists of available local geotechnical monitoring data.

  • The calibration and validation procedures, based on standard statistical indicators, allow the periodic performance assessment of the ATeWM.