Introduction

Landslides are among the world’s most widespread natural hazards, causing a high number of human casualties and considerable economic damage (Nadim et al., 2006; Petley 2012; Froude and Petley 2018). Global landslide fatalities have been increasing with rising population density (Petley 2012), changes in seasonal rainstorm patterns and human activities (Froude and Petley 2018). Rainfall is likely to be one of the major triggers of landslide-induced fatalities in mountainous areas, as is the case in many tropical African countries including Rwanda (Monsieurs et al. 2018b, c). However, studies on landslides induced by climate factors, including but not limited to changes in seasonal rainfall, are geographically biased, with a major gap in Africa (Gariano and Guzzetti 2016). To reduce rainfall-induced landslide casualties, empirical and physically based dynamic models to forecast landslide hazards have been proposed and adopted to define rainfall-induced landslide early warning thresholds. These thresholds indicate the minimum rainfall, groundwater levels, soil moisture contents and other hydrological conditions potentially linked to landslide initiation at local, regional and global scales. The physical, process-based models aim to understand and describe the dynamic processes responsible for landslide initiation. They typically combine slope stability and hydrological models in which dynamic hydrological processes are used to evaluate slope failure probabilities (Anderson and Lloyd 1991; Montgomery and Dietrich 1994; van Beek 2002; Rosso et al. 2006; Kuriakose et al. 2009). However, physically based dynamic models require high-resolution spatio-temporal data, which are unavailable in most areas worldwide. Applications of this type of model are thus largely limited to a few regions with sufficient data and typically to local scales only (Aleotti 2004).

Due to their less detailed data requirements, empirical-statistical models have been widely adopted to define precipitation-induced landslide early warning thresholds at local (Crozier 1999; Prenner et al. 2018), national (Robbins 2016; Peruccacci et al. 2017), regional (Monsieurs et al. 2018a) and global scales (Caine 1980; Guzzetti et al. 2008). The empirical-statistical models typically relate precipitation characteristics, such as antecedent precipitation, cumulative event precipitation, precipitation intensity and precipitation duration, or combinations thereof, to the occurrence of landslides. Despite the considerably lower data requirements of empirical-statistical threshold models, landslide initiation thresholds remain poorly explored and defined throughout Africa (Gariano and Guzzetti 2016). This is due to the lack of accurate and complete landslide inventories and the insufficient spatio-temporal resolution of the available precipitation data (Monsieurs et al. 2018c, b). The recent efforts by LIWEAR (Landslide Inventory for the central section of the Western branch of the East African Rift) as part of NASA’s GLC (Global Landslide Catalogue) to systematically document landslides may partially solve the landslide data scarcity in Africa (Monsieurs et al. 2018c). Even so, many landslide events are likely to be missing from the inventory because, for Africa, it draws mostly on newspapers, government reports and other media. While the reliance on these data sources is likely to result in a bias towards large and/or impactful landslides that involve casualties and economic damage, this landslide inventory can nevertheless serve as a basic starting point to define rainfall thresholds for landslide initiation in African countries.

Despite the limited research devoted to landslides in Africa, a number of rainfall thresholds for landslide initiation have been proposed in the past (Piller 2016; Monsieurs et al. 2018a, 2019). Those thresholds are mainly inferred from empirical methods based on statistical analysis of historical rainfall characteristics and landslide inventories to distinguish landslide conditions from no-landslide conditions. However, many limitations, constraints and uncertainties associated with empirical thresholds have been highlighted (Peres et al. 2017; Prenner et al. 2018; Bogaard and Greco 2018). Some limitations arise because empirical thresholds are mainly based on the rainfall event during which a landslide occurred, which is in reality the actual landslide trigger. Hereafter, these thresholds are therefore referred to as landslide trigger-based thresholds, with various timescales depending on rainfall intensity, volume or event duration. Such landslide trigger-based thresholds include the intensity-duration (I-D) (Caine 1980; Guzzetti et al. 2007, 2008; Ma et al. 2015; Hong et al. 2017; Roccati et al. 2018), event-duration (E-D) and event-intensity (E-I) thresholds (Peruccacci et al. 2017; Robbins 2016). The landslide trigger-based thresholds have been increasingly recognised to neglect the causal hydrological processes that predispose the slope to failure (Peres et al. 2017; Bogaard and Greco 2018; Mostbauer et al. 2018). To include these processes, a number of researchers considered the possible hydrological causes in terms of antecedent precipitation, catchment storage, soil moisture indices and/or soil water status prior to the landslide triggering event or storm (Crozier 1999; Glade 2000; Aleotti 2004; Ciavolella et al. 2016; Mostbauer et al. 2018).
These temporally variable hydrological conditions define the hydrological predisposition of a region to landslide occurrence and are thus, besides its geomorphological predisposition, the root cause of landslide occurrence in a region. These hydrological conditions are defined prior to the landslide triggering conditions and then combined with them to form landslide trigger-cause-based thresholds.

The concept of the landslide trigger-cause framework was recently proposed by Bogaard and Greco (2018) and has been adopted using either in situ observed or modelled soil moisture to define the landslide hydro-meteorological thresholds (e.g. Mirus et al. 2018a, b; Prenner et al. 2018). Similar concepts were also adopted in other studies (e.g. Crozier 1999; Glade 2000; Aleotti 2004; Ciavolella et al. 2016; Mostbauer et al. 2018; Prenner et al. 2019). Mirus et al. (2018b) used the prior in situ soil saturation as a cause and the recent cumulated rainfall as trigger to define bilinear thresholds for landslide warning in Portland. The concept of bilinear thresholds was proposed by Mirus et al. (2018a) based on its predictive capacity as compared with the more commonly used univariate linear thresholds. One of the constraints of the hydro-meteorological/cause-trigger concept is indeed that one has to explore a wide range of combinations of explanatory variables which may be different based on landslide pre-disposing and triggering factors. The objective of this research is to use landslide and precipitation data in an empirical-statistical approach to define cause-trigger-based thresholds for landslides in Rwanda. Specifically, in this paper we aim to:

  (i) identify precipitation-related variables with the highest explanatory power for landslide occurrence in Rwanda;

  (ii) quantify both landslide trigger-based and cause-trigger-based thresholds as a first step towards robust landslide early warning systems in Rwanda.

Study area description

Rwanda is a landlocked country located between 1–3° S and 28–31° E in Central-Eastern Africa, characterised by a tropical climate and pronounced relief. It is bounded by Uganda to the north, Burundi to the south, Tanzania to the east and the Democratic Republic of Congo to the west (Fig. 1). It is topographically dominated by the volcanic highlands in the northern and western regions and by lowlands in the savannah region in the east and southeast of the country, as shown in Fig. 2. The highland region receives abundant rainfall with a long-term mean > 1200 mm year−1, while this reduces to < 1000 mm year−1 in the savannah region (Fig. 1). The country has two rainy seasons, the longer one extending from March through mid-May and the shorter one from mid-September to mid-December. Among the factors known to influence rainfall in Rwanda are subtropical anticyclones, tropical cyclones, monsoons, El Niño Southern Oscillation (ENSO), large water bodies and topography (Ngarukiyimana et al. 2017). Those factors expose Rwanda to variable weather, associated with frequent extreme rainfall events and prolonged wet seasons that lead to flooding and landslide hazards.

Fig. 1

Location of the study area, with spatial distribution of mean annual rainfall and recorded landslides from 2006 to 2018 (red dots). In the side panel, mean monthly rainfall per regions is shown for the 2006–2018 period

Fig. 2

Elevation map of the study area, isohyets (mm) and recorded landslides from 2006 to 2018 (red dots)

The geology of Rwanda consists of Precambrian metasedimentary rocks, mainly quartzites, sandstones and shales, intruded by granites. Granitic gneisses and migmatites are dominant in eastern Rwanda, while Neogene and Quaternary volcanic deposits are dominant in the northwest and southwest. The western Rift consists of alluvium and lake sediments of Quaternary age. The main lithological units in landslide areas include mica schists and pegmatite rocks (Fig. 3), which are unstable due to rapid weathering, easy splitting along joints and bedding planes and loss of strength induced by the high mica content. Despite limited research on the impact of tectonic and seismic movement on landslides in Rwanda, the country is located in a tectonic region whose epicentre lies in Lake Kivu, on the border between western Rwanda and the Democratic Republic of Congo. The northwest part of the country is occupied by a volcanic chain which is seismically active. This makes Rwanda, especially the western and northwest regions, susceptible to earthquakes, which Monsieurs et al. (2018c) mention as one of the landslide causes in the East African Rift.

Fig. 3

Geology and lithological units of Rwanda and landslides (red dots) distribution

In part of the Mukungwa catchment (756 km2) in northwestern Rwanda, active and inactive landslides were recorded. The recorded landslides were classified as rotational slide (34%), flow (26%), translational slide (17%), fall (15%) and complex type of mass movement (7%), involving mainly debris and earth materials. The typical landslide areal extent varied from 2.8 × 10^1 to 4.4 × 10^5 m2, with failure volumes estimated between 1.3 × 10^1 and 5.8 × 10^6 m3, associated with a total landslide mobilization rate of about 21 mm year−1. The soils involved in mass movement are largely Umbric and Haplic acrisols (Dewitte et al. 2013), characterised by a high clay content in deep layers that acts as an impermeable layer and thus creates perched groundwater and high pore water pressure in the overlying soil layers. Rwanda has undergone significant land use changes over the last decades (Nambajimana et al. 2020), which may also have induced changes in landslide intensity. The Landsat-8 and Sentinel images (1990–2016), accessed from the Regional Centre for Mapping of Resources for Development (http://opendata.rcmrd.org/search?q=RWANDA%20LANDSAT%20), indicated a decrease in forest land from about 43 to 14%, while agricultural land rose from about 25 to 53%.

Landslide data

The available data include the Rwanda landslide inventory, partially provided by the LIWEAR project, and rainfall time series provided by the Rwanda Meteorology Agency.

Landslides inventory

Part of the landslide inventory for Rwanda was accessed from the NASA global landslide catalogue (https://data.nasa.gov/Earth-Science/Global-Landslide-Catalog/h9d8-neg4), uploaded mainly by the LIWEAR project. The catalogue was extended by compiling other rainfall-induced landslides reported in local newspapers, blogs, technical reports and field observations. For the catalogue extension, we followed the global landslide inventory methods using standard indices adopted by Kirschbaum et al. (2012), Bach et al. (2010) and Monsieurs et al. (2018c). Seven elements were recorded for each landslide: (i) landslide location (e.g. village, cell, sector, district or town); (ii) time of occurrence (date); (iii) triggering event (e.g. rainfall); (iv) landslide type based on the Hungr et al. (2014) classification, depending on the availability of background information; (v) latitude and longitude with relative locational accuracy; (vi) information about the impact (number of fatalities, injuries and damages); (vii) the accessible source of information, with links to online sources. Mostly hazardous (fatal and highly damaging) landslides are reported, while non-hazardous ones are likely to be missed. Based on the inventory, about 99% of landslides occurred from 2006 onwards, while the remainder occurred well before 2006. Therefore, 2006 was taken as the threshold year and landslides that occurred between 2006 and 2018 were used for this study.

Rainfall and representative rain gauges

We used daily rainfall time series recorded at 35 rain gauges in Rwanda over a period of 13 years from 2006 to 2018. The rainfall dataset was accessed from the Rwanda Meteorology Agency. Among the 35 rain gauges, representative rain gauges were selected to identify the rainfall conditions for one or more landslides. The representative rain gauges were selected based on their weights (W), estimated from the cumulated event rainfall (E) until the landslide day, the distance between rain gauge and landslide (d) and the event duration D (days), as first proposed by Melillo et al. (2018), using Eq. (1).

$$ W=\frac{E^2}{d^2D} $$
(1)

The rain gauges to be weighted for each landslide were those located inside the buffer radius around the landslide location. The higher the weight, the higher the chance that the rain gauge represents the rainfall conditions responsible for the landslide. Based on the highest weights (W), following Melillo et al. (2018), 22 rain gauges out of 35 were found to be representative of the rainfall conditions responsible for landslide occurrence. A single dataset of rainfall conditions from these 22 rain gauges was compiled to distinguish landslide triggering conditions from non-triggering conditions.
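The gauge selection described above can be sketched as follows. This is a minimal illustration of Eq. (1), not the authors' implementation: the function names, the dictionary keys, the use of kilometres for d and the 15 km buffer are all assumptions made for the example, and the gauge values are hypothetical.

```python
def gauge_weight(event_rain_mm, distance_km, duration_days):
    """Weight W = E^2 / (d^2 * D) from Eq. (1)."""
    if distance_km <= 0 or duration_days <= 0:
        raise ValueError("distance and duration must be positive")
    return event_rain_mm ** 2 / (distance_km ** 2 * duration_days)


def pick_representative(gauges, buffer_km):
    """Return the name of the highest-weight gauge inside the buffer, or None.

    `gauges` is a list of dicts with hypothetical keys:
    name, E (mm), d (km) and D (days).
    """
    inside = [g for g in gauges if g["d"] <= buffer_km]
    if not inside:
        return None
    best = max(inside, key=lambda g: gauge_weight(g["E"], g["d"], g["D"]))
    return best["name"]


# Hypothetical gauges around one landslide (illustration only).
gauges = [
    {"name": "G1", "E": 40.0, "d": 5.0, "D": 4},   # W = 1600 / (25 * 4) = 16
    {"name": "G2", "E": 60.0, "d": 12.0, "D": 3},  # W = 3600 / (144 * 3), about 8.3
    {"name": "G3", "E": 90.0, "d": 30.0, "D": 2},  # outside a 15 km buffer
]
representative = pick_representative(gauges, buffer_km=15.0)
```

Because the weight scales with the inverse square of the distance, a nearby gauge with moderate rainfall can outrank a distant gauge that recorded more rain.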

Methodology

Definition of landslide rainfall conditions

The landslide conditions were divided into four categories based on their timescale. The first category considers the entire rainfall event during which one or more landslides occurred and is referred to as the maximum probable rainfall event (MPRE). The second, third and fourth categories respectively consider the accumulation of very recent rainfall over the last 3 days (RD3), 2 days (RD2) and 1 day (RD1), with the last day coinciding with the day of landslide occurrence. RD3, RD2 and RD1 were calculated for each day during the 2006–2018 study period, irrespective of whether a landslide occurred. MPRE was here defined as individual periods of days with recorded rain ≥ 1 mm day−1, separated by dry periods of at least two dry days. The event rainfall E (mm/E) was then computed as the rainfall accumulated during each MPRE, whose length defines the event duration D (day). The event intensity I (mm day−1) was then computed as the ratio of E to D. Landslide causal conditions were represented by the Antecedent Precipitation Index (API), considered as a proxy for soil moisture accumulation. The API was calculated as the cumulative rainfall over a predefined time period prior to the landslide triggering conditions. For this study, time periods of T = 30, 10 and 5 days were considered to define API30, API10 and API5, respectively. A decay coefficient k = 0.95 was used to estimate \( {\mathrm{API}}_T(t) \) for each day t over the study period according to Eq. (2).

$$ \mathrm{AP}{\mathrm{I}}_{\mathrm{T}}(t)=R(t)+ kR\left(t-1\right)+{k}^2R\left(t-2\right)+{k}^3R\left(t-3\right)+\cdots +{k}^TR\left(t-T\right) $$
(2)

where R is the daily rainfall (mm day−1), k is the decay coefficient (−), t is the individual day and T is the antecedent accumulation period (day) (30, 10 and 5 days) prior to the starting day of the rainfall triggering conditions (MPRE, RD3, RD2 and RD1).
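The event delineation and the API of Eq. (2) can be sketched as below. This is an illustrative reading of the definitions, under stated assumptions: D is counted from the first to the last wet day of an event (inclusive), the decay is applied to every term up to \( k^T \), days before the start of the series contribute zero, and the function names and sample series are hypothetical.

```python
def _close(rain, start, end):
    """Summarise one event spanning days start..end (inclusive)."""
    total = sum(rain[start:end + 1])  # E, mm per event
    days = end - start + 1            # D, days
    return {"start": start, "end": end, "E": total, "D": days, "I": total / days}


def split_events(rain, wet_mm=1.0, min_dry_days=2):
    """Split a daily rainfall list into events.

    A day is wet if rain >= wet_mm; an event is closed once
    min_dry_days consecutive dry days follow its last wet day.
    """
    events, start, last_wet, dry_run = [], None, None, 0
    for i, r in enumerate(rain):
        if r >= wet_mm:
            if start is None:
                start = i
            last_wet, dry_run = i, 0
        elif start is not None:
            dry_run += 1
            if dry_run >= min_dry_days:
                events.append(_close(rain, start, last_wet))
                start, dry_run = None, 0
    if start is not None:  # series ended during an event
        events.append(_close(rain, start, last_wet))
    return events


def api(rain, t, T, k=0.95):
    """API_T(t) = sum over n = 0..T of k**n * R(t - n), following Eq. (2)."""
    return sum(k ** n * rain[t - n] for n in range(T + 1) if t - n >= 0)


# Hypothetical daily rainfall in mm (illustration only).
rain = [0.0, 5.0, 12.0, 0.5, 8.0, 0.0, 0.0, 3.0, 0.0, 0.0]
events = split_events(rain)
```

Since the text evaluates the API prior to the starting day of the triggering conditions, `t` would typically be the day before `events[i]["start"]`, for example `api(rain, events[1]["start"] - 1, T=5)`.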

Quantification of landslide explanatory precipitation variables

The landslide explanatory precipitation variables, which include the landslide causal (pre-disposing) and triggering conditions, were explored using receiver operating characteristic (ROC) curves (Hong et al. 2017; Postance and Hillier 2017; Mirus et al. 2018a; Prenner et al. 2018). The ROC curve is a graphical representation created by plotting the true positive rate (TPR) of correctly predicted landslides against the false positive rate (FPR) of wrongly predicted landslides. The ROC curves comprise a suite of possible threshold levels, at each of which the balance between the threshold’s true positive rate and the corresponding false positive rate is evaluated. The area under the ROC curve (AUC) is used as an indicator of variable performance, where a perfect test variable would result in AUC = 1. The AUC indicates the capacity of the considered test variable to correctly distinguish landslide from no-landslide conditions. Thus, the AUC was used on the one hand as a statistical metric to compare the tested precipitation variables against random guessing, i.e. AUC = 0.5. On the other hand, it was used to find the precipitation-related variable with the highest explanatory power for landslides. The true positive rate (TPR) associated with each threshold level on the ROC curves is calculated with Eq. (3).

$$ \mathrm{TPR}=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}} $$
(3)

The false positive rate (FPR) is calculated by Eq. (4).

$$ \mathrm{FPR}=\frac{\mathrm{FP}}{\mathrm{FP}+\mathrm{TN}} $$
(4)

where TP are true positives, i.e. the number of landslides correctly predicted by the threshold; FN are false negatives, i.e. the number of landslides that occurred but were not predicted because they were triggered by rainfall conditions below the defined threshold; FP are false positives, i.e. incorrect predictions of landslide occurrence by the threshold model when no landslide was reported; and TN are true negatives, i.e. correct predictions of no landslide occurring.
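The ROC construction from Eqs. (3) and (4) can be sketched as follows. This is a minimal illustration, assuming that a landslide is predicted whenever the variable is at or above the threshold and that the AUC is integrated with the trapezoidal rule; the function names and the sample data are hypothetical.

```python
def confusion(values, labels, threshold):
    """Count (TP, FN, FP, TN); a landslide is predicted when value >= threshold."""
    tp = fn = fp = tn = 0
    for value, landslide in zip(values, labels):
        predicted = value >= threshold
        if landslide and predicted:
            tp += 1
        elif landslide:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


def rates(tp, fn, fp, tn):
    """TPR from Eq. (3) and FPR from Eq. (4)."""
    return tp / (tp + fn), fp / (fp + tn)


def roc_curve(values, labels):
    """Return ROC points (FPR, TPR), one per candidate threshold, and the AUC."""
    points = [(0.0, 0.0)]  # threshold above every observation: no alarms
    for threshold in sorted(set(values), reverse=True):
        tpr, fpr = rates(*confusion(values, labels, threshold))
        points.append((fpr, tpr))
    # Trapezoidal integration of TPR over FPR.
    auc = sum((x1 - x0) * (y0 + y1) / 2.0
              for (x0, y0), (x1, y1) in zip(points, points[1:]))
    return points, auc


# Hypothetical daily rainfall (mm) and landslide flags (illustration only).
rain = [2.0, 5.0, 8.0, 12.0, 20.0, 30.0]
slid = [0, 0, 0, 1, 0, 1]
points, auc = roc_curve(rain, slid)
```

Each observed value is used as a candidate threshold, so the curve runs from no alarms at (0, 0) to alarms on every day at (1, 1).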

Threshold definition techniques

Since the AUC only indicates which precipitation variable or combination of variables can significantly distinguish landslide from no-landslide conditions, and the ROC curves indicate all possible thresholds with their respective balance of TPR and FPR, it is also necessary to define the optimum threshold levels above which landslides are highly likely to occur. We used three different techniques to do so: a Bayesian probabilistic approach (Prob), the maximum true skill statistic (TSS) and the minimum radial distance (Rad). Bayes’ theorem defines the conditional probability of an event A (here: landslide occurrence) given an event B, here represented by the different precipitation variables introduced in the “Definition of landslide rainfall conditions” section. To reduce the high scatter in the rainfall data, specific magnitude-frequency distributions for each rainfall variable were defined using bins. Based on the extent of the dataset, bins of 5 mm were used for E, RD and API, while bins of 2 mm day−1 and 2 days were used for event intensity and event duration, respectively. The specific magnitude-frequency pairs were then converted into probabilities following Bayes’ terminology. The prior probability of an event A, P(A), is the overall probability of a landslide occurring regardless of the event B. If \( {N}_{A_T} \) denotes the total number of landslide conditions (≈ total number of landslides) and \( {N}_{B_T} \) the total number of rainfall events (landslide + no-landslide conditions) recorded over the predefined period (here 2006–2018), P(A) is calculated with Eq. (5).

$$ \mathrm{P}\left(\mathrm{A}\right)=\frac{{\mathrm{N}}_{{\mathrm{A}}_{\mathrm{T}}}}{{\mathrm{N}}_{{\mathrm{B}}_{\mathrm{T}}}} $$
(5)

If we also define \( {N}_{B_S} \) as the number of events B with a specific magnitude (e.g. 20 mm ≤ E < 25 mm), the prior probability for an event B, denoted P(B), is expressed with Eq. (6) and indicates the probability of having an event B regardless of whether a landslide occurs.

$$ \mathrm{P}\left(\mathrm{B}\right)=\frac{{\mathrm{N}}_{{\mathrm{B}}_{\mathrm{S}}}}{{\mathrm{N}}_{{\mathrm{B}}_{\mathrm{T}}}} $$
(6)

The conditional probability P(A|B) expressed in Eq. (7) indicates the probability for landslide occurrence given the specific magnitude of rainfall variable.

$$ P\left(A|B\right)=\frac{P\left(B|A\right)\cdotp P(A)}{P(B)} $$
(7)

The conditional probability P(B|A) is the probability of having rainfall of a specific magnitude \( {B}_S \) given that a landslide occurs and is calculated by Eq. (8) or (9).

$$ P\left(B|A\right)=\frac{P(B)\cdotp P\left(A|B\right)}{P(A)} $$
(8)
$$ P\left(B|A\right)=\frac{{\mathrm{N}}_{{\mathrm{A}}_{\mathrm{S}}}}{{\mathrm{N}}_{{\mathrm{A}}_{\mathrm{T}}}} $$
(9)

With \( {N}_{A_S} \) denoting the number of landslides that occur within a specific rainfall magnitude \( {B}_S \) (e.g. the number of landslides that occurred when rainfall intensity was between 8 and 10 mm day−1, i.e. 8 ≤ I < 10), the probabilistic threshold values are defined by comparing the prior P(A) to the posterior probability P(A|B) (Berti et al. 2012; Robbins 2016; Peres et al. 2017). If the posterior landslide probability P(A|B) exceeds the prior landslide probability P(A), the rainfall variable (B) has a significant effect on landslide occurrence (A). Conversely, when P(A|B) is smaller than or equal to P(A), variable B has no significant effect. The probabilistic threshold value for a variable B to initiate a landslide (A) is objectively taken as the specific magnitude or level at which the posterior probability curve P(A|B) rises above the prior probability P(A), as shown in Fig. 4. The cumulative probability curve cumP(B|A) indicates the probability of landslides occurring below the threshold level of B, and its complement the probability of landslides occurring above it. These are equivalent to the false negative rate (FNR) and true positive rate (TPR) on the ROC curve, as expressed by Eqs. (10)–(12).

$$ \mathrm{FNR}= cumP\left(B|A\right) $$
(10)

or

$$ \mathrm{FNR}=\frac{\mathrm{FN}}{\mathrm{FN}+\mathrm{TP}} $$
(11)
$$ \mathrm{TPR}=1- cumP\left(B|A\right) $$
(12)
Fig. 4

Probabilistic threshold definition: the X axis shows the magnitude of event intensity I. On the primary Y axis, the red horizontal line is the prior landslide probability P(A) and the black curve is the conditional probability of a landslide given a specific magnitude of I (P(A|B)). On the secondary Y axis, the light green curve is the cumulative probability of an event intensity I of specific magnitude given that a landslide occurs (cumP(B|A)) and the blue curve is the cumulative probability of I regardless of landslide occurrence (cumP(B)). The dark green sphere and two vertical lines indicate the specific magnitude or level at which P(A|B) rises above P(A), which represents the probabilistic threshold intensity, between 8 and 10 mm day−1. The dark green horizontal lines indicate the resulting false negative rate, equivalent to cumP(B|A) represented by the light green curve
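The binned Bayesian quantities of Eqs. (5)–(9) can be sketched as below. This is a minimal illustration, not the authors' code: it assumes the threshold is the lower edge of the first bin whose posterior exceeds the prior, and the function names and intensity values are hypothetical.

```python
from collections import Counter


def bayes_by_bin(values, labels, bin_width):
    """Return P(A) and, per bin, (lower edge, P(B), P(B|A), P(A|B))."""
    n_bt = len(values)                   # N_BT: all rainfall conditions
    n_at = sum(1 for y in labels if y)   # N_AT: landslide conditions
    p_a = n_at / n_bt                    # Eq. (5)
    n_bs = Counter(int(v // bin_width) for v in values)
    n_as = Counter(int(v // bin_width) for v, y in zip(values, labels) if y)
    rows = []
    for b in sorted(n_bs):
        p_b = n_bs[b] / n_bt                   # Eq. (6)
        p_b_given_a = n_as[b] / n_at           # Eq. (9)
        p_a_given_b = p_b_given_a * p_a / p_b  # Eq. (7)
        rows.append((b * bin_width, p_b, p_b_given_a, p_a_given_b))
    return p_a, rows


def probabilistic_threshold(p_a, rows):
    """Lower edge of the first bin whose posterior P(A|B) exceeds the prior P(A)."""
    for lower_edge, _, _, posterior in rows:
        if posterior > p_a:
            return lower_edge
    return None


# Hypothetical event intensities (mm/day) and landslide flags (illustration only).
intensity = [1.0, 1.5, 3.0, 3.5, 5.0, 7.0, 8.5, 9.0, 9.5, 11.0]
slid = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
p_a, rows = bayes_by_bin(intensity, slid, bin_width=2)
threshold = probabilistic_threshold(p_a, rows)
```

With these made-up data the posterior first exceeds the prior in the 8–10 mm day−1 bin. Note that Eq. (7) reduces algebraically to \( {N}_{A_S}/{N}_{B_S} \), the share of conditions in a bin that produced a landslide.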

Threshold definitions based on the maximum true skill statistic TSS (e.g. Ciavolella et al. 2016; Peres et al. 2017) and the minimum radial distance Rad (Postance and Hillier 2017; Mirus et al. 2018a) have been used in particular in landslide studies. The true skill statistic is expressed as the difference between the true positive rate and the false positive rate, as given in Eq. (13), and its maximum value indicates the optimum threshold. For a perfect threshold, the TSS would equal unity, i.e. a true positive rate of one with a zero false positive rate. On the ROC curve, the radial distance (Eq. (14)) is the distance from the defined threshold to the optimum point, where TPR is one and FPR is zero. Thus, the minimum radial distance would be zero for a perfect threshold (Postance and Hillier 2017).

$$ \mathrm{TSS}=\mathrm{TPR}-\mathrm{FPR} $$
(13)
$$ \mathrm{Rad}=\sqrt{\mathrm{FP}{\mathrm{R}}^2+{\left(\mathrm{TPR}-1\right)}^2} $$
(14)
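Eqs. (13) and (14) can be applied to a list of candidate thresholds as sketched below. The function names are hypothetical; the two candidates are the rounded event-rainfall values quoted in the Results (29.9 mm with about 93% TPR and 41% FPR, and 45.9 mm with 76.3% TPR and 26.2% FPR), used here only to show how the two criteria can select different thresholds.

```python
import math


def tss(tpr, fpr):
    """True skill statistic, Eq. (13)."""
    return tpr - fpr


def rad(tpr, fpr):
    """Radial distance to the perfect point (FPR = 0, TPR = 1), Eq. (14)."""
    return math.sqrt(fpr ** 2 + (tpr - 1.0) ** 2)


def best_thresholds(candidates):
    """Given (threshold, TPR, FPR) tuples, return (max-TSS, min-Rad) thresholds."""
    by_tss = max(candidates, key=lambda c: tss(c[1], c[2]))
    by_rad = min(candidates, key=lambda c: rad(c[1], c[2]))
    return by_tss[0], by_rad[0]


# Rounded values for event rainfall E as quoted in the Results section.
candidates = [
    (29.9, 0.93, 0.41),    # TSS = 0.52,  Rad about 0.42
    (45.9, 0.763, 0.262),  # TSS about 0.50, Rad about 0.35
]
tss_choice, rad_choice = best_thresholds(candidates)
```

The TSS weighs hits and false alarms linearly, whereas Rad penalises large departures on either axis quadratically, which is why Rad favours the higher threshold with fewer false alarms here.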

Cause-trigger-based thresholds definition and implication for landslide prediction

The cause-trigger-based thresholds were defined by combining the best performing thresholds selected from among the techniques described in the “Threshold definition techniques” section. According to Postance and Hillier (2017), the ideal landslide warning threshold is the one leading to the maximum number of positive alarms (TP) and the minimum numbers of failed alarms (FN) and false alarms (FP). Based on these criteria, the most realistic threshold was selected among those defined by the Bayesian probabilistic approach, the maximum true skill statistic or the minimum radial distance. These thresholds were plotted on the axes of landslide triggering and causal variables in Y, X pairs as I-API30, I-API10, I-API5, E-API30, E-API10, E-API5, RD1-API30, RD1-API10, RD1-API5, RD2-API30, RD2-API10, RD2-API5, RD3-API30, RD3-API10 and RD3-API5. To evaluate the performance of the newly adopted method, a confusion matrix was computed for each pair and the resulting rates of positive alarms, false alarms, failed alarms and true negatives were quantified.
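A confusion matrix for one trigger-cause pair can be sketched as follows. The text does not spell out which combination of exceedances raises an alarm, so the alarm rule is left as a parameter; the default (alarm only when both thresholds are exceeded) is an assumption, as are the function names and the sample days. The thresholds 7.9 mm day−1 (I) and 23.6 mm (API10) are those reported in the Results.

```python
def quadrant(trig, cause, trig_thr, cause_thr):
    """Locate one day relative to the trigger and cause thresholds."""
    above_trig = trig >= trig_thr
    above_cause = cause >= cause_thr
    if above_trig and above_cause:
        return "both"
    if above_trig:
        return "trigger_only"
    if above_cause:
        return "cause_only"
    return "neither"


def confusion_matrix(days, trig_thr, cause_thr, alarm_quadrants=("both",)):
    """Count TP, FN, FP, TN for (trigger, cause, landslide) tuples.

    An alarm is raised when the day falls in one of `alarm_quadrants`;
    the default rule is an assumption, not taken from the source.
    """
    counts = {"TP": 0, "FN": 0, "FP": 0, "TN": 0}
    for trig, cause, landslide in days:
        alarm = quadrant(trig, cause, trig_thr, cause_thr) in alarm_quadrants
        if alarm:
            counts["TP" if landslide else "FP"] += 1
        else:
            counts["FN" if landslide else "TN"] += 1
    return counts


# Hypothetical (I in mm/day, API10 in mm, landslide?) days, illustration only.
days = [
    (9.0, 30.0, True), (12.0, 10.0, True), (3.0, 40.0, False),
    (10.0, 25.0, False), (8.5, 5.0, False), (2.0, 1.0, False),
]
cm_both = confusion_matrix(days, 7.9, 23.6)
cm_trigger = confusion_matrix(days, 7.9, 23.6,
                              alarm_quadrants=("both", "trigger_only"))
```

Comparing `cm_both` with `cm_trigger` shows the trade-off the pairs are meant to expose: adding the cause threshold removes false alarms but can also turn trigger-only landslides into failed alarms.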

Results and discussion

Landslide explanatory rainfall variables and thresholds

A total of 9353 MPRE from 34,438 rainy days (RD), including landslide and no-landslide conditions, were recorded in Rwanda from 2006 to 2018. From this MPRE and RD catalogue, 59 MPRE and 60 RD (total number of landslides) were identified as conditions responsible for the occurrence of one or more landslides recorded in the inventory. The area under the curve (AUC) of each MPRE and RD variable in Fig. 5 indicates the probability that the test variable correctly distinguishes landslide from no-landslide conditions. The AUC was highest for the event rainfall E and the cumulated 1-day rainfall RD1 as compared with other landslide triggering precipitation variables. This suggests that in the study region the rainfall received on the day of a landslide has more impact on triggering landslides than previously recorded rainfall. It also indicates that shorter timescale triggering conditions are more relevant for landslide occurrence than longer timescales. Even though rainfall event volumes E also scored highly at distinguishing landslide from no-landslide conditions, it is critical to note that E has variable timescales and should be normalised by the event duration D, which yields the event intensity I as the most informative test variable. The overall performance of the antecedent precipitation indices (API), here considered as the landslide cause, indicates that the cumulated rainfall over 10 days (API10) prior to the landslide triggering conditions has the strongest effect on landslide occurrence as compared with longer (30 days) and shorter (5 days) antecedent periods. On the one hand, this can be attributed to hydro-geotechnical soil properties such as hydraulic conductivity, permeability and soil texture, which govern the interplay between infiltration, evaporation and drainage and thus the drawdown over the longer antecedent precipitation period (API30).
On the other hand, this may indicate lags in water flow reaching the critical layer of the regolith for shorter periods like API5. The ROC curves in Fig. 5 indicate the possible threshold levels for each tested variable and the respective balance of TPR and FPR. The optimum threshold levels above which landslides are highly likely to occur are presented with different symbols on the curves depending on the technique used. Detailed information on the defined optimum thresholds is summarised in Table 1. The maximum true skill statistic (TSS) indicated that landslides are highly likely to occur when the cumulated rainfall volume E exceeds 29.9 mm/E; this threshold level resulted in about 93% correct landslide predictions, i.e. true positive alarms, and about 41% false alarms. A similar threshold was obtained with Prob, indicating the highest probability for landslides to occur beyond 30–35 mm/E, with a mean value of 32.5 mm/E. However, the minimum radial distance (Rad) approach revealed a higher threshold level of about 45.9 mm/E, associated with a considerably lower rate of positive alarms (76.3%) in exchange for a lower rate of false alarms (26.2%). From TSS, Prob and Rad, the critical event duration was inferred to be around 4 days, which leads to normalised event (En) thresholds of about 7.5 mm day−1, 8.1 mm day−1 and 11.5 mm day−1, respectively. These thresholds are similar to the defined event intensity thresholds of 7.9 mm day−1, between 8 and 10 mm day−1 and 10.1 mm day−1 from TSS, Prob and Rad, respectively. Based on the daily rainfall (RD) variables in Table 2, the optimum threshold levels above which landslides are highly likely to occur were 12.5 mm day−1 (Prob), 20.9 mm per 2 days (Rad) and 27.0 mm per 3 days (TSS and Rad) for RD1, RD2 and RD3, respectively. The optimum API threshold levels were also defined as indicated in Fig. 4 and Tables 1 and 2.
The most informative thresholds were 45.5 mm (Prob), 23.6 mm (Rad) and 7.7 mm (Rad) for API30, API10 and API5, respectively, prior to the landslide triggering event (MPRE). The API thresholds for RD variables are also presented in Table 2. The API thresholds indicate the levels below which antecedent precipitation would not be expected to contribute to the landslide triggering conditions. However, API thresholds are very sensitive to the timescale of the triggering conditions. Shorter timescale triggering conditions like RD1 require higher API threshold levels than RD2 or MPRE. This suggests that relying on trigger-based thresholds alone for landslide early warning could lead to biased results compared with also relying on API thresholds. Thus, shorter timescale triggering conditions should be preferred, as confirmed by the AUC.
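As a quick check of the normalised event thresholds quoted above, dividing each event rainfall threshold (TSS, Prob and Rad, in that order) by the inferred critical duration of about 4 days reproduces the reported values to one decimal place:

$$ {E}_n=\frac{E}{D}:\quad \frac{29.9}{4}\approx 7.5,\quad \frac{32.5}{4}\approx 8.1,\quad \frac{45.9}{4}\approx 11.5\ \mathrm{mm}\ {\mathrm{day}}^{-1} $$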

Fig. 5

Receiver operating characteristic (ROC) curves for a MPRE variables, b RD1 variables, c RD2 variables and d RD3 variables; variable significance based on the area under the curve (AUC) and the optimum thresholds defined using the Bayesian probabilistic approach (triangle-shaped marker), the maximum true skill statistic (rectangle-shaped marker) and the minimum radial distance (sphere-shaped marker). Where two or more techniques yielded a similar threshold with similar TPR and FPR, only one symbol is used. The figure also indicates the corresponding true positive rate (TPR) and false positive rate (FPR) for each threshold level

Table 1 MPRE landslide explanatory precipitation variables and their thresholds defined using probabilistic approach (Prob), maximum true skill statistics (TSS) and minimum radial distance (Rad)
Table 2 Cumulative recent rainfall variables (RD1, RD2, RD3), antecedent precipitation index (API) and landslide initiation thresholds defined using probabilistic (Prob), maximum true skill statistics (TSS) and the minimum radial distance (Rad) techniques

Landslide trigger and trigger-cause-based thresholds and implication for landslide prediction

The results of bilinear combinations of explanatory variables show that, in some cases, a single-variable threshold can be sufficient to predict landslides (Figs. 6 and 7, horizontal blue lines). Based on the maximum TSS threshold (Fig. 5, Tables 1 and 2), 91.5% of the landslides are correctly predicted once an event intensity (I) threshold of 7.9 mm day−1 is exceeded. Similarly, 93.5% of the landslides are captured when the rainfall event E exceeds 30.0 mm, while 71.7% are captured when daily rainfall (RD1) exceeds 12.5 mm day−1. These thresholds are all trigger-based, as they refer only to the recent rainfall event during which one or more landslides occurred. However, these trigger-based thresholds come with relatively high false alarm rates (FPR) of about 52%, 41% and 30% for I, E and RD1, respectively. Moreover, many landslides occur not because of the trigger alone but because of a combination of trigger and cause, the latter represented by API. For example, Fig. 6a shows that only 23.7% of the observed landslides, equivalent to about 26% of the landslides correctly predicted by the event intensity (I) threshold, were due to the triggering event alone, while the remaining 74% of the predicted landslides were due to the combined effect of I and API. As pointed out in the “Landslide explanatory rainfall variables and thresholds” section, the API thresholds indicate the critical level below which the impact of antecedent precipitation is considered unimportant for landslide prediction. Conversely, once the API threshold is exceeded, its contribution should be counted as one of the landslide causal factors, which results in trigger-cause-based thresholds. The top left and right panels of Figs. 6 and 7 indicate the improved prediction capacity of the trigger-based threshold once combined with a cause-based threshold. For example, Fig. 6 indicates that the prediction capacity (TPR) of the event intensity threshold increased by about 44%, 27% and 30% once combined with API as trigger-cause-based thresholds in a bilinear format as I-API30, I-API10 and I-API5, respectively. Figure 7 also indicates an improved prediction rate of the RD1 trigger-based threshold by about 15%, 35% and 12% once combined as RD1-API30, RD1-API10 and RD1-API5, respectively. Therefore, the concept of bilinear thresholds (Mirus et al. 2018a), or trigger-cause-based thresholds, not only minimises false alarm rates but can also be used to quantify the impact of landslide triggering and causal conditions, separately and in combination, on landslide occurrence. Figures 6 and 7 show the resulting rates of true alarms (TPR), false alarms (FPR), failed alarms (FNR) and true negatives (TNR) for the different trigger-cause-based thresholds. This approach should be explored further for the development of API-based landslide early warning systems.
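The comparison between a trigger-based and a trigger-cause-based threshold can be illustrated with a minimal Python sketch. It assumes that a trigger-cause-based warning is issued only when both the event intensity and the API exceed their thresholds, which is one possible reading of the bilinear format and may differ from the exact rule applied in this study. The daily records, the API threshold of 20 mm and all function and variable names are hypothetical; only the intensity threshold of 7.9 mm day−1 is taken from the text.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class AlarmRates:
    tpr: float  # true alarms: landslide days with a warning
    fpr: float  # false alarms: no-landslide days with a warning
    fnr: float  # failed alarms: landslide days without a warning
    tnr: float  # true negatives: no-landslide days without a warning


def alarm_rates(warnings: Sequence[bool], landslides: Sequence[bool]) -> AlarmRates:
    """Compute the four rates from paired warning / landslide flags."""
    pairs = list(zip(warnings, landslides))
    tp = sum(1 for w, l in pairs if w and l)
    fp = sum(1 for w, l in pairs if w and not l)
    fn = sum(1 for w, l in pairs if not w and l)
    tn = sum(1 for w, l in pairs if not w and not l)
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("need at least one landslide and one no-landslide day")
    return AlarmRates(
        tpr=tp / (tp + fn),
        fpr=fp / (fp + tn),
        fnr=fn / (tp + fn),
        tnr=tn / (fp + tn),
    )


# Hypothetical records (NOT data from the study), one entry per day.
intensity = [10.0, 9.0, 3.0, 12.0, 8.5, 2.0, 15.0, 9.5]  # I, mm/day
api = [40.0, 5.0, 50.0, 35.0, 4.0, 3.0, 45.0, 6.0]       # API, mm
landslide = [True, False, False, True, False, False, True, False]

I_THRESHOLD = 7.9     # mm/day, event intensity threshold quoted in the text
API_THRESHOLD = 20.0  # mm, hypothetical value chosen for illustration only

# Trigger-based warning: the intensity threshold alone.
trigger_only = alarm_rates([i > I_THRESHOLD for i in intensity], landslide)

# Trigger-cause-based warning (assumed rule): both thresholds exceeded.
trigger_cause = alarm_rates(
    [i > I_THRESHOLD and a > API_THRESHOLD for i, a in zip(intensity, api)],
    landslide,
)

print("trigger only :", trigger_only)
print("trigger-cause:", trigger_cause)
```

In this toy example the additional API condition removes the false alarms raised on days with intense rainfall but dry antecedent conditions. With real data, such a rule may also turn some true alarms into failed alarms, so all four rates should be inspected together.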

Fig. 6
Bilinear relation between landslide trigger represented by event intensity I and cause represented by antecedent precipitation indices of different timescales. a Thirty days prior to the triggering event intensity. b Ten days prior to the triggering event intensity. c Five days prior to the triggering event intensity. d Implication for warning based on the rate of true warnings (TPR) represented by green triangles on a, b and c; rate of false alarms (FPR) represented by red cross on a, b and c; rate of failed alarms (FNR) represented by red dots on a, b and c; true negative rate (TNR) or no landslide represented by black cross on a, b and c

Fig. 7
Bilinear relation between landslide trigger represented by 1-day rainfall (RD1) and cause represented by antecedent precipitation indices (API) of different timescales. a Thirty days prior to the triggering rainfall. b Ten days prior to the triggering rainfall. c Five days prior to the triggering rainfall. d Implication for warning based on the rate of true warnings (TPR) represented by green triangles on a, b and c; rate of false alarms (FPR) represented by red cross on a, b and c; rate of failed alarms (FNR) represented by red dots on a, b and c; true negative rate (TNR) or no landslide represented by black cross on a, b and c

Lastly, as pointed out in the introduction, the landslide inventory used for this research relied largely on information from newspapers, government reports and other media, so many landslide events are likely to have been missed. While the reliance on these data sources is likely to result in a bias towards large and/or impactful landslides that may involve casualties and economic damage, this landslide inventory is the most comprehensive currently available for Rwanda.

Conclusion

This research aimed to use landslide and precipitation data in an empirical-statistical approach to define both trigger-based and trigger-cause-based thresholds for landslide initiation in Rwanda and to quantify their predictive performance. The findings indicate that the normalised event E and the cumulative 1-day rainfall (RD1) coinciding with the landslide day are the most informative explanatory variables for distinguishing landslide from no-landslide conditions. Among the antecedent precipitation indices, the 10-day rainfall prior to the landslide triggering conditions was the most informative for distinguishing between landslide and no-landslide conditions, based on its AUC; API5 was too short, while API30 was too long. This underlines the critical role of hydrology (infiltration, storage, evaporation/drainage) and particularly the timing of pore pressure changes in the subsurface profile. It was also generally observed that all three threshold definition techniques (Bayesian probabilistic approach, maximum true skill statistic and minimum radial distance) resulted in quite similar threshold values. The highest landslide prediction capability (rate of positive alarms) was obtained using a single rainfall variable, i.e. a trigger-based threshold. However, that predictive capability was accompanied by a high rate of false alarms. Constraining the trigger-based threshold with a causal variable in a bilinear framework, as proposed by Mirus et al. (2018a), improved the overall prediction capacity by reducing the number of false alarms. The findings also indicate that trigger-cause-based thresholds in bilinear format are useful not only to minimise false alarms but also to explore the impact of triggering and causal conditions, separately and in combination, on landslide occurrence.