1 Introduction

A shallow landslide, often referred to as a shallow slope failure, is a type of landslide characterized by the rapid movement of a relatively thin layer of soil and rock material along a slope. In contrast to deep-seated landslides, which involve the failure of deeper and more consolidated layers through gradual slope instability, shallow landslides primarily affect the near-surface layers and are mainly triggered by a reduction in soil shear strength and an increase in shear stress along the potential failure surface over relatively short periods (Montrasio and Valentino 2008; Popescu 1994). Given the limited depth of material, shallow landslides involve rapid downslope movement, causing damage that is localized but more immediate and concentrated than that of deep-seated landslides. Campbell (1975) and Bogaard and Greco (2016) reported that the increase in pore water pressure caused by rising soil moisture and groundwater levels is one of the major factors controlling landslide triggering. This has led several studies to focus on variations in soil moisture and groundwater to address shallow landslide events (Hawke and McConchie 2011; Huang et al. 2021; Matsuura et al. 2008).

Precipitation, considered to be the major contributor to landslides, has been used in many ways to investigate its effect on slope stability. Caine (1980) suggested using the rainfall intensity–duration curve for thresholding shallow landslides and debris flows. Glade et al. (2000) used the antecedent daily rainfall to identify the triggering conditions of landslides, and Sengupta et al. (2010) used cumulative event rainfall and duration for predicting landslides. Rosi et al. (2021) proposed a three-dimensional threshold that adds antecedent rainfall to the intensity-duration threshold and found that it reduced false alarms by up to 86%. However, determining thresholds from rainfall-based parameters alone has some limitations. Bogaard and Greco (2016) found that predicting landslides using precipitation only was difficult because most slopes do not fail with extreme rainfall. Due to regional heterogeneity, the thresholds determined in prior studies vary widely (Guzzetti et al. 2007; Saito et al. 2010). The most complicated problem is that these characteristics are site-specific, depending on the region’s circumstances and regime (Haga et al. 2005; Nicholson and Farrar 1994; Singh et al. 2021).

Numerous studies have tried to address this problem by using additional variables that are highly correlated with slope stability (Conrad et al. 2021; Huang et al. 2021; Kubota et al. 2017). In particular, a number of studies have focused on the soil saturation level (Brocca et al. 2012; Marino et al. 2020; Ray et al. 2010). Ray and Jacobs (2007) attempted to demonstrate the correlations between relevant factors by comparing remotely sensed soil moisture and rainfall with landslide events. However, some landslide events occurred when soil moisture was below its peak, while some events did not coincide with the peak precipitation. Ziadat and Taimeh (2013) explored the impacts of rainfall intensity, antecedent soil moisture, and slope steepness on soil erosion. They found that although the contributions of the variables differ, antecedent soil moisture had a significant impact on soil sediment transport. Thomas et al. (2019) used antecedent soil saturation and daily rainfall to determine empirical and hypothetical landslide thresholds and found that both ground- and satellite-based approaches detected the landslide’s initiation point. Wicki et al. (2020) investigated the utility of in-situ soil moisture for early warning of regional-scale landslides. Predisposing factors (e.g., antecedent saturation and 2-week preceding maximum saturation) and dynamic event conditions (e.g., saturation change and 3-h maximum infiltration rate) were both shown to be statistically significant for landslide warning. The usefulness of modelled and reanalysed soil moisture data was also confirmed by Bezak et al. (2021) and Segoni et al. (2018a).

Factors other than hydrological variables have also been tested for improving landslide predictions. Abraham et al. (2020) proposed an algorithm that adds field-measured tilting angles to the empirical rainfall threshold, and found that it outperforms the model using only rainfall information, with model efficiency of up to 92%. Marin et al. (2020) also employed a rainfall intensity- and duration-based threshold, but incorporated several morphometric parameters including slope, relief, area, and the extent of the basin having specific terrain characteristics (e.g., terrain steeper than 30°). They found that slope and the extent of terrain with particular topographical complexity play a significant role in defining the rainfall threshold, and that a specific threshold cannot be extended to other basins. Geological conditions, including soil characteristics, were also found to have a non-negligible impact on predicting shallow landslides (Bartelletti et al. 2017; Cevasco et al. 2014; Ohlmacher 2000). In particular, Bartelletti et al. (2017) revealed that the lithological characteristics of the bedrock are the most important factor in localizing shallow landslides; slope angle was not strongly associated with landslide initiation in their work. Kim (2007) and Lee et al. (2022) demonstrated the necessity of including geological properties and soil characteristics (e.g., density, porosity) when assessing landslide conditions.

Rainfall-based studies to analyze landslides have previously been conducted in Korea (Kim et al. 2013; Lee et al. 2016). Hong et al. (2018) determined rainfall intensity-duration curves using antecedent rainfall. Kim et al. (2021) also examined the effects of antecedent rainfall conditions on landslide triggers using thresholds. The relative magnitudes of cumulative and antecedent rainfall were found to be the most important factors in predicting the occurrence of landslides. However, only a few studies in Korea focused on soil saturation levels (Kim et al. 2004; Kim and Lee 2013).

While the importance of considering geological and geographical characteristics when addressing landslides has also been demonstrated, obtaining up-to-date data of this kind is laborious and costly, which poses practical challenges. To address the abovementioned issues, this study explores the role of hydrological properties in shallow landslide initiation by analyzing both the intensity of external forces (rainfall) and soil saturation levels (soil moisture). We utilized remotely sensed and modeled data to examine their potential for real-time monitoring and prediction. The objectives of this study are threefold: (a) to determine the distribution and variation of the hydrological variables before and during landslides; (b) to obtain specific knowledge through a combination of hydrological properties; and (c) to determine three-dimensional thresholds and assess their applicability. First, the variable distributions were compared. The relationship between landslides and hydrological conditions was examined by closely observing the variations of the variables. A two-dimensional analysis was performed to determine what specific results could be derived using the additional variables. Then, conservative two-dimensional thresholds were obtained. Finally, a three-dimensional empirical threshold was acquired. Quantitative detection capability and false alarms (FA) were identified through the validation process.

2 Study area and dataset

The study was performed at a national scale. South Korea is located in East Asia, at 32°N–37°N and 125°E–130°E, with an area of slightly over 100,000 km² (Fig. 1). The country lies in the hot-summer humid continental climate zone (Jeong et al. 2021; Peel et al. 2007) and is subject to the East Asian monsoon. The average annual rainfall ranges from 1000 to 1850 mm, with the summer monsoon accounting for roughly two-thirds of the total rainfall (Lee et al. 1998). Due to the humid and rainy summer, landslides normally occur from June to September (Kim et al. 2021). High mountains are distributed along the northeastern coast and mid-south region, covering over 60% of the total area (Fig. 1a). The majority of the mountains (especially where landslides occurred) consist mainly of loamy textured soils (Fig. 1b).

Fig. 1 (a) Geographical location of the study area, and maps showing spatial distributions of (b) elevation, (c) soil texture, and (d) slope. Locations of landslide events, validation sites, and paths of two typhoons in 2016 and 2019 are shown together in panel (b)

The Korea Forest Service (KFS) collects domestic landslide events and provides public access through a data portal (https://data.go.kr). The public landslide record contains the dates on which a landslide was initiated and ended, and the administrative division in which it took place. Because the record carries no information on landslide type, this study assumed that the majority of landslide events in South Korea are shallow landslides, based on previous studies: Pradhan and Kim (2020) and Lee et al. (2022) demonstrated that the rainfall-triggered shallow landslide is the dominant type of landslide across the country, and Kim et al. (2021), using local newspapers, found that up to 87% of all landslide events from 1963 to 2018 were shallow landslides and about 13% were debris flows accompanied by shallow landslides. From 2016 to 2019, 189 historical landslide events were identified in South Korea. After excluding points with missing dates or locations, 170 events were considered for the analysis (Fig. 1). Among those, 160 landslides occurred over loam soil, and the rest took place over clay loam. The dense distribution of landslides can be attributed to various factors, such as local heavy rainfall, earthquakes, and typhoon direction. Typhoon Chaba passed through Jeju Island in 2016 and impacted the peninsula’s south-eastern region. In 2017, relatively few landslides occurred in the inland areas, and the landslides were located outside the area of typhoon influence. Those landslides appear to have been caused by localized torrential rains. The landslides in 2018 (LS2018) were similar to those in 2017 (LS2017). They frequently occurred in high-altitude mountainous terrain. Typhoon Mitag caused landslides across a wide area in 2019. From September 28th to October 3rd, 2019, Mitag moved southwest from the East Sea. Most of the landslides occurred near the East Sea and in the inland region (Fig. 1).

In this study, Global Precipitation Measurement (GPM) and Global Land Data Assimilation System (GLDAS) were used. The GPM mission was launched on February 27th, 2014, by the National Aeronautics and Space Administration (NASA) and Japan Aerospace Exploration Agency (JAXA). GPM integrates high-quality estimates based on passive microwave sensors with infrared estimates from geosynchronous weather satellites to observe precipitation and serve as a reference standard. Rain gauge data is used to make bias adjustments (Huffman et al. 2015). Meanwhile, GLDAS is a terrestrial modeling system that generates optimal fields of land surface states and fluxes in near-real-time (Rodell et al. 2004). To date, five land surface models (LSM), namely the Community Land Model, Catchment LSM, Mosaic, Variable Infiltration Capacity model, and Noah LSM have been used, with various forcing fields (e.g., precipitation, radiation, temperature, and humidity). The layer depth defined in each LSM is different for soil moisture data (Rui et al. 2018). The Noah-LSM-based GLDAS, for example, provides soil moisture at four depths (0–10, 10–40, 40–100, 100–200 cm) whereas Catchment-LSM (surface, root-zone, and profile) and VIC-LSM (surface, second, and bottom layer) datasets provide soil moisture at three depths. In this study, daily precipitation from the Integrated Multi-satellitE Retrievals for GPM (IMERG) and Noah 3.6-based 3-hourly GLDAS 2.1 are used; GLDAS was aggregated at a daily scale to be consistent in temporal resolution. Since the landslide record provides the smallest administrative division, precipitation and soil moisture values of the pixel containing the division were extracted and used for analysis at a grid scale.

The amount of precipitation on the day of the landslide (DLS) was calculated using daily accumulated precipitation (P). There is a risk that resolutions coarser than daily scale may result in a failure to capture the hydrological impacts because landslides were recorded on a daily basis. The antecedent precipitation index (API) is a metric that can be used to represent the wetness condition of a rainfall-affected basin (Choudhury and Golus 1988). Assuming that precipitation’s effect on the circumstances decreases exponentially over time, API represents the contribution of the accumulated precipitation as follows:

$${\mathrm{API}}_{t+1}={\mathrm{P}}_{1}{k}^{t-1}+{\mathrm{P}}_{2}{k}^{t-2}+\cdots +{\mathrm{P}}_{t-1}{k}^{1}+{\mathrm{P}}_{t}=\sum_{i=1}^{t}{\mathrm{P}}_{i}{k}^{(t-i)}$$
(1)

where t and k represent the acquisition time and decay parameter, respectively. A higher value of k implies a greater contribution and a longer influence of the preceding precipitation. In general, k can be derived empirically or chosen arbitrarily as values ranging between 0.80 and 0.98 (Koehler and Linsley 1951). In this study, a k value of 0.84 was used, which was tested to be reasonable in a local study (Chae et al. 2016).
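As a minimal sketch of Eq. (1), the sum can be evaluated directly or through the equivalent recursion in which each day's value is k times the previous value plus that day's precipitation. The function names below are hypothetical, and the default k = 0.84 is simply the value quoted above; the sum over days 1 to t is the quantity that Eq. (1) labels API at t + 1.

```python
def antecedent_precipitation_index(precip, k=0.84):
    """Direct form of Eq. (1): sum of P_i * k**(t - i) for i = 1..t.

    `precip` holds daily precipitation totals P_1..P_t in mm (oldest first).
    """
    t = len(precip)
    return sum(p * k ** (t - i) for i, p in enumerate(precip, start=1))


def api_series(precip, k=0.84):
    """Running index via the equivalent recursion A_t = k * A_(t-1) + P_t."""
    series, acc = [], 0.0
    for p in precip:
        acc = k * acc + p
        series.append(acc)
    return series
```

With k = 0.84, rain that fell four days earlier still contributes about half of its depth (0.84 to the fourth power is roughly 0.50), which illustrates why a larger k implies a longer memory of past rainfall.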

Soil moisture is directly related to ground stability (Ching-Chuan et al. 2009). The daily soil moisture was calculated by averaging 3-hourly data and normalized as values ranging between 0 and 1 for outlier detection (Kandanaarachchi et al. 2020). The normalized antecedent soil moisture (ASMnorm) is the normalized soil moisture (SMnorm) value the day before the landslide. The variation width was identified using the soil moisture increment (∆SMnorm) as shown in Eq. (2), as follows:

$${\Delta \mathrm{SM}}_{\mathrm{norm}}={\mathrm{SM}}_{\mathrm{norm}}-{\mathrm{ASM}}_{\mathrm{norm}}$$
(2)
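The preprocessing described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the text does not name the normalization, so min-max scaling over the record is assumed, eight 3-hourly values per day are assumed, and all function names are hypothetical.

```python
def daily_mean(three_hourly):
    """Average consecutive groups of eight 3-hourly values into daily values.

    A trailing incomplete day is dropped.
    """
    return [sum(three_hourly[i:i + 8]) / 8
            for i in range(0, len(three_hourly) - 7, 8)]


def min_max_normalize(values):
    """Scale values to [0, 1]; min-max scaling is an assumption here."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("constant series cannot be normalized")
    return [(v - lo) / (hi - lo) for v in values]


def soil_moisture_increment(sm_norm, day):
    """Eq. (2): SM_norm on `day` minus ASM_norm (the value on day - 1)."""
    if day < 1:
        raise IndexError("no antecedent day available")
    return sm_norm[day] - sm_norm[day - 1]
```

A positive result means the soil became wetter between the day before the landslide and the day of the landslide; a negative result says nothing about how wet the soil was to begin with.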

3 Methodology

3.1 Three-dimensional thresholds

Thresholds represent balanced conditions for the occurrence of a landslide. Although a two-dimensional threshold has been widely used in the past (Caine 1980; Saito et al. 2010), a three-dimensional threshold is also gaining popularity (Rosi et al. 2021; Salee et al. 2022). Since three variables were used in this study, they appear in the three-dimensional coordinate space, as in Eq. (3),

$${f}_{thr}\left(x,y,z\right)=ax+by+cz+d=0$$
(3)

where a, b, and c are the components of the normal vector of the threshold plane, and d is the intercept term. The threshold plane is determined by first fitting the variables with the least-squares method to determine the normal vector (Pradhan et al. 2019; Salee et al. 2022) and then choosing the intercept to correspond to the 5% and 20% probability levels of triggering a landslide (Pradhan et al. 2019). Whether a landslide might occur is decided based on the position of the hydrological variables in the three-dimensional space relative to this plane:

$$f\left({V}_{1},{V}_{2},{V}_{3}\right)=\left\{\begin{array}{l}Landslide \, if : a{V}_{1}+b{V}_{2}+c{V}_{3}+d>{f}_{thr}\\ not \, Landslide \, if : a{V}_{1}+b{V}_{2}+c{V}_{3}+d\le {f}_{thr}\end{array}\right.$$
(4)

where V1, V2, V3 are hydrological variables either directly or indirectly related to the initiation of a landslide. A landslide initiation is expected if the point is located above the threshold plane, and vice versa (Eq. (4)). For this purpose, a Python version 3.9.12-based thresholding-predicting code was developed.
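The decision rule can be sketched in a few lines. The code developed for the study is not reproduced in the text, so the snippet below is only an illustration with hypothetical function names. It assumes that "above the plane" is meant geometrically along the third axis, which requires c to be non-zero; under that reading the sign of a·V1 + b·V2 + c·V3 + d for a point above the plane depends on the sign of c.

```python
def plane_height(a, b, c, d, x, y):
    """Solve a*x + b*y + c*z + d = 0 for z (requires c != 0)."""
    if c == 0:
        raise ValueError("plane is vertical; z is not a function of x and y")
    return -(a * x + b * y + d) / c


def landslide_expected(a, b, c, d, v1, v2, v3):
    """True if the point (v1, v2, v3) lies above the threshold plane."""
    return v3 > plane_height(a, b, c, d, v1, v2)
```

For example, with the illustrative (not fitted) plane z = 0.5, i.e. a = 0, b = 0, c = 1, d = −0.5, a point with a third coordinate of 0.6 is flagged and one with 0.4 is not.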

3.2 Statistical assessments

Pearson’s correlation coefficient (R) and probability level (p-value) were used to compare and investigate the results. They are expressed as Eqs. (5) and (6):

$$\mathrm{R}=\frac{\sum (x-\overline{x })(y-\overline{y })}{\sqrt{\sum {\left(x-\overline{x }\right)}^{2}\sum {\left(y-\overline{y }\right)}^{2}}}$$
(5)
$$p\text{-}value=\mathrm{P}(\mathrm{R}=0|{\mathrm{H}}_{0})$$
(6)

where x and y are the variables of interest and \(\overline{x }\) and \(\overline{y }\) are their corresponding averages. R is a numerical value that expresses the linear correlation between variables, and ranges between −1 and 1. A strong positive (negative) correlation yields a value near 1 (−1). The p-value is the probability, under the null hypothesis (H0) of no correlation, of obtaining a correlation at least as strong as the one observed; the significance of R was assessed by comparing the p-value to the significance level (α-level). A p-value less than the α-level leads to rejecting the null hypothesis at that level (Helsel and Hirsch 1992).
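Eq. (5) can be computed directly, as in the minimal sketch below (function names are hypothetical). The second helper returns the t statistic commonly used to test H0: no correlation; converting it to a p-value requires the Student's t distribution with n − 2 degrees of freedom, which the standard library does not provide.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient, following Eq. (5)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r**2)); undefined for |r| = 1."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))
```

In practice, `scipy.stats.pearsonr(x, y)` returns both the coefficient and the two-sided p-value in a single call.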

4 Results

4.1 Characteristics of landslide initiation conditions

In order to analyze the landslide initiation conditions, it is necessary to understand the relationship between hydrological variables. This section describes the distributions of hydrological variables to determine the specific environment where landslides usually occur. A similar distribution can be seen throughout the year, as shown in Fig. 2. P appears to be larger than API, and SMnorm has risen sharply from ASMnorm. This distribution appears to be quite common where the rainfall drives the soil water content towards saturation.

Fig. 2 Boxplots of all variables for the entire set of landslide events. The orange line represents the median, the upper and lower boundaries of the notched box indicate the upper and lower quartiles, and the black circles represent outliers. The two precipitation-related variables (API and P) use the left-hand y-axis, while the remaining soil moisture-related variables (ASMnorm, SMnorm, and ΔSMnorm) use the right-hand y-axis

Figure 2 shows that the API2016 was relatively low, while the ASMnorm, 2016 was relatively high compared to ASMnorm, 2018 and ASMnorm, 2019. API2016 (34.40 mm) appears to be drier than API2018 (68.28 mm) and API2019 (53.71 mm). ASMnorm, 2016 (0.54) appears to be wetter than ASMnorm, 2018 (0.42) and similar to ASMnorm, 2019 (0.56). The distribution in the year 2017 appears to be quite different: P2017 is lower than API2017, while SMnorm, 2017 is higher than ASMnorm, 2017. Considering the P (around 30 mm), the magnitude of external forces appears to be small, whereas the increase in soil moisture is significant. 2018 and 2019 distributions have a similar pattern to 2016. Particularly in 2019, the amount of P2019 (127.36 mm) is relatively high with a wide range. As typhoons are accompanied by heavy rainfall, there may be a difference in the amount of precipitation depending on the distance from the typhoon (Fig. 1).

The overall hydrological conditions are depicted as the ‘Average’ values in Fig. 2 and Table 1. APIavg and ASMnorm, avg are 53 mm and 0.5, respectively. The Pavg (104.15 mm) is higher than the APIavg (53.46 mm), and SMnorm, avg (0.73) is also higher than the ASMnorm, avg (0.5, Table 1). What seems evident is that continued rainfall until DLS increased soil moisture. Given that the first quartile (25th percentile) and median of SMnorm are 0.69 and 0.76, respectively, the soils in most landslide-affected areas were close to saturation. This suggests that the sudden increase in soil moisture caused by external forces significantly influenced the occurrence of landslides.

Table 1 Mean and standard deviation of each variable

∆SMnorm is a good indicator of the change in antecedent conditions. The positive ∆SMnorm, avg (0.22) and lower quartile (25th percentile: 0.17) mean that the soil moisture increased in most cases. However, a negative ∆SMnorm does not imply that the SMnorm is low. ∆SMnorm only shows the magnitude of the change and contains no information about the starting level. ∆SMnorm is negative if ASMnorm is larger than SMnorm. Still, pore water pressure can remain high if the soil is still near saturation. In other words, landslides can occur even when soil moisture shows a decreasing trend.

Figure 3 shows the relationships among factors under the landslide initiation condition. Note that the number of points in each panel appears different because events that took place in the same grid cell on the same day overlap. Different correlations were observed between P and API in different years (Table 2). Strong positive correlations were found in 2016 (0.70) and 2017 (0.85), while negative correlations were found in 2018 (−0.19) and 2019 (−0.59). Also, the standard deviations (std) of P and API are different (Table 1). Compared to std2018 (API for 18.73 and P for 46.80) and std2019 (API for 18.28 and P for 25.46), std2016 (API for 2.36 and P for 16.53) and std2017 (API for 8.43 and P for 11.11) are relatively small. This result might be related to the landslide location distribution (Fig. 1). Landslides in 2016 and 2017 were more densely distributed than in 2018 and 2019. Given the close proximity of the points, it is likely that they were affected by related rainfall events. Considering the dramatic difference in correlations over the years, the results in Fig. 3c–e appear to be strongly affected by the characteristics of the landslide occurrence locations.

Fig. 3 Scatterplot matrices pairing the variables one by one. Landslide events are colored by year. The statistical performance of each panel is shown in Table 2

Table 2 Pearson’s correlation coefficient with p-values of coupled variables in Fig. 3

The relationship between P and SMnorm depends on the influence of other factors (Fig. 3b). Increased soil saturation will result from high-intensity rainfall in shallow landslide situations, unless other factors are involved. However, soil moisture can vary independently from rainfall because factors other than rainfall are involved in causing soil moisture changes. This is apparent in 2017 (Table 2). The response to rainfall is deemed to have reached its peak for soil saturation of 0.7 or more, because the amount of water that can be held varies based on various soil properties.

SMnorm and ASMnorm have the highest positive correlation among variables (Fig. 3f), depending on the influence of other factors. SMnorm has increased compared to ASMnorm since most points are under the cross line. Due to the irregular rainfall contribution to soil saturation, the change pattern in soil moisture can vary by region with the same amount of rainfall. In other words, the more diverse the rainfall and pattern are, the lower the correlation coefficient will be. Because LS2017 was accompanied by little rainfall, the rise in soil moisture was minimal, suggesting that the correlation between SMnorm, 2017 and ASMnorm, 2017 was higher than in other years.

Depending on the variables depicted, each figure illustrates a different relationship. As shown in ‘Average’ (Table 2), there appears to be a weak correlation in Fig. 3a, c and d because the R is close to 0 and the p-value is greater than 0.05 (α-level). The correlation between antecedent condition factors (Fig. 3e) is weak (R = 0.19). This means that the time lag (the time taken for the effect of rainfall to appear as an increase in soil moisture) varies depending on a mediator (e.g., vegetation) or man-made effects. On the other hand, variables in Fig. 3b and f show a relatively high correlation (R of 0.56 and 0.63). In the case of Fig. 3b, greater P is associated with greater SMnorm. In the case of Fig. 3f, greater ASMnorm values are associated with greater SMnorm. Although the precedent conditions for each landslide area may differ, an increase in soil moisture has a significant correlation with rainfall.

4.2 Two-dimensional correspondence of variables

Before determining actual thresholds, an in-depth analysis was performed to thoroughly understand the initiation conditions. Figure 4 represents the relationships among hydrologic variables and landslide events from 2016 to 2019. The factors in Fig. 4a were combined to show the overall initial condition (API on x- and ASMnorm on y-axis), which represents the ordinary state at the given time period. There appears to be an overall positive relationship (Table 2). Because the external forces required for landslide initiation vary depending on the initial conditions (Zuoan et al. 2006), identifying these forces is critical for improving prediction accuracy. In general, API can be accepted as the magnitude of the past rainfall’s influence on the present. As a result, it can provide indirect information about soil moisture. On the other hand, ASMnorm includes the real state of the soil water changes on the day before the landslide. It is acceptable to see those factors as having a deep connection in that many studies used API to quantify soil moisture (Liao et al. 2021; Pan 2012; Pan et al. 2003). However, claiming that the relationship is linear is illogical. API includes only the effects of rainfall, whereas ASMnorm indirectly accounts for a wider range of predisposing factors, and not only rainfall.

Fig. 4 Scatterplots of (a) API with ASMnorm supplemented by P, (b) P with SMnorm, and (c) ∆SMnorm with P supplemented by ASMnorm

It should be noted that even at a low level of ASMnorm (near 0.2), ten landslides occurred. Variable P was used to identify those conditions. It is noteworthy that 12 landslides occurred with P of less than 50 mm. Although high-intensity precipitation can cause shallow landslides even at relatively low SMnorm, it is hard to regard P of less than 40 mm as an external force powerful enough to destroy the surface stability. Due to these uncertainties, Abraham et al. (2021) emphasized the necessity of specific information about landslides. There are various landslide types in which pore water pressure (Evans et al. 2001), or even precipitation (Abraham et al. 2021), does not act as a dominant factor. Here, we assumed that the 8 landslide cases that occurred under P of less than 40 mm differ in type from shallow landslides triggered by high-intensity P. Since lowering the threshold to ~ 15 mm of P could result in a huge number of FAs, 40 mm of P was first taken as the threshold, as an acceptable trade-off that prevents total loss of landslide detection capability (Segoni et al. 2018a).

The increase in soil moisture with precipitation appears logistic, with positive correlations in Fig. 4b. Rainfall causes significant increases in soil moisture. The SMnorm distribution varies from 0.3 to 1.0 (saturated), with a minimum value of 0.32. However, based on the value of P, it is clear that points between 0.3 and 0.4 fall outside the range of our analysis in Fig. 4b. Therefore, the minimum level for the initiation of shallow landslides is around 0.4. In the figure, the distribution of P is noteworthy. It ranges from 18 mm (minimum) to 250 mm (maximum). Even in an extreme case, a P of 18 mm appears insufficient to generate significant external forces. In Fig. 4a, seven out of eight red triangles occurred near an ASMnorm of 0.2. Insufficient rainfall (less than 40 mm) over such a dry subsurface is unlikely to cause shallow landslides. To account for such cases, the minimum level of P for landslides was set at 40 mm.

An overall positive relationship between P and ∆SMnorm is observed in Fig. 4c. The center red line bisecting the plot indicates ∆SMnorm = 0. Points on the right (left) side represent increased (decreased) soil moisture on DLS. Since a decreasing trend in soil moisture is not common in rainfall-induced landslides, the ASMnorm variable was added for a more detailed analysis. Most points (155 out of 170) are located on the right side, and most ASMnorm values at those points are less than 0.7. In addition to the absolute SMnorm (as shown in Fig. 4b), the increase in soil moisture relative to the preceding conditions is strongly linked to the occurrence of landslides. There are fifteen points on the left-hand side of the center line. Among those, seven cases occurred when ASMnorm was higher than 0.85 (i.e., the soil was already saturated). The influence of stronger external force had already been reflected in the soil and resulted in a negative ∆SMnorm despite a considerable amount of P on DLS.

The panels of Fig. 4 have one thing in common: they depict the characteristics of the initiation conditions, which can be understood and identified more precisely by using additional variables. Only two variables were used in Fig. 4b. The relationship between factors can be identified, but the analysis of low-P cases was limited. Using P, however, a number of outliers were identified in Fig. 4a. A detailed analysis was performed in Fig. 4c by adding ASMnorm. In short, appropriate factors can improve the understanding of the landslide-prone environment.

4.3 Three-dimensional analysis

4.3.1 Determining three-dimensional thresholds

Another condition for landslide initiation is expressed by a three-dimensional threshold (Fig. 5). Since landslides can occur under different hydrological conditions, this study tried to account for such heterogeneity by expanding the number of dimensions used in the threshold. Two threshold planes with 5% and 20% probability levels were proposed. The reasons for using two conservative standards are twofold: (1) to avoid excessive FA, and (2) to eliminate outlier effects. Using P, API, and ∆SMnorm as x, y, and z in Eq. (3), the coefficients a, b, and c were −0.002, 0.001, and −1, respectively. Since 170 landslide events were analyzed, the two planes, below which 8 landslide events (5% probability level) and 34 events (20% probability level) are located, were fixed with intercept d values of −0.038 and 0.159, respectively.
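One way to obtain such planes is sketched below. The text does not detail the fitting procedure, so this is an illustration under explicit assumptions rather than the authors' implementation: the third variable is regressed on the other two by ordinary least squares (z = alpha·x + beta·y + gamma), and the plane is then shifted along z so that floor(level × n) events lie strictly below it. All function names are hypothetical, and the sign convention of the resulting coefficients may differ from that of Eq. (3).

```python
def solve3(matrix, rhs):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system (points may be collinear)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - tail) / aug[r][r]
    return solution


def fit_plane(xs, ys, zs):
    """Least-squares fit of z = alpha*x + beta*y + gamma via normal equations."""
    rows = [(x, y, 1.0) for x, y in zip(xs, ys)]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atz = [sum(r[i] * z for r, z in zip(rows, zs)) for i in range(3)]
    return tuple(solve3(ata, atz))


def shifted_intercept(xs, ys, zs, alpha, beta, level):
    """Intercept that leaves floor(level * n) events strictly below the plane.

    Assumes no ties among the residuals at the cut-off.
    """
    residuals = sorted(z - alpha * x - beta * y for x, y, z in zip(xs, ys, zs))
    n_below = int(level * len(residuals))
    return residuals[n_below]
```

With 170 events, levels of 0.05 and 0.20 leave 8 and 34 events below the plane, matching the counts quoted above.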

Fig. 5 Three-dimensional threshold, with API on the x-axis, P on the y-axis, and ∆SMnorm on the z-axis. The upper (red plane) and lower (blue plane) limits show the 20% and 5% probability levels, respectively

4.3.2 Validation

Five sites (identified by red pentagram marks in Fig. 1) were chosen from areas with more than two landslides. Hydrological variables were obtained for each grid cell containing a landslide event from September 2016 to December 2019. The landslide signal was identified using time-series analysis. Since landslide events at Sites 1 and 2 (3 and 4) took place over the same GLDAS grid cell and neighboring GPM grid cells, their time series resemble each other; therefore, only the plots of Sites 1, 3, and 5 are shown in Fig. 6. Red squares indicate landslides expected from the upper-limit three-dimensional threshold (20% probability level), while blue dots indicate landslides predicted using the lower-limit threshold (5% probability level). Table 3 displays the detection results for all five sites. To assess detection capability, two criteria were used: (1) how many FA are detected and (2) whether or not landslides are detected. The number of FA and the results of landslide detection (indicated as O or X) were assessed. O means that landslide signals are detected on the occurrence day. X indicates that the thresholds failed to detect the landslide signals.
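The two validation criteria can be expressed as a small routine. This is a hedged sketch with hypothetical names: it assumes a daily series of booleans marking whether a threshold was exceeded, and a set of day indices on which landslides were recorded at the site.

```python
def evaluate_threshold(exceeds, landslide_days):
    """Count false alarms and mark each recorded landslide as detected or not.

    exceeds: list of bool, one per day, True where the threshold is exceeded.
    landslide_days: set of day indices with a recorded landslide.
    Returns (false_alarms, detection), where detection maps each landslide
    day to "O" (signal on the occurrence day) or "X" (missed).
    """
    false_alarms = sum(
        1 for day, flag in enumerate(exceeds)
        if flag and day not in landslide_days
    )
    detection = {
        day: ("O" if exceeds[day] else "X") for day in sorted(landslide_days)
    }
    return false_alarms, detection
```

Combining several thresholds amounts to taking the element-wise logical AND of their `exceeds` series before calling the routine, which can only lower the false alarm count but may also turn an "O" into an "X".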

Fig. 6 Time series analysis for detecting the points of false alarms. (a) Site 1, (b) Site 3, (c) Site 5

Table 3 False alarms and detection results of each threshold from September 2016 to December 2019

Fewer than 30 FA were detected by Threshold 1 (THR1) (Table 3). This is quite low compared to other cases (more than 300 for Threshold 2 (THR2), 1000 for Threshold 3 (THR3), and 100 for Threshold 4 (THR4)). Compared to THR1, an overall reduction of FA was observed in Threshold 5 (THR5). The reduction rate in Threshold 6 (THR6) was similar, but slightly higher than that of THR5. Threshold 7 (THR7) demonstrated a significant reduction compared to the others. THR1, THR2, and THR3 failed to detect two out of ten cases at Sites 1 and 2 (Fig. 6a). Because these two incidents occurred when P < 40 and ∆SM < 0, it is difficult to classify them as general shallow landslides. Due to the massive number of FA, using THR3 alone appears to be of little value. Although THR4 significantly reduced the number of FA, the rate of landslide detection also decreased significantly. It failed to detect the landslides at Sites 3 (Fig. 6b) and 4. Since they satisfied THR1 and THR2, they appeared to be a general type of shallow landslide. It is worth noting that other landslides were successfully detected at Sites 1, 2, and 5 (Fig. 6c). These findings agree with the results of Zhao et al. (2019) and Rosi et al. (2021) in that FA were reduced at the expense of decreasing true positive rates when hydrological variables (antecedent soil moisture and mean areal rainfall) are added. These findings may also demonstrate interregional heterogeneity. Though soil saturation significantly impacts unsaturated shear strength (Tsai and Chen 2010), the spatial distribution of soil moisture varies depending on regional climate, vegetation distribution, and other dominant soil properties (Ivanov et al. 2010; Lawrence and Hornberger 2007). Also, prevailing wind related to several topographical predictors (e.g., exposure, slope, elevation) can contribute to precipitation variability (Basist et al. 1994). Due to the complex structure of soil (Bogaard and Greco 2016), the degree and velocity of slope stability change differ for a given hydrological condition. In further research, calculating warning levels delimited by the thresholds may be useful in evaluating the risk of landslides or making decisions, rather than just predicting their occurrence (Martelloni et al. 2012).

Overall, THR1, THR2, THR5, and THR6 yield the same detection results but differ in the number of FA. When two or three thresholds were used simultaneously, the reduction rate was higher than when a single threshold was used. Even though variables contribute to varying degrees (Ziadat and Taimeh 2013), landslides are caused and triggered by complex interactions of various factors. When well-founded factors are utilized, they can effectively lower the FA while still maintaining detection accuracy.
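
The effect of applying thresholds jointly can be illustrated with a minimal sketch. It assumes that each site-day record carries the daily precipitation P (mm), the soil moisture increment ∆SM, and a landslide flag. The record values, the function names, and the strict inequalities are hypothetical; only the two conditions (P above 40 mm and a positive ∆SM) are taken from the text. An alarm is raised only when every condition holds, so a combined threshold can never produce more FA than any one of its components.

```python
from typing import Callable, NamedTuple, Sequence, Tuple


class Record(NamedTuple):
    p_mm: float      # daily accumulated precipitation (mm)
    d_sm: float      # soil moisture increment (hypothetical units)
    landslide: bool  # True if a landslide was recorded on that day


Rule = Callable[[Record], bool]


def rain_rule(r: Record) -> bool:
    # Rainfall condition; the strict inequality is an assumption.
    return r.p_mm > 40.0


def moisture_rule(r: Record) -> bool:
    # Soil moisture must have increased.
    return r.d_sm > 0.0


def evaluate(records: Sequence[Record], rules: Sequence[Rule]) -> Tuple[int, int]:
    """Return (detections, false_alarms) when all rules must hold."""
    detections = 0
    false_alarms = 0
    for r in records:
        if all(rule(r) for rule in rules):
            if r.landslide:
                detections += 1
            else:
                false_alarms += 1
    return detections, false_alarms


# Hypothetical records, for illustration only.
records = [
    Record(55.0, 0.04, True),
    Record(62.0, 0.01, True),
    Record(30.0, -0.02, True),   # landslide with P < 40 mm and a negative increment
    Record(48.0, -0.01, False),
    Record(45.0, 0.03, False),
    Record(10.0, 0.05, False),
]

rain_only = evaluate(records, [rain_rule])
moisture_only = evaluate(records, [moisture_rule])
combined = evaluate(records, [rain_rule, moisture_rule])
```

In this toy data set the combined rule detects the same landslides as either single rule while raising fewer false alarms, which mirrors the pattern reported in Table 3 only qualitatively.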

5 Discussion

The results of this study are in accordance with previous threshold-based studies in that shallow landslides are triggered by high amounts of precipitation and soil moisture (Guzzetti et al. 2007; Segoni et al. 2018b). By looking deeper into the distributions and relationships between the direct and indirect factors, we could obtain a more comprehensive understanding of the hydrological conditions leading to shallow landslides in the Korean landscape. The generally higher distribution of P compared to API suggests that the immediate role of P at DLS is dominant over the antecedent condition (Brunetti et al. 2010). In particular, the cases in 2016, accompanied by the lowest API, indicate that high-intensity precipitation (presumably associated with the typhoon path) leads to slope failure. However, the cases in 2017 showed that a smaller amount of P could also drive landslide events over specific regions and under conditions where the soil is assumed to be sufficiently saturated (Baum and Godt 2010). The role of P could be analyzed more deeply with information on intensity and duration. Since soil moisture is distributed heterogeneously with depth (Choi and Jacobs 2007) and responds in various ways (Tsai and Chen 2010), the characteristics of landslides may differ depending on the characteristics of the rainfall (Martelloni et al. 2012). As shown in Marino et al. (2020), antecedent root zone soil moisture improves predictive ability, which has provided a new insight for shallow landslide analysis. Short-duration, high-intensity rainfall is known to cause shallow landslides and debris flows, whereas long-duration, low-intensity rain is known to cause deep-seated landslides due to groundwater level rise (Martelloni et al. 2012; Prokešová et al. 2013). That is, different rainfall patterns influence soil water distribution, which in turn influences the landslide mechanism. It is difficult to say that the points that were excluded in determining the threshold (Fig. 4) meet the conditions for a shallow landslide, but they may be related to other types of landslides. Therefore, utilizing rainfall intensity and duration via hourly-scale rainfall data appears to improve not only the relationship between landslide occurrence and rainfall pattern, but also the understanding of soil moisture dynamics and distributions.

Shallow landslides are known to be caused by an increase in soil moisture, rather than by P itself (Rosi et al. 2021). The results showed that the soil moisture trend reflects the environmental characteristics of the region until precipitation causes a rise in pore water pressure, and that this trend can be adequately captured even with daily-scale data. In particular, ∆SMnorm is found to be a powerful variable representing the contribution of the external force generated on DLS. A positive ∆SMnorm value indicates that, in most cases, soil moisture increased before the landslide was initiated. However, more than 10 cases with negative ∆SMnorm highlight the importance of knowing the starting position of soil moisture (ASMnorm), and the potential for landslides even when there is a decreasing trend in soil moisture (Baum et al. 2010). The nonlinear but interdependent features of hydrological conditions necessitate a multidimensional approach. Relying solely on a one-dimensional threshold can lead to oversimplification and may miss the intricate interaction of variables. It is evident from our findings that expanding the dimension, rather than relying on separate one-dimensional thresholds, provides a more comprehensive understanding of the conditions, as shown in Fig. 4. This is because the effects of hydrological conditions on landslides are multifaceted and cannot be captured by a single metric (Guzzetti et al. 2008; Fan et al. 2016). Introducing the experimental three-dimensional threshold, however, did not directly improve FA or detection capabilities (Table 3). Both the reduction of FA and the accuracy of detection improved when it was used jointly with one-dimensional thresholds (FA were significantly reduced when the thresholds regarding P and ∆SMnorm and the three-dimensional threshold were applied together). This synergy can be attributed to the fact that while the multi-dimensional threshold captures the complex relationship of variables, the one-dimensional thresholds provide focused insights into specific conditions (Segoni et al. 2018b; Brunetti et al. 2010).

Despite ongoing efforts to understand landslides from a hydrological standpoint, some uncertainties and ambiguities remain. The first is that historical landslide data are difficult to verify. Data accuracy cannot be assured because the infrastructure for observing landslides in Korea is limited. Even if a landslide occurs, it may be recorded as ‘did not occur’ if it is not observed. The term ‘did not occur’ therefore encompasses two cases: (1) no landslide actually occurred, and (2) a landslide occurred but was not recorded due to a lack of observation. Such cases can introduce significant error when determining the threshold. Many studies have attempted to determine the occurrence of landslides by referring to various sources (literature surveys, newspapers, etc.). Still, this approach appears to have limitations because the records are incomplete. As a result, validated data are required for more precise analysis.

Another source of uncertainty is the spatiotemporal characteristics of the data used. Errors may be introduced through mis-recording; if a landslide is recorded a day after its actual occurrence, the hydrological condition on the day after the occurrence would be deemed a landslide-prone condition. This could make landslides appear to have initiated under conditions below the ideal threshold. Uncertainty remains even when the occurrence and recorded dates are the same, because the exact timing of the occurrence is unknown. Since all data used in this study (landslide records, satellite-based precipitation, and reanalysis data) are at daily scale, the exact time of landslide occurrence and the sub-daily variability of hydrological factors cannot be considered. The hydrological information that was used to determine the initial condition varies depending on the actual time of occurrence. A dataset with finer temporal resolution may give more detailed information for landslide prediction; Naidu et al. (2018) emphasized the necessity of hourly monitoring of rainfall intensity to improve prediction capabilities. A time lag between precipitation and landslide initiation can also introduce inconsistency in estimating the landslide-triggering condition. The delayed effects of preceding rainfall on soil moisture can lead to a landslide. In the case of the Zhonghai landslide (Hou et al. 2022), 42 h had passed after the preceding rain. A slow accumulation of groundwater due to an antecedent excessive precipitation event may cause a lagged response, with groundwater levels rising steadily even after the cessation of precipitation.
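
The effect of a one-day recording error described above can be sketched as follows. The dates, the precipitation values, and the function name are hypothetical and serve only to show how the attributed condition changes when the recorded date is shifted; the 40 mm level is the rainfall threshold reported in this study.

```python
from datetime import date, timedelta

# Hypothetical daily precipitation series (mm) around a storm.
daily_p = {
    date(2017, 7, 15): 5.0,
    date(2017, 7, 16): 85.0,  # storm day, assumed to be the true day of failure
    date(2017, 7, 17): 12.0,
}


def attributed_p(true_day: date, record_shift_days: int = 0) -> float:
    """Precipitation attributed to a landslide whose record is shifted in time."""
    return daily_p[true_day + timedelta(days=record_shift_days)]


true_day = date(2017, 7, 16)
p_true = attributed_p(true_day)                       # condition on the true day
p_late = attributed_p(true_day, record_shift_days=1)  # recorded one day late

# With the late record, the event appears to occur below the 40 mm threshold.
appears_below_threshold = p_late < 40.0 < p_true
```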

Despite these uncertainties and ambiguities, this study provides a novel contribution in using remotely sensed and modeled data. Currently, the national landslide information system (http://sansatai.forest.go.kr) established by the Korea Forest Service (KFS) provides a landslide hazard map generated from logistic regression using variables related to topography, forest type, soil depth, and soil texture. However, updating this information requires extensive surveys. Direct slope stability calculation, a theoretical approach, also necessitates multiple data sources, which are difficult to obtain or may not exist. Hydrological factors, on the other hand, which are closely related to geology, are available from various sources (e.g., in-situ, satellite, reanalysis). In particular, spatial data from satellites and models enable national- and global-scale analyses (Ray and Jacobs 2007). Still, quantitative analyses of the performance compared to predictions based on in-situ measurements are required for more sophisticated local landslide detection.

6 Conclusion

This study identified the relationships between variables under landslide initiation conditions and determined three-dimensional hydrological thresholds for landslide prediction. Two rainfall-related variables (antecedent precipitation index and daily accumulated precipitation) and three soil saturation-related variables (antecedent soil moisture, daily averaged soil moisture, and soil moisture increment) were used. The correlation between variables differed significantly over the years. This appeared to be related to the spatial distribution of landslide locations. Nearby regions had similar hydrological characteristics, which had a profound impact on correlation. We regarded two findings as thresholds based on the shallow landslide mechanism. First, soil moisture had increased before the landslide in 155 out of 170 cases (91%). This demonstrates that sudden increases in soil moisture can trigger landslides. Second, rainfall leads to a sudden increase in soil moisture. Our results indicate that more than 40 mm of rainfall is required for landslide initiation. To reflect the dynamic relationships between variables, a three-dimensional threshold was calculated. Two threshold planes based on the probability level (5% and 20%) were suggested to avoid FA and eliminate outlier effects. A rainfall-based threshold was the most effective among the single thresholds. With the combination of a soil moisture-based threshold and a three-dimensional threshold, FA was significantly reduced. In some cases, the detection accuracy decreased along with the decrease in FA rates. The site-specificity of the landslide zone needs to be considered to balance the trade-off between the accuracy rate and the false alarm rate.
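
As a rough illustration of how a threshold plane can be tied to a probability level, the sketch below shifts a plane with fixed slopes until the share of landslide points lying strictly below it stays under a chosen level (5% or 20%). This is not necessarily the procedure used in this study: the slopes a and b, the generic axes x, y, and z, the function names, and the sample points are all hypothetical.

```python
from typing import Dict, Iterable, Sequence, Tuple

Point = Tuple[float, float, float]  # generic (x, y, z) coordinates, an assumption


def plane_intercepts(points: Sequence[Point], a: float, b: float,
                     levels_pct: Iterable[int] = (5, 20)) -> Dict[int, float]:
    """For the plane z = a*x + b*y + c with fixed slopes, return for each level
    an intercept c such that the share of points strictly below the plane
    is smaller than that level (given in percent)."""
    residuals = sorted(z - (a * x + b * y) for x, y, z in points)
    n = len(residuals)
    intercepts = {}
    for q in levels_pct:
        # Integer ceiling of q*n/100, minus one, clipped at the first point.
        k = max(0, (q * n + 99) // 100 - 1)
        intercepts[q] = residuals[k]
    return intercepts


def share_below(points: Sequence[Point], a: float, b: float, c: float) -> float:
    """Fraction of points lying strictly below the plane z = a*x + b*y + c."""
    below = sum(1 for x, y, z in points if z < a * x + b * y + c)
    return below / len(points)


# Hypothetical landslide points, for illustration only.
points = [(float(i), 0.0, 2.0 * i) for i in range(20)]
a, b = 1.0, 0.0  # hypothetical slopes

intercepts = plane_intercepts(points, a, b)
share_5 = share_below(points, a, b, intercepts[5])
share_20 = share_below(points, a, b, intercepts[20])
```

A lower probability level gives a lower plane, which excludes fewer landslide cases but admits more non-landslide days; this is the trade-off between detection accuracy and FA discussed above.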

The results show the combined effect of soil moisture and rainfall in thresholding shallow landslides. Despite practical limitations (lack of in-situ data and landslide information), the additional use of hydrological factors can help elucidate the mechanism of landslides. More accurate landslide analysis will be possible using validated landslide data or data with spatiotemporally high resolution.