Introduction

Landslides are mass movements of rock and soil down a slope, which can be destructive to people and infrastructure causing direct and indirect damage to natural ecosystems (Cruden and Varnes 1996; Geertsema et al. 2006; Fell et al. 2008a; Geertsema and Foord 2014; Chen et al. 2020b; Liang et al. 2022). An investigation from the past two decades (1998–2017) has shown that up to 4.8 million people have been affected by landslides, and over 18 thousand people have lost their lives (CRED 2018). Constructing early warning systems (Wadhawan 2019; Yang et al. 2019), landslide susceptibility (Catani et al. 2013; Di Traglia et al. 2018; Fang et al. 2021), landslide hazard (Guzzetti et al. 2005) and landslide risk zonation (Guo et al. 2020; Xiao et al. 2020a; Li et al. 2021) are effective approaches to managing landslide-prone areas. Important aspects of such landslide hazard assessments are landslide magnitude-frequency distributions (Catani et al. 2005; Fell et al. 2008a; Guzzetti et al. 2012; Wang et al. 2017; Zhou et al. 2020; Jia et al. 2021). Through remote sensing or post processing methods, landslide geometrical characteristics can often be addressed, such as landslide area (Larsen et al. 2010), landslide velocity (Horton et al. 2013) and landslide displacement (Chen et al. 2020a). However, determining the date of landslide occurrence is often challenging in remote areas.

Remote sensing techniques are available to observe land surface deformations, and are applied to monitor landslides widely (Henderson and Lewis 1998; Nichol and Wong 2005; Casagli et al. 2016; Ghorbanzadeh et al. 2019, 2022; Ye et al. 2019; Zhong et al. 2020). To date a landslide in the past few decades (e.g. from 1980s onwards), it is crucial to study dense and long historical remote sensing time series. Therefore, the Landsat satellite image archive (NASA, https://landsat.gsfc.nasa.gov/) is preferred because of the long period of operation from 1972 to the present day, the moderate spatial resolution of 30 m in the visible light and infrared bands, and the satellite revisit time of 16 days per satellite (Chander et al. 2009; Gorelick et al. 2017). Yet, there are limited studies exploring the suitability of the optical remote sensing satellite image archive for landslide dating. The methods for landslide dating using Landsat satellite imagery can be roughly divided in two main categories: visual image interpretation (e.g. Geertsema and Foord 2014; Coe et al. 2018) and (semi-)automatic methods (e.g. Deijns et al. 2020). A downside of visual image interpretations is that it is time- and labour-consuming (Geertsema et al. 2006; Geertsema and Foord 2014). Therefore, where data resources and monitoring equipment are limited, (semi-)automatic methods are preferred. Several semi-automatic landslide dating methods based on remote sensing have been developed (Kennedy et al. 2007; Deijns et al. 2020). These methods are mostly applied to vegetated areas, aiming at identifying the partial or complete removal of vegetation by landslide activity, and following the subsequent regrowth. For example, Deijns et al. (2020) suggested a semi-automatic landslide dating approach using harmonic modelling in Buckinghorse River area, Canada, based on the vegetation phenology characteristics as well as the relationship between landslide changes and the normalized difference vegetation index (NDVI). Unfortunately, their method requires expert judgement and strongly depends on a well-developed seasonal vegetation cycle, to obtain reliable results. Another semi-automatic method that detects temporal changes in vegetation is LandTrendr (Landsat-based detection of Trends in Disturbance and Recovery) (Kennedy et al. 2007, 2018). LandTrendr was originally developed for monitoring terrestrial forest disturbance and is suitable for detecting annual changes in vegetation cover induced by sudden events such as logging and wildfires (Kennedy et al. 2018; Hislop et al. 2020; Senf and Seidl 2021). LandTrendr provides a spectral-temporal segmentation algorithm, useful for annual NDVI change detection in a time series of moderate resolution satellite imagery on a pixel-by-pixel basis (Kennedy et al. 2007, 2018). While the potential for monitoring temporal changes in vegetation cover has been shown by various studies (Kennedy et al. 2018; Chen et al. 2019; DeVries et al. 2020; Tian et al. 2020; Jia et al. 2021), these still have to be evaluated further for landslide dating applications.

Here, we present a novel algorithm developed in Google Earth Engine (GEE; Gorelick et al. 2017; Wu 2020) for automatic landslide dating based on time series of remote sensing images referred to as SWADE (Segmented WAvelet-DEnoising and stepwise linear fitting). GEE provides a cloud-based computation platform that enables time-effective exploration and analyses of a large archive of Landsat remote sensing images since 1985. We evaluate and quantify the accuracy of the novel SWADE model against 66 landslides that occurred over the past decades (1985–2017) in the Buckinghorse River area, northeastern British Columbia, Canada. The selected Buckinghorse River area is a densely covered pine forest with clear seasonal phenological changes and with frequently occurring landslides. These site characteristics make it suitable to evaluate and compare the accuracy of SWADE to two previously published semi-automatic methods, harmonic modelling (Deijns et al. 2020) and LandTrendr (Kennedy et al. 2018).

Buckinghorse River landslide database

Figure 1 shows our study site, the Buckinghorse River area, northeastern British Columbia, Canada (57°33′–57°22′N, 122°19′–122°45′W). This area has an elevation range from 682 to 1100 m a.s.l, as well as a high landslide activity over the past decades. In the study area, boreal forests are economically important and are dominated by black and white spruce, and lodgepole pine (Deijns et al. 2020). The study area is in the “isolated patches” permafrost zone; and permafrost degradation stimulates landslide activity (Heginbottom et al. 1995; Geertsema and Foord 2014). There are no direct meteorological data available in the area but the mean annual precipitation in the Peace District (~ 200 km south of the case study with a time series available from 1910 to 2008) and Fort Nelson District (~ 200 km north of the case study with a time series available from 1937 to 2008) are 497 mm and 486 mm, respectively (Foord 2016). From 1910 to 2008, an increase in mean annual temperature of + 0.5 °C has been reported in the Peace District and an increase of + 1.1 °C has been reported in the Fort Nelson District. Mean winter temperatures have increased by approximately 2.0 °C and 2.5 °C in these areas, respectively (Foord 2016). These rising temperatures cause permafrost melting, loss of soil stability, increase loose material availability and trigger landslide activity (Geertsema et al. 2006).

The main lithological units are sequences of shale and sandstone of lower Cretaceous and upper Cretaceous age, as well as exposed units of rocks of the Carboniferous Prophet Formation to Cretaceous Buckinghorse Formation (Lane et al. 1999; Geertsema et al. 2006). The shale sequences are especially prone to landslide activity. The major folds have kilometre wavelengths and a predominant north-northwest trend (Lane et al. 1999).

A total of 66 landslides have occurred between 1985 and 2017, as identified from Landsat (5, 7 and 8), and visually confirmed on Sentinel-2 and lidar imagery in the Buckinghorse River area by Deijns et al. (2020). A total of 21 landslides occurred in the period 1985–1995; 19 landslides occurred in the period 1996–2005; and 26 landslides occurred in the period 2006–2017. The mean landslide area is 6.95 × 104 m2; minimum landslide area is 5.1 × 103 m2 and maximum landslide area is 2.54 × 105 m2. The run out distances of the landslides extended up to 1765 m.

Fig. 1
figure 1

a Location of the Buckinghorse River area in British Columbia, Canada, and the identified landslides with a rough indication of their date of occurrence. bd Photo of landslide LS60 and Landsat images before and after landsliding. eg Photo of landslide LS53 and Landsat images before and after landsliding. The yellow arrow indicates the movement direction of the landslides. Forest area is chosen as undisturbed forest locations close but outside the landslide area

Methodology

Landslide dating is defined as the most probable landslide occurrence date range based on a time series of satellite image observations (Fell et al. 2008b; Crozier 2010; Korup et al. 2019; Deijns et al. 2020; Dewitte et al. 2021). The key concept of landslide temporal detection is that, when a landslide occurs, the vegetation cover, most often trees, is removed, damaged or tumbled over (Geertsema and Pojar 2007; Geertsema et al. 2009), causing a sudden drop in the values of NDVI (Dou et al. 2015; Dewitte et al. 2021). At the same time, this also implies that other factors that cause a decrease in forest cover, and thus NDVI, introduce errors and uncertainties, for example wildfires, accelerated erosion, constructions and logging. The NDVI is utilized to monitor the vegetation change, which is the normalized difference between near-infrared (NIR) and red (RED) reflectance. NDVI is widely used in remote sensing studies of vegetation to study and evaluate temporal developments of vegetation and crops (Tucker 1979). In temperate and boreal climates, NDVI increases in the harmonic, sinusoidal cycle of the forest from low NDVI (winter) to high NDVI (summer). To detect the landslide on the basis of vegetation damage or removal, two periods in the NDVI time series of satellite images are important to be separated and identified: (1) the pre-landslide period when there is full vegetation cover, and (2) the post-landslide period when the vegetation is removed or damaged and there is bare land or dry-brown vegetation (Meroni et al. 2019). After the landslide activity, the vegetation will regenerate at recovery rates depending on the degree of damage done by the landslide and by vegetation growing conditions (Bartels et al. 2016; Hermosilla et al. 2016). The availability of a long time series of satellite observations, typically more than 30 years like the Landsat archive, and preferably dense observations in the time series is a prerequisite for the reliable recording the entire landslide evolution.

We evaluate the performance of the new SWADE, harmonic modelling (Deijns et al. 2020) and LandTrendr methods (Kennedy et al. 2018) (Table 1). Due to phenological changes or other noises (e.g. wildfire, cloud coverage and haze), it is often inevitable to detect multiple change points while landslide dating, after which the most likely date is manually determined in the harmonic modelling approach of Deijns et al. (2020). The harmonic modelling approach presents two results to differentiate the influence from locally phenological changes, taking the undisturbed forest growth situation as a reference: non-detrended harmonic and detrended harmonic.

Table 1 Overview of three landslide dating approaches, resulting in five results

In contrast, we avoid manual interpretation in the SWADE method. Here, we present the success rate of the most likely (SWADE-a) or two most likely change points (SWADE-a and SWADE-b). We refer to those two approaches as SWADE1 (the most probable date range only) and SWADE2 (the two most probable date ranges).

We present the results from the five landslide dating methods, presented in Table 1, and we describe the methodology of SWADE, harmonic modelling and LandTrendr shown in Fig. 2.

Fig. 2
figure 2

Flowchart of the methodology of the three methods employed in this work. a SWADE (this work). SWADE1 obtains the most probable date range of landslide occurrence, while SWADE2 obtains the two most probable date ranges of landslide occurrence. b Harmonic modelling (Deijns et al. 2020). The detrended harmonic modelling calculates the forest NDVI in the dashed line box, while the non-detrended harmonic modelling does not consider the forest NDVI. c LandTrendr (Kennedy et al. 2018)

SWADE: Segmented WAvelet-DEnoising and stepwise linear fitting

Our SWADE landslide dating approach is based on wavelet denoising of the NDVI time series and stepwise linear fitting of this signal. SWADE comprises three main steps, preprocessing, NDVI denoising, and segmentation of the NDVI time series.

Preprocessing

We used GEE to obtain the Landsat derived NDVI time series. The NDVI time series should meet four criteria (Deijns et al. 2020). (1) Covering an extensive timeframe (i.e. in our case images are available from 1985 onwards). (2) Cloud masking. We applied the “modified normalized difference cloud index” (Marshak et al. 2000) to mask out the effects of cloud, shadows and haze and applied a buffer of 200 m around the clouds (Deijns et al. 2020). (3) Minimization of seasonal changes. We subtracted the NDVI time series at an undisturbed forest location close but outside the landslide area (control NDVI) from the NDVI time series at the landslide location (experimental NDVI) to remove the seasonal cycle on the landslide NDVI time series. The experimental NDVI time series (landslide NDVI) is next calculated over one single landslide. The control NDVI time series (forest NDVI) is calculated over undisturbed forested area, with the same procedures of experimental NDVI. In order to minimize and exclude minor differences from a single forested area, we attain the average NDVI time series of twelve forest areas (Fig. 1a). Then, the cumulative difference (CDNDVI) between forest NDVI and landslide NDVI is calculated by Eq. (1).

$$CDNDVI = {\sum }_{t=1}^{t=T}{\left(\Delta NDVI\right)}_{t}={\sum }_{t=1}^{t=T}{\left(NDV{I}_{F}- NDV{I}_{L}\right)}_{t}$$
(1)

ΔNDVI is the difference between NDVIF and NDVIL, t is the timestep and the T is the time range. (4) Filling data gaps. Data gaps (Hermosilla et al. 2016) usually occur in the NDVI time series due to cloud masking or scarce images. We utilized a “moving mean” algorithm (Deijns et al. 2020) to fill in the data gaps with a sliding window of seven images.

Denoising method

The wavelet transformation method is a common denoising method which can separate different frequencies in a signal, with a result of noise or detail (high frequency) and approximation (low frequency) (Lee et al. 2019). The wavelet transformation process includes two steps: wavelet decomposition and wavelet reconstruction (Lee et al. 2019). The wavelet transformation is implemented through Python (3.8 version, https://www.anaconda.com/products/individual), and the denoised NDVI information CDNDVI_w is calculated from the original CDNDVI.

Segmentation of the time series

The segmenting principle is fundamental to separate the whole NDVI time series into different vegetation growth stages. SWADE assumes that the CDNDVI_w obeys linear change through the time series in both the pre-landslide period and the post-landslide period. (1) Before the time of landslide occurrence, the vegetation growth in the landslide zone and in the forest zone has limited difference through the whole year, because of their similar geological background and climate conditions. The ΔNDVI (in Eq. (1)) can be regarded as a constant value. (2) After landslide occurrence, the landslide vegetation is scarce; at the same time, the forest vegetation is comparatively abundant. Hence, the ΔNDVI (in Eq. (1)) will increase, especially during the landslide recovery stage. In short, the CDNDVI_w obeys linear change prior to the landslide, followed by a comparative increase in the post-landslide period, especially in the vegetation regeneration period.

Based on this segmenting assumption, SWADE is used to separate the “CDNDVI_w” curve into several short linear pieces to obtain break points, which indicate the probable date range(s) of landslide occurrence. The slope value of CDNDVI_w is an important indicator for the pre-landslide period and post-landslide period, and thereby the date of landslide occurrence. Figure 3 shows the three main development types of CDNDVI_w curves having distinct change patterns.

Fig. 3
figure 3

The three main types of CDNDVI_w evolution indicating three different change patterns. Situation I: the vegetation growth in the landslide area is similar to that in the forest locations before the landslide. Situation II: the vegetation growth in landslide area is faster compared to that in the forest locations before the landslide. Situation III: the vegetation growth in landslide area is slower than the growth in the forest location before the landslide

  1. 1.

    Situation I. Prior to the landslide, the vegetation in the forest zone and the vegetation in the landslide zone have an equal growth rate, so the CDNDVI_w is a constant value. After the landslide, the vegetation cover and growth rate in the landslide area sharply decrease, leading to a sharp increase in CDNDVI_w.

  2. 2.

    Situation II. Before the landslide, the growth rate of the vegetation in the forest exceeds the growth rate in the pre-landslide area, so the CDNDVI_w increases linearly. Once the landslide happens, the vegetation cover and growth rate in the landslide area sharply decrease, and the slope of the CDNDVI_w increases. The slope of the CDNDVI_w is larger in the post-landslide section of the curve than in the pre-landslide section of the curve.

  3. 3.

    Situation III. Before the landslide, the vegetation in the forest grows slower than the vegetation in the pre-landslide area, so the CDNDVI_w decreases linearly. However, after the landslide the vegetation cover and growth rate in the landslide area sharply decrease, and CDNDVI_w sharply increases. In this situation the slope value of CDNDVI_w changes from negative to positive.

In the NDVI time series analysis, SWADE can segment time series into different periods of vegetation growth as well as detecting the break points indicating potential landslide occurrence. In order to segment the time series, three approaches are summarized by Keogh et al. (2001): top-down, bottom-up and sliding window algorithm. In the top-down algorithm, the curve is partitioned into a number of pieces iteratively until a stop criterion is met. In the bottom-up algorithm, from the finest pieces, the segments will merge iteratively into larger pieces until a stop criterion is met. In the sliding window algorithm, the segment grows in size from the left to the right, until a stop criterion is met. In order to extract only one or two points from the whole NDVI time series, the top-down algorithm is preferred in our study due to its high efficiency.

Figure 2a summarizes the segmenting procedures for SWADE. In our study, we combine a decision tree method (Rivest 1987) with the stepwise linear fitting method to detect the probable date range of landslide occurrence through Python. The decision tree method is subdividing the entire time series into different segments based on the CDNDVI curve slope. We define a minimum number of three segments, but typically more segments are identified. Then, we use stepwise linear fitting to fit each of those segments through linear regression. Hence, the CDNDVI_w series is segmented into a number of potential pre- and post-CDNDVI_w segments. The most probable time of landslide occurrence (SWADE-a) is then defined as the largest slope, and the second most probable time of landslide occurrence (SWADE-b) as the second largest slope.

Harmonic modelling

The harmonic modelling method for landslide dating published by Deijns et al. (2020) is a semi-automatic technique which integrates the actual NDVI value and the NDVI harmonic fit value to detect when the landslide occurred. The harmonic modelling method consists of three main steps: (1) data collection, (2) harmonic fitting and (3) semi-automatic detecting.

In the data collection of non-detrended harmonic modelling NDVIL (landslide NDVI, in Fig. 2b) is collected in the same way as for SWADE (Fig. 2a). The CDNDVIL is calculated as the cumulative difference between NDVIL and its harmonic fitted NDVILfit. In the non-detrended harmonic modelling, the CDNDVIL is used to determine landslide dating (Eq. (2)).

$$CDNDV{I}_{L}={\sum }_{t=1}^{t=T}{(NDV{I}_{Lfit}-NDV{I}_{L})}_{t}$$
(2)

The detrended harmonic modelling approach implements the CDNDVI of randomly located forest patches within the study area (CDNDVIF) (Fig. 2b). The CDNDVIF is calculated as the cumulative difference between NDVIF and its harmonic fitted NDVIFfit (Eq. (3)). The CDNDVI in Eq. (4) is then used to date the landslide.

$${CDNDVI}_{F}={\sum }_{t=1}^{t=T}{(NDV{I}_{Ffit}-NDV{I}_{F})}_{t}$$
(3)
$$CDNDVI={\sum }_{t=1}^{t=T}{(CDNDV{I}_{L}-CDNDV{I}_{F})}_{t}$$
(4)

At last, for the semi-automatic landslide dating, a peak finding algorithm is applied to detect the change points, which represent probable date ranges of landslide occurrence. When multiple peaks are detected, the most likely landslide date is manually picked based on expert judgement.

LandTrendr

The LandTrendr algorithm was originally developed from a trajectory-based method for monitoring deforestation and logging activities in the USA (Kennedy et al. 2007). LandTrendr has been used for various applications, i.e. forest disturbance or quantification of biomass, investigating changes at the earth surface in time series of satellite images. Examples are published by Pflugmacher et al. (2014), Kennedy et al. (2018) and De Jong et al. (2021). LandTrendr detects changes in NDVI on an annual basis, which may also be suitable for landslide detection and dating in vegetated areas. Figure 2c shows the conceptual approach of LandTrendr. LandTrendr comprises six main steps (Kennedy et al. 2018): despiking, identifying probable vertices, culling by angle change, identifying best path for maxSegments, creating successively simpler models and picking the best model.

In the LandTrendr method, the assignment of eight parameters is quite challenging. Here, we summarized the LandTrendr parameters among 19 publications related to the application of land cover and land use around the world (Yang et al. 2018; Kennedy et al. 2018; Xu et al. 2019; Hislop et al. 2019; Hurtado and Lizarazo 2019; Filippelli et al. 2020; Mugiraneza et al. 2020; Xiao et al. 2020b; Giannetti et al. 2020; De Jong et al. 2021; Matsala et al. 2021; Ni et al. 2021; Powers 2021; Rodman et al. 2021; Gomez 2021; Kolecka 2021; Komba et al. 2021; Long et al. 2021; Runge et al. 2022). From this literature overview two main points are concluded: (1) five out of the eight parameters do not vary substantially among those applications when comparing to the default parameters (Kennedy et al. 2018), e.g. spikeThreshold, vertexCountOvershoot, pvalThreshold, bestModelProportion and minObservationNeeded. (2) The remaining three parameters are adjusted as a function of the aim of each case study, e.g. maxSegments, preventOneYearRecovery and recoveryThreshold.

The maxSegments is defined as the maximum number of segments during the LandTrendr fitting procedure. Besides the amount of landslides happening at the same area in different years, the number of segments also indicates vegetation disturbance caused by other processes than landslides. Therefore, we tested different maxSegments from 6 to 10, and found the best results for a maxSegments of 9. The recoveryThreshold is calculated as 1/recovery years, related to the vegetation recovery samples. The disturbance by landslides often damaged the forest growth more intensely than other indicators, e.g. wildfire and harvest. We therefore use a landslide recovery of one year as the reference of recoveryThreshold and the decision of preventing 1 year recovery to “false”.

The image collection calculated in LandTrendr has the same preprocessing (criteria 1 and 2) as SWADE. The parameter set used in this study is listed as maxSegments (9), spikeThreshold (0.9), vertexCountOvershoot (3), preventOneYearRecovery (false), recoveryThreshold (1.0), pvalThreshold (0.05), bestModelProportion (0.75) and minObservationsNeeded (6). The concept of other LandTrendr parameters and functions can be found in Kennedy et al. (2018).

Quantification of landslide dating precisions

Our validation database consists of 66 landslides that have occurred between 1985 and 2017 in the Buckinghorse River area, Canada. The dates of these landslides have been manually determined based on Landsat, Sentinel-2 and lidar imagery by Deijns et al. (2020). As a result, the timing of the landslides used for validating our results is a time range spanning from the nearest pre- and post-landslide image. Similarly, time of landslide occurrence by the (semi-)automatic methods evaluated here is also given by a range spanning from the nearest pre- and post-landslide image (Fig. 4).

Fig. 4
figure 4

Accuracy and uncertainty assessment for landslide dating methods. Three situations can occur: a separation, b interaction, c integration. The manually detected date range of landslide occurrence is shown by the blue box and the (semi-)automatically estimated date range of landslide occurrence by the red box. The accuracy is shown by the mean (black curly brackets; referred to as time lag following Reiche et al. (2018)). The uncertainties are shown by the minimum (green curly brackets) and maximum (grey curly brackets) time difference between the manual landslide timing results and estimated landslide timing results

Figure 4 illustrates how we determined and defined accuracies and uncertainties of the landslide dating. We slightly modified the approach proposed by Reiche et al. (2018), and define the dating accuracy as the time difference between the middle date of the manually detected date range and the middle date of the date range estimated by the (semi-)automatic dating methods (mean time lag). Similarly, we define the uncertainty range as the minimum to maximum time difference between the manually and (semi-)automatically detected time ranges.

Results

In this section, we first compare the accuracy of the three landslide dating approaches: SWADE, harmonic modelling, and LandTrendr (“Accuracy of the landslide dating approaches” section). Then, we present the effect of landslide area on detection accuracy (“Effect of landslide area on dating accuracy” section) as well as the effect of landslide occurrence year on detection accuracy (“Effect of landslide occurrence year on dating accuracy” section). Next, to further illustrate the concept of landslide detection using SWADE, we present and discuss the detection of three representative example landslides in detail (“Detailed examples of landslide dating with SWADE” section).

Accuracy of the landslide dating approaches

Table 2 (in the Appendix) shows the landslide inventory and the landslide dating results in the Buckinghorse River area, Canada. The minimum, mean and maximum time lags have been analysed to compare the different landslide dating results. The accuracies of landslide dating are reported in days except for the LandTrendr method, where we use the unit of year because this method compares land cover changes at a yearly basis. The results of the non-detrended harmonic and detrended harmonic methods are taken from Deijns et al. (2020).

To distinguish the variety of all landslide dating results, the accuracy is constrained into different categories, e.g. 0–30, 0–180, 0–365, 0–730 and 0–1472 days. Figure 5 summarizes the percentages of dated landslides among different accuracy categories. SWADE2 and detrended harmonic modelling results have the highest landslide dating accuracy in different accuracy categories indicated as dark blue and dark brown in Fig. 5.

SWADE1, identifying the most probable landslide date, has accuracies of 11%, 41%, 52%, 62% and 71% in the 0–30-, 0–180-, 0–365-, 0–730- and 0–1472-day accuracy categories, respectively. Due to the time difference of manually pre- and post-landslide images and the temporal resolution of satellites, the uncertainty has been evaluated by the minimum and maximum time lag in the landslide dating results. The accuracies of minimum time lag reached 24%, 44%, 53%, 64% and 71% in the 0–30-, 0–180-, 0–365-, 0–730- and 0–1472-day accuracy categories, respectively. And the accuracies of maximum time lag were 3%, 27%, 48%, 58% and 71% in the 0–30-, 0–180-, 0–365-, 0–730- and 0–1472-day accuracy categories, respectively.

SWADE2, with the two most probable landslide dating results, has a higher accuracy than SWADE1 (Appendix Table 2 and Fig. 5), which logically follows from the fact that SWADE2 provides the two most probable landslide dates. SWADE2 yields accuracies of 17%, 58%, 68%, 80% and 91% of the landslides within the categories of 0–30, 0–180, 0–365, 0–730 and 0–1472 days, respectively. The accuracies of minimum time lag reaches 37%, 61%, 70%, 83% and 92% within the 0–30-, 0–180-, 0–365-, 0–730- and 0–1472-day accuracy categories, respectively.

The detrended and non-detrended harmonic modelling results have similar accuracies compared to SWADE2 and SWADE1, respectively. The detrended harmonic modelling overall leads to higher accuracies than non-detrended harmonic modelling. Compared to the SWADE and harmonic modelling approaches, LandTrendr is not successful or suitable for landslide dating. LandTrendr is based on comparison of yearly averaged NDVI maps, and consequently its accuracy is set to zero in the accuracy categories of 0–30 days and 0–180 days. The percentages of dated landslides of LandTrendr are 17%, 24% and 42% in the 0–365-, 0–730- and 0–1472-day accuracy categories, respectively.

Fig. 5
figure 5

The percentages of successfully dated landslides among different accuracy categories. The 0–1472-day accuracy category is highlighted here, as being consistent to the calculated maximum accuracy by Deijns et al. (2020). Error bars are constrained by the maximum time lag (low points) and the minimum time lag (high points)

Effect of landslide area on dating accuracy

Figure 6 shows that the landslide area affects the landslide dating accuracy in the Buckinghorse River area, Canada. The percentage of dated landslides in Fig. 6 has been calculated according to the successfully dated landslides in the corresponding area class (coloured points and lines in Fig. 6) and the sum of landslides in the corresponding area class (grey bar in Fig. 6). For the smaller landslides (≤ 3 × 104 m2), the percentage of dated landslides is generally lower than in the large size classes. This trend is particularly evident for the most accurate dating category of 0–30 days, and also mostly evident for the SWADE methods which seem more sensitive to landslide area in this accuracy category than the other methods. The dependency of detection accuracy on landslide area is likely explained by the limited number of pixels covered by small landslides on the moderate spatial resolution of the Landsat images, which limits the success rate of detection.

Fig. 6
figure 6

The percentage distribution of dated landslides in different ranges of landslide area within a 0–30-day, b 0–180-day, c 0–365-day and d 0–730-day accuracy categories. The grey bar and the right y-axis indicates the number of landslides in the corresponding area range

For LandTrendr, the landslide area does not vary significantly when comparing to other landslide dating results. The percentage remains stable among different landslide area ranges within the accuracy category of 0–365 days (Fig. 6c); but within the accuracy category of 0–730 days (Fig. 6d), there is significant increasing with landslide area from 6–10 × 104 m2 to > 105 m2.

Effect of landslide occurrence year on dating accuracy

The number of landslides in the study area is relatively constant, except for an increase in the year range 2015–2017 because a large number of 19 landslides occurred in 2016 (grey bars in Fig. 7). The number of available Landsat images increases from 1985 to 2017, increasing from 20 to 60 images per year (blue bars shown in Fig. 7). These increases mostly result from the launch of Landsat-7 and Landsat-8, making new images available after July of 1999 and April of 2013 in the study area, respectively. This increase in images affects the accuracy of landslide dating.

For the year range of 2000–2014, the availability of more satellite images enabled the landslide dating accuracies to increase for the SWADE and harmonic modelling approaches, in particular for accuracy categories of 0–180, 0–365 and 0–730 days. However, the percentages of successfully dated landslides in the year range 2015–2017 is relatively low, which is because most landslides in the year range 2015–2017 happened in August 2016, which lacked ample post-landslide images for accurate detection.

For the year ranges 1990–1994 and 1995–1999, the numbers of available satellite images are similar while the percentages of successfully dated landslides are relatively different among different accuracy categories. In the 0–30-day accuracy, the percentage of dated landslide of year range of 1995–1999 is lower than that of 1990–1994. This is because eight out of twelve landslides in the year range of 1995–1999 spanned the winter season in the manual detection, leading to a relatively low accuracy. The percentages of successfully dated landslides in the year range of 1995–1999 increase rapidly with the relaxation of accuracy categories. This is mainly because the characteristics of the landslides in this year range are: a rapid NDVI decreasing response due to the landslide and a long NDVI recovery period. Hence, these landslides are easily detected by the SWADE and harmonic modelling approaches.

Fig. 7
figure 7

The percentage distribution of dated landslides in different ranges of landslide occurrence year within a 0–30-day, b 0–180-day, c 0–365-day and d 0–730-day accuracy categories. The blue bars indicate the number of satellite images per year in the corresponding year range and the grey bars indicate the number of landslides in the corresponding year range

Detailed examples of landslide dating with SWADE

In this section, to further illustrate the principle of landslide detection based on Landsat imagery using SWADE, we show the dating of three selected landslides in detail (Fig. 8). These examples follow the three detection situations as illustrated in the flow diagram of Fig. 3.

  1. 1.

    Situation I, landslide LS28 (Fig. 8a). Before the landslide occurs, the CDNDVI_w is a horizontal line. The CDNDVI_w of Landslide LS28 remains roughly constant until 1995–10-18 (pre-SWADE-a, the most probable pre-landslide date range). The NDVI values at the landslide location (NDVIL line) and for the forest (NDVIF line) were in that time period noticeably similar, meaning that before the landslide occurred (1997–08-04), the vegetation growth rate in the undisturbed forest zone was similar to that in the pre-landslide zone. After the landslide event took place, the CDNDVI_w values increased significantly as a result of vegetation removal by this landslide. The accuracy of landslide LS28 by SWADE1 is moderate with an accuracy of 572 days in Fig. 8a. For this landslide, the accuracy of SWADE1 is higher than that of non-detrended harmonic (708 days; 1995–08-31) and LandTrendr approach (2 years; year 1995), but lower than that of detrended harmonic (476 days; 1996–04-20).

  2. 2.

    Situation II, landslide LS32 (Fig. 8b). Before the landslide occurs, the temporal development of CDNDVI_w is inclined upwards. The CDNDVI_w was linearly increasing before August, 2000, because the NDVI values at the forest location (NDVIF line) were higher than the NDVI values at the landslide location (NDVIL line). After landsliding, the CDNDVI_w increased significantly, again as a result of vegetation removal. The dating of landslide LS32 was precisely detected with an accuracy of 57 days by SWADE. For this landslide, the accuracy of SWADE is higher than that of the non-detrended harmonic and detrended harmonic (204 days; 2001–03-09) and LandTrendr (< 1 year; Year 2000).

  3. 3.

    Situation III, landslide LS55 (Fig. 8c). Before the landslide occurs, the CDNDVI_w decreases. Several years prior to the landslide, the NDVI values at the forest location (NDVIF line) were mostly lower than at the landslide location (NDVIL line). After landslide LS55 occurred, CDNDVI_w increased significantly as a result of vegetation removal by the landslide. The accuracy of landslide LS55 by SWADE1 is 724 days, and lower than the accuracy of the non-detrended harmonic (80 days; 2016–06-22) and detrended harmonic (223 days; 2017–04-21), but higher than that of LandTrendr (2 years; Year 2014).

Fig. 8
figure 8

Examples of landslide dating with SWADE compared with the other methods for landslides LS28, LS32 and LS55 shown in Appendix Table 2. The y-axis from left to right presents the temporal development of Landslide NDVI, Forest NDVI and Cumulative NDVI difference (CDNDVI_w). The vertical black line shows the actual date range of the landslide. The dotted black line and dotted dark brown line show the time series of Landslide NDVI and Forest NDVI, respectively. The solid blue line shows the value of CDNDVI_w. The light green bar shows the dating result year from LandTrendr. The six coloured shapes represent the actual date and five results derived from five landslide dating approaches, and all represents the actual or estimated pre-image dates

Discussion

Landslide activity in vegetation-covered zones leads to partial or complete vegetation removal. This results in a sudden NDVI decrease that can be observed in time series of satellite imagery. Landslides can therefore be dated by detecting these NDVI drops in the available satellite images in the archives (e.g. Coe et al. 2018; Dewitte et al. 2021). We have used this concept to develop a novel automatic method (SWADE) for landslide dating by analysing the Landsat images archived in Google Earth Engine.

Compared to existing and previously published semi-automatic landslide dating methods (Kennedy et al. 2018; Deijns et al. 2020), SWADE not only enables automatic landslide dating, but does also not require expert judgement. In contrast to the harmonic modelling and LandTrendr approach, SWADE is applicable in regions where a well-developed seasonal vegetation cycle is absent. In addition, SWADE can be utilized to detect other geomorphic hazards if this hazard event causes damage to the vegetation. The harmonic modelling approach (Deijns et al. 2020) typically detects multiple potential dates of landslide occurrence, after which expert judgement is required to define the most-likely landslide date. SWADE, on the other hand, generates the date of landslide occurrence through segmenting principles in an automatic way, providing the two most probable results to the land use managers. LandTrendr (Kennedy et al. 2018), originally developed for landslide dating on an annual basis, results in comparatively low accuracies and is not suitable for dating the landslides in the Buckinghorse River area, Canada.

Applicability of SWADE beyond Buckinghorse River area

We expect that SWADE can be applied for landslide detection in contrasting vegetated areas worldwide, regardless of vegetation types (e.g. forest, grassland, tundra) or seasonal vegetation cycles. In this study, SWADE is applied to a region in the temperate / continental climates of British Columbia characterized by a strong harmonic, sinusoidal cycle of the vegetation and hence of the NDVI. Importantly, SWADE has the potential to be successfully applied in areas where less clear seasonal cycles exist, such as the tropics, in contrast to harmonic modelling approaches (Deijns et al. 2020). In bare areas, SWADE is not applicable for landslide dating, and other landslide dating approaches should be developed and used.

SWADE could be used as a tool for constructing landslide magnitude-frequency relations in remote areas (Guzzetti et al. 2005; Fu et al. 2020). In the past decades, much work has focused on spatial landslide detection through remote sensing. Using the spectral, spatial, shape or contextual signals landslide borders can be produced through single or combined image analysis techniques, for example, object-oriented methods, change detection methods and machine learning methods (Nichol and Wong 2005; Borghuis et al. 2007; Martha et al. 2010; Stumpf and Kerle 2011; Hölbling et al. 2012; Qi et al. 2020; Amatya et al. 2021). To date these landslides, SWADE could serve as a useful tool providing accurate temporal information on landslide occurrence. Hence, SWADE has the potential to be implemented in a (regional) spatio-temporal landslide detection framework, for the automatic construction of magnitude-frequency distributions.

Factors affecting dating accuracy

Similar to any other remote sensing approaches, the accuracy of SWADE is affected by data quality and image availability. The time range of our study is from 1985 to 2017, as constrained by Landsat 5 image availability. At the start or end of the available satellite image time series, the landslide dating accuracy is lower than the accuracy in the middle of the time range, as was also concluded by Deijns et al. (2020) for the harmonic modelling approaches. The reason is that a break in slope for the CDNDVI can only be successfully detected when there is a sufficiently long CDNDVI curve pre- and post-landslide (Figs. 3 and 8). An example is shown for landslide LS65 (Fig. 9) where the landslide occurs at the end of the available time series and the change in NDVI cannot be detected.

Fig. 9
figure 9

Landslide dating result for landslide LS65. This landslide takes place at the end of the available time series of images in July 2017 and consequently the drop in NDVI cannot be detected and accurate dating is hampered. A longer time series is required. See Fig. 8 for full legend

Forest removal by landslides, and hence a sudden drop in NDVI values, forms the basis for our landslide dating approach. When analysing NDVI time series, the vegetation regrowth after a landslide is fundamental to the landslide dating results. Our study assumed that the remaining vegetation after a landslide will recover to the same or similar vegetation type in the pre-landslide period. However, the vegetation cluster type varies drastically along the forest evolution in Canada (Geertsema et al. 2009). Two situations about plant succession are listed. (1) The colonization of Senecio and cotton grasses, then of grasses and horsetails and forb occurred after tundra mudflows (Lambert 1972). (2) After landsliding in British Columbia, typically first deciduous trees developed that are later replaced by spruce forest (Smith et al. 1986). Based on these two situations a likely scenario of the succession in the study area might be “spruce forest—landslide—forbs—shrubs—deciduous trees—spruce forest” in the long run. Differences in vegetation characteristics between the pre- and post-landslide may affect the NDVI values and therefore have an impact on date detection accuracy.

Another complicating factor is the degree of removal of the forest cover by the landslides. In large landslides, there might be a spatial heterogeneity in vegetation cover: translated rafts of vegetation exist in the dataset of our case study. In addition, the vegetation damage depends to some extent on the rate of movement as well (Geertsema et al. 2009). Generally, higher velocity and higher energy landslides may displace more vegetation and have more exposed soil. Therefore, high-energy landslides are likely more accurately detected by (semi-)automatic dating methods.

It is challenging for the here evaluated methods to date complex, overlapping or re-activated landslides. For example, landslides LS20 and LS28 happened in the years 1995–1996 and 1997, respectively, and partly overlap. The additional vegetation disturbance of LS28 hindered the SWADE method to segment the NDVI time series into the pre- and post-landslide segments. Both SWADE and harmonic modelling could only detect the strongest NDVI decrease during the formation of LS20, leading to the false negative detection of LS28.

The images in the optical satellites are often contaminated by clouds, cirrus and cloud shadows at the moments of landslide occurrences because landslides are often triggered by heavy precipitation events (Sassa et al. 2009, 2015). Although cloud masking is applied here before imagery is used as input for SWADE, cloud contamination remains a problem. A promising alternative is the use of microwave (SAR) images as these images are almost unaffected by weather conditions (Henderson and Lewis 1998).

Conclusions

We present an innovative automatic landslide dating approach using time series of Landsat imagery. This new method, SWADE (Segmented WAvelet-DEnoising and Stepwise linear fitting), requires long time series image collections, eliminates cloud contamination, fills data gaps, and minimizes seasonal changes. SWADE applies wavelet transformation to remove noise, and combines a decision tree with stepwise linear fitting to date landslide occurrence. The accuracy of SWADE is evaluated using 66 landslides in the Buckinghorse River area, northeastern British Columbia, Canada, and compared to the previously published semi-automatic harmonic modelling and LandTrendr methods.

The SWADE results are promising for dating of landslides in vegetated and temperate-climate regions. SWADE detects 52% of the landslides within a maximum error of 1 year, and 62% of the landslides within a maximum error of 2 years, when considering the most-probable landslide date range identified by SWADE. When considering the two most-probable landslide date range, these numbers increase to 68% and 80%, respectively. By comparison, harmonic modelling detects 79% of the landslides with a maximum error of 1 year, and 82% of the landslides with a maximum error of 2 years, but requires expert judgement making this model more subjective and time-intensive. LandTrendr, originally developed for mapping deforestation, is poorly applicable to landslide detection having an accuracy of 42% within a maximum error of 2 years.

SWADE might also be applied in areas where less clear seasonal cycles exist, in contrast to harmonic approaches, and feasible to other geomorphic hazards if the occurrence is followed by forest removal. The new SWADE method provides a promising automatic method for landslide dating in vegetated remote areas, which may substantially contribute to landslide magnitude-frequency and hazard assessment.

The landslide temporal detection may be further improved by (i) combining with multi-source optical remote sensing images (e.g. Sentinel-2 and Landsat 9); (ii) combining with cloud-independent SAR imagery (e.g. Sentinel-1); and (iii) analysing and comparing the vegetation disturbance between landslides and wildfires.