Introduction

Landslides are among the most dangerous and complex natural hazards, causing severe destruction, damage to natural resources, and loss of human life and property (Li et al. 2020; Mondini et al. 2021). Landslides occur worldwide in different types (e.g., debris flows and rockfalls), frequencies, and intensities (Mondini et al. 2011). It is therefore critical to study and analyze this hazard and to provide susceptibility models and maps that help prevent and mitigate its calamitous consequences (Catani 2021; Hua et al. 2021; Piralilou et al. 2021). Such analysis and modeling require precise landslide inventory maps (Huang et al. 2020; Thi Ngo et al. 2021). Moreover, rapid mapping of landslides following heavy rainfall or major seismic events is essential for quick response, humanitarian aid, and other disaster mitigation efforts (Piralilou et al. 2021; Pourghasemi and Rahmati 2018).

Satellite imagery is considered the primary data source for detecting landslides and updating inventory maps. Information is extracted from satellite imagery mainly through two approaches: pixel-based and object-based image analysis (OBIA) (Chen et al. 2018). Pixel-based classifiers such as maximum likelihood, minimum distance, and parallelepiped are the conventional and most widely used models for image classification and for inventory mapping of natural hazards such as landslides (Nichol and Wong 2005). However, these models fail for images with intricate textures and strong spectral heterogeneity (Goetz et al. 2015). Machine learning (ML) models such as decision trees, support vector machines (SVMs), artificial neural networks (ANNs), and random forests (RFs) have therefore been applied to address these issues (Hölbling et al. 2012). Several studies have reported that these ML models achieve higher transferability and accuracy in image classification and object detection than conventional models such as maximum likelihood (Peña et al. 2014). Regardless of the ML model used, however, pixel-based approaches have some deficiencies, especially when dealing with very high resolution (VHR) satellite imagery (Chen et al. 2018). Because of the rich information content of VHR imagery, a “salt-and-pepper” effect (isolated misclassified pixels scattered within otherwise homogeneous regions) appears, adversely affecting image classification and landslide detection results (Hölbling et al. 2012).

To overcome these limitations of per-pixel analysis, particularly in classification, OBIA has been widely used, mainly with high-spatial-resolution and VHR remote-sensing imagery (Blaschke et al. 2014). Instead of analyzing single pixel values, OBIA studies real-world entities and phenomena by analyzing image objects derived from a segmentation process (Chen et al. 2018). OBIA is a knowledge-driven method that mimics human perception by grouping pixels into meaningful units that represent corresponding features in the real world (Blaschke 2010). Whereas conventional pixel-based approaches depend on the digital number (DN) of individual pixels, OBIA combines spectral information (e.g., color) and spatial properties (e.g., size and shape) with textural and contextual information (e.g., relationships with neighboring objects) (Blaschke et al. 2014). Although OBIA has solved some issues of per-pixel image classification, achieving higher accuracy with ML models in complex tasks still entails several challenges. For example, geographic features such as landslides are often represented at multiple scales within the extent of a satellite image, which makes it difficult to define a single optimal scale parameter for object definition (Tavakkoli Piralilou et al. 2019).
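The contrast between per-pixel and per-object features can be illustrated with a minimal NumPy sketch. It assumes a label image produced by some segmentation (here a hypothetical toy array with contiguous object ids 0..n-1) and computes two simple object features: the size in pixels and the mean value of each band. Shape, texture, and neighborhood features would be aggregated in a similar way. The function name `object_features` is illustrative only and does not correspond to eCognition's implementation.

```python
import numpy as np

def object_features(labels, image):
    """Aggregate pixel values into per-object features.

    labels: (H, W) int array of object ids 0..n-1 (every id must occur at least once)
    image:  (H, W, B) float array of band values
    Returns the pixel count of each object and its mean value in each band, shape (n, B).
    """
    flat = labels.ravel()
    n = flat.max() + 1
    size = np.bincount(flat, minlength=n)
    means = np.stack(
        [np.bincount(flat, weights=image[..., b].ravel(), minlength=n) / size
         for b in range(image.shape[-1])],
        axis=1,
    )
    return size, means

# Toy example: two objects in a 2 x 3 single-band image
labels = np.array([[0, 0, 1],
                   [0, 1, 1]])
image = np.array([[1.0, 2.0, 10.0],
                  [3.0, 20.0, 30.0]])[..., None]
size, means = object_features(labels, image)
print(size, means[:, 0])
```

Each object is then described by a handful of aggregated values rather than by every pixel, which is what allows rulesets to reason about objects as units.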

Owing to recent developments in computer vision and graphics processing units (GPUs), deep learning (DL) models such as convolutional neural networks (CNNs) and fully convolutional networks (FCNs) have been developed that reach state-of-the-art accuracy in satellite image classification (Längkvist et al. 2016; Mahdianpari et al. 2018) and object detection (Radovic et al. 2017) compared with conventional pixel-based approaches (Feizizadeh et al. 2021; Ghaffarian and Kerle 2019; Panahi et al. 2020). Such DL models have recently been employed for landslide detection in natural hazard monitoring and modeling. Ghorbanzadeh et al. (2019) showed the higher potential of CNNs over the ML models ANN, RF, and SVM for landslide detection from VHR satellite imagery. Both the DL and ML models were trained and tested on optical data from the RapidEye satellite and topographic factors, and the highest f1-score and mIOU values were 87.8% and 78.26%, respectively. The CNN architectures applied in that study were also used effectively by Sameen and Pradhan (2019) and in similar studies for landslide detection in Malaysia and India. However, when Sameen and Pradhan (2019) compared the applied CNN with a residual network (ResNet), the ResNet achieved the highest f1-score of 90%, compared with 83% for the CNN. This trend of applying CNNs to landslide detection was followed by the use of FCN models such as the U-Net. Soares et al. (2020) and Bragagnolo et al. (2021) used the U-Net for landslide detection in the mountainous region of Rio de Janeiro, Brazil, and in the Himalayan region of Nepal, with highest f1-score values of 55% and 67%, respectively. In another study, Liu et al. (2020) trained and tested a U-Net and a residual U-Net (ResU-Net) on images with different spatial resolutions from northern Sichuan Province. Their highest f1-score and mIOU values were more than 83% and 76%, respectively, for the U-Net, and 87% and 93% for the ResU-Net. Qi et al. (2020) and Ghorbanzadeh et al. (2021) also compared the U-Net and the ResU-Net for detecting landslides in different case study areas, and the ResU-Net usually performed better than the U-Net.

The application of DL models for landslide detection is also well described in some current studies (Cai et al. 2021; Huang et al. 2020; Su et al. 2021; Tang et al. 2021; Wang et al. 2021); the results indicate that these models present promising performance and accuracy in landslide inventory mapping.

Although DL models, and CNNs in particular, have demonstrated a high capability for feature learning (low-, mid-, and high-level information), in some scenarios they cannot resolve classification errors arising from the similarity between the targets of interest and other features (Mboga et al. 2019). Knowledge-based approaches are therefore essential for dealing with such problems. Moreover, unlike pixel-based methods, image classification using OBIA preserves precise information about object borders and edges (Majd et al. 2019).

This study aims to improve the landslide detection map generated by the ResU-Net, a well-known FCN model, using prior knowledge in the context of OBIA. The integration of OBIA with an FCN model is thus the novel characteristic of this study compared with the state-of-the-art works cited in the above review of the landslide detection literature. Although previous efforts have combined CNNs with OBIA, e.g., for refugee camp classification (Ghorbanzadeh et al. 2020) and rock glacier mapping (Robson et al. 2020), this is the first study to evaluate the feasibility of integrating an FCN model with a rule-based OBIA for landslide detection. In addition, in contrast to the studies mentioned above, which used the heatmap in the segmentation process, we segmented the layers of the original image to reduce the bias of the heatmap in the final classification results. The heatmap is used only to classify objects based on their probability values, together with some spectral, topographic, and spatial characteristics. Our proposed framework thus refines the landslide detection map resulting from the ResU-Net using prior knowledge in addition to the original image data and the probabilities produced by the ResU-Net.

The remainder of the study is organized as follows: “Study area and data used” presents the study area and data, “Method” describes the methodology of the integrated framework, and “Results and accuracy assessment” presents the results and their accuracy assessment, followed by a brief discussion and the conclusions in “Discussion” and “Conclusions”, respectively.

Study area and data used

In early August 2009, Typhoon Morakot, the deadliest typhoon in Taiwan’s recorded history, hit Western Taitung County in Taiwan. It brought around 2884 mm of precipitation to the region over 5 days, which led to severe flooding. As a result, about 652 people died, 47 were missing, and damage to personal property and infrastructure was estimated at over 3 billion USD (Lin et al. 2011). The typhoon also triggered nearly 22,700 landslides across the region, covering an area of 274 km². Most of the landslides were shallow, while a few deep-seated landslides occurred in mountainous areas of southern Taiwan (Lin et al. 2011, 2013). The geological structure of the study area is complex, with many fault and fold systems. The lithology of the study area comprises metamorphic rocks, sedimentary rocks, and terrace deposits.

The main directions of the fault systems are north–south and east–west (Nguyen and Liu 2019). The extent of the study area was selected to include different geographical features, such as rivers, lakeshores, non-vegetated regions, and rocky terrain. Our study site, presented in Fig. 1, contains 895 landslides with a total area of 31.33 km². The images used in this study for landslide mapping are Sentinel-2 Level-1C products accessed through Google Earth Engine (GEE). Since GEE only provides Level-1C products of Sentinel-2 data, which do not include atmospheric correction, we used the Sen2Cor plugin (Main-Knorn et al. 2017) in the SNAP software to apply atmospheric corrections.

Fig. 1

Sentinel-2A false-color composite of the study area indicating the training area, testing area, and landslide inventory map

No reliable landslide inventory map was available for the study area. We therefore manually digitized landslides within the study area based on Sentinel-2 imagery acquired before July 21, 2016, and corrected them using Google Earth archive imagery acquired after Typhoon Morakot. The slope layer was generated from the 12.5-m digital elevation model (DEM) derived from the Advanced Land Observing Satellite (ALOS) of the Japan Aerospace Exploration Agency (JAXA), available from https://search.asf.alaska.edu/#/.
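The text does not state which tool was used to derive slope from the DEM. As a minimal sketch, assuming the DEM is a NumPy array in a projected coordinate system with 12.5-m pixels and elevations in metres, slope in degrees can be computed from central-difference gradients. GIS software may use a different neighbourhood (for example, a 3 × 3 kernel such as Horn's method), and resampling the result to the 10-m Sentinel-2 grid is not shown.

```python
import numpy as np

def slope_degrees(dem, cell_size=12.5):
    """Slope in degrees from a gridded DEM (elevations and cell size in metres)."""
    dz_dy, dz_dx = np.gradient(dem.astype(float), cell_size)  # derivatives along rows, columns
    return np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))

# A plane rising 12.5 m per 12.5-m cell towards the east has a uniform 45-degree slope
dem = np.tile(np.arange(6) * 12.5, (6, 1))
print(slope_degrees(dem)[2, 2])
```

The slope thresholds used later in the rulesets (15° and 17°) refer to values of this kind.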

Method

In this section, we describe how OBIA is used to improve the results of the ResU-Net, a well-known FCN model. The designed OBIA rulesets contribute prior knowledge to the landslide segmentation produced by the ResU-Net. The main steps of our workflow are shown in Fig. 2 and can be summarized as follows:

  • Stack Sentinel-2 bands 2, 3, 4, and 8 (blue, green, red, and near-infrared) with a slope layer generated from the ALOS DEM for the training and testing areas.

  • Design, train, and test the ResU-Net based on the five stacked layers and the inventory data.

  • Develop a rule-based OBIA using the data from the original images.

  • Apply the same rule-based OBIA to the probabilities resulting from the ResU-Net together with the data from the original images.
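The first two steps imply cutting the five-layer stack into 128 × 128 × 5 windows for the ResU-Net. A minimal NumPy sketch follows, assuming that all layers are already co-registered on the same 10-m grid and that non-overlapping tiles are used with edge remainders dropped. The text does not specify the tiling stride, padding, or any band normalization, and `stack_and_tile` is a hypothetical helper.

```python
import numpy as np

def stack_and_tile(bands, slope, tile=128):
    """Stack four Sentinel-2 bands and a slope layer, then cut non-overlapping tiles.

    bands: list of four (H, W) arrays (B2, B3, B4, B8) on the same 10-m grid
    slope: (H, W) array already resampled to that grid (assumption)
    Returns an array of shape (n_tiles, tile, tile, 5); edge remainders are dropped.
    """
    stack = np.dstack(list(bands) + [slope]).astype(np.float32)
    h, w, c = stack.shape
    rows, cols = h // tile, w // tile
    stack = stack[: rows * tile, : cols * tile]
    # (rows*tile, cols*tile, c) -> (rows, tile, cols, tile, c) -> (rows, cols, tile, tile, c)
    tiles = stack.reshape(rows, tile, cols, tile, c).swapaxes(1, 2)
    return tiles.reshape(rows * cols, tile, tile, c)

# Example with random data for a 300 x 260-pixel area -> 2 x 2 = 4 complete tiles
rng = np.random.default_rng(0)
bands = [rng.random((300, 260)) for _ in range(4)]
slope = rng.random((300, 260)) * 60.0
print(stack_and_tile(bands, slope).shape)  # (4, 128, 128, 5)
```

Tiles are ordered row by row, so tile index 1 is the top-right window in this example.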

Fig. 2

Methods used in this study

Residual U-Net (ResU-Net)

In this study, a ResU-Net is designed to combine the strengths of the FCN model with the high learning performance of the residual neural network (Qi et al. 2020; Zhang et al. 2018). The residual learning blocks can improve the training of the U-Net, and the skip connections within each residual learning block and between the encoding (downsampling) and decoding (upsampling) paths ease information propagation without degradation (Mohammadimanesh et al. 2019; Ronneberger et al. 2015; Zhang et al. 2018). The input of the ResU-Net is the five stacked layers (four Sentinel-2 bands and one slope layer), with a total size of 128 × 128 × 5, and the output is the detected landslide map with a size of 128 × 128. This window size was selected based on a previous study of the same case study area, in which it yielded better landslide detection performance than other window sizes (Ghorbanzadeh et al. 2021). Similar to the standard U-Net, the ResU-Net consists of an encoding (downsampling) path and a decoding (upsampling) path for capturing low-level and high-level representations, respectively. The downsampling path of our ResU-Net comprises three levels, each including a residual learning block that consists of two convolution layers with a filter size of 3 × 3. Each convolution layer is preceded by a batch normalization layer and a ReLU activation layer (Liu et al. 2020; Zhang et al. 2018). The first convolution layer in each residual learning block uses a stride of 2 instead of a pooling operation for downsampling. The input (xi) of each residual learning block is added to its output by an identity mapping, giving F(xi) + xi (see Fig. 3).
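The residual learning block can be sketched as a NumPy forward pass for a single (H, W, C) feature map. This is an illustrative sketch, not the authors' TensorFlow code: batch normalization uses the statistics of the single input map with unit scale and zero shift, weights are random, and the zero padding differs slightly from TensorFlow's "same" padding for strided convolutions. Because the first convolution uses a stride of 2, the input and output shapes of a downsampling block differ, so the sketch assumes a 1 × 1 strided projection on the shortcut in that case (a common choice in ResU-Net implementations) and a pure identity mapping only when the shapes match.

```python
import numpy as np

def conv2d(x, w, stride=1):
    """Naive zero-padded 2-D convolution (cross-correlation, as in DL frameworks).

    x: (H, W, C_in) feature map; w: (k, k, C_in, C_out) kernel with odd k.
    """
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    h_out = (x.shape[0] + stride - 1) // stride
    w_out = (x.shape[1] + stride - 1) // stride
    out = np.zeros((h_out, w_out, w.shape[3]))
    for i in range(h_out):
        for j in range(w_out):
            patch = xp[i * stride:i * stride + k, j * stride:j * stride + k, :]
            out[i, j] = np.tensordot(patch, w, axes=([0, 1, 2], [0, 1, 2]))
    return out

def batch_norm(x, eps=1e-5):
    """Per-channel normalization with the statistics of this map (unit scale, zero shift)."""
    mean = x.mean(axis=(0, 1), keepdims=True)
    var = x.var(axis=(0, 1), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2, w_skip=None, stride=1):
    """Pre-activation residual block: (BN -> ReLU -> Conv) twice, plus a shortcut.

    The first 3x3 convolution carries the stride (2 for downsampling instead of pooling).
    The shortcut is the identity when w_skip is None; otherwise a 1x1 strided projection
    (assumed) so that the shapes of F(x) and the shortcut match.
    """
    h = conv2d(relu(batch_norm(x)), w1, stride=stride)
    h = conv2d(relu(batch_norm(h)), w2, stride=1)
    shortcut = x if w_skip is None else conv2d(x, w_skip, stride=stride)
    return h + shortcut  # F(x) + x

rng = np.random.default_rng(0)
x = rng.normal(size=(16, 16, 5))  # e.g. a small crop of the 5-layer input
y = residual_block(x,
                   rng.normal(size=(3, 3, 5, 8)),
                   rng.normal(size=(3, 3, 8, 8)),
                   w_skip=rng.normal(size=(1, 1, 5, 8)),
                   stride=2)
print(y.shape)  # (8, 8, 8)
```

If every level downsamples as stated, three such blocks would reduce a 128 × 128 input to 64 × 64, 32 × 32, and finally 16 × 16 feature maps.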

Fig. 3

Display of the residual learning block with identity mapping used in the ResU-Net. BN refers to batch normalization

After the third level of the downsampling path, a bridge level connects the downsampling path to the upsampling path. The upsampling path also consists of three levels; however, each residual learning block in this path is preceded by an upsampling operation and a concatenation with the feature maps from the corresponding level of the downsampling path. The Adam optimization algorithm (Kingma and Ba 2014) and binary cross-entropy (Ronneberger et al. 2015) were used as the optimizer and loss function, respectively, and the model was trained via backpropagation with mini-batch stochastic training. We used early stopping to avoid overfitting, saving the weights that achieved the best performance on the validation set. Finally, a convolutional layer with a filter size of 1 × 1 and a sigmoid activation function is applied on top of the network to project the multichannel feature maps of the last level onto the landslide/non-landslide labels.
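The decoder step, the output layer, and the loss can also be sketched in NumPy for a single feature map. The sketch assumes nearest-neighbour upsampling (the text says only "upsampling"; a transposed convolution would be an alternative) and writes the final 1 × 1 convolution as a per-pixel weighted sum over channels; all helper names are hypothetical. In a Keras setup like the one described, the early-stopping behaviour could be obtained with `tf.keras.callbacks.EarlyStopping(monitor="val_loss", restore_best_weights=True)`, although the monitored quantity and patience used by the authors are not stated.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of an (H, W, C) feature map."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def decoder_input(deep, skip):
    """Upsample the deeper map and concatenate the encoder features of the same level."""
    return np.concatenate([upsample2x(deep), skip], axis=-1)

def sigmoid_head(features, w, b=0.0):
    """1x1 convolution (per-pixel weighted sum over channels) + sigmoid -> landslide probability."""
    logits = features @ w + b  # (H, W, C) @ (C,) -> (H, W)
    return 1.0 / (1.0 + np.exp(-logits))

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean pixel-wise binary cross-entropy between a 0/1 landslide mask and probabilities."""
    p = np.clip(y_prob, eps, 1.0 - eps)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

rng = np.random.default_rng(1)
deep = rng.normal(size=(4, 4, 16))   # output of a deeper decoder level
skip = rng.normal(size=(8, 8, 8))    # encoder features at the matching resolution
merged = decoder_input(deep, skip)   # (8, 8, 24); a residual block would normally follow
prob = sigmoid_head(merged, rng.normal(size=merged.shape[-1]) * 0.1)
mask = (rng.random((8, 8)) > 0.5).astype(float)
print(merged.shape, binary_cross_entropy(mask, prob))
```

A prediction of 0.5 everywhere gives a loss of ln 2 regardless of the mask, which is a useful sanity check for an untrained network.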

The overall network structure of the ResU-Net is shown in Table 1. The DL experiments (implementation, training, and evaluation) were carried out in Python 3.6 using TensorFlow 2 with the Keras API.

Table 1 The network structure of the ResU-Net applied in this study

Knowledge-based OBIA

Compared with pixel-based methods, OBIA allows users to create and use more features for image classification, such as spectral, textural, and geometric (e.g., size and shape) properties and topological relationships among the generated objects (Majd et al. 2019). In our OBIA approach, we use the objects’ spectral information and geometric properties. In complex tasks such as landslide detection, employing such features through knowledge-based rulesets can improve and refine the results generated by ML and DL models. We therefore aimed to improve the landslide inventory map generated by the ResU-Net using such object features and hierarchical knowledge-based rulesets that limit omission and commission errors. Using the multiresolution segmentation method implemented in eCognition (www.ecognition.com), the 10-m Sentinel-2 imagery was segmented into homogeneous objects, testing various scale values. Estimating the optimal segmentation parameters to create the most representative objects is challenging, and they are usually obtained by trial and error (Karantanellis et al. 2021). Segmentation with a scale of 55, a shape of 0.7, and a compactness of 0.3 provided image objects with the lowest under-segmentation error based on visual inspection. Only the five stacked layers (the four Sentinel-2 bands and the slope layer) were used in the segmentation process. Other features, such as the landslide probability map generated by the ResU-Net and the normalized difference vegetation index (NDVI) (Eq. 1), were used later in the rulesets.

$$\mathrm{NDVI} = (\mathrm{NIR}-\mathrm{Red})/(\mathrm{NIR}+\mathrm{Red})$$
(1)

where NIR and Red are the near-infrared and red bands of the Sentinel-2 imagery. Since the resulting image objects mainly suffered from over-segmentation, we applied a spectral difference algorithm to merge neighboring objects whose spectral values differed by at most 10%, so that landslides and spectrally similar neighboring features, such as riverbeds, were better represented as whole objects. In landslide-affected areas, NDVI values tend to be around zero or negative. Thus, in the first ruleset, all objects with NDVI values less than 0.1 and slopes greater than 15° were extracted. The output of the first ruleset was used as the input for a second ruleset, in which the landslide probability map (with values between 0 and 1) was added to select objects with a landslide probability greater than 0.5. However, some riverbeds were still detected as landslides after these rulesets were applied, because they had negative NDVI values, slopes greater than 17°, and probability values greater than 0.6 in the ResU-Net map. To remove such errors, we used a geometric feature of the objects classified as landslides: the length-to-width ratio (L2W). The L2W is calculated from the covariance matrix (Eq. 2) of the coordinates of the pixels that form the object’s boundary.

$$S = \left[\begin{array}{cc}\mathrm{Var}(X)& \mathrm{Cov}(XY)\\ \mathrm{Cov}(XY)& \mathrm{Var}(Y)\end{array}\right] ,$$
(2)

where X and Y are vectors containing the x and y coordinates of the boundary pixels, respectively, and Var and Cov denote the variance and covariance. The L2W is then calculated from the eigenvalues of this covariance matrix using Eq. (3).

$$\mathrm{L}2\mathrm{W} = {eig}_{\mathrm{max}}(S)/{eig}_{\mathrm{min}}(S) ,$$
(3)

where eigmax(S) and eigmin(S) are the largest and smallest eigenvalues of the matrix S (Lin et al. 2017). A ruleset based on the objects’ L2W was defined to eliminate these riverbeds: objects with an L2W greater than 100 were removed from the objects detected as landslides.
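Taken together, the three rulesets amount to threshold tests on object features. The following NumPy sketch applies them in sequence to hypothetical objects, each described by its mean NIR, red, slope, and ResU-Net probability values and by a binary mask. It makes several assumptions not stated in the text: NDVI is derived from the objects' mean NIR and red values (averaging per-pixel NDVI would give slightly different values), boundary pixels are object pixels with at least one 4-connected neighbour outside the object, and the spectral difference merge is not reproduced. eCognition's own features may be defined differently, and all function names are illustrative. As in Eq. (3), L2W is a ratio of eigenvalues (variances), so it scales with the square of an object's elongation rather than linearly.

```python
import numpy as np

def ndvi(nir, red, eps=1e-9):
    """NDVI = (NIR - Red) / (NIR + Red), Eq. (1); eps guards against division by zero."""
    return (nir - red) / (nir + red + eps)

def boundary_coordinates(mask):
    """(x, y) coordinates of object pixels with at least one 4-neighbour outside the object."""
    m = np.pad(np.asarray(mask, dtype=bool), 1, constant_values=False)
    core = m[1:-1, 1:-1]
    interior = core & m[:-2, 1:-1] & m[2:, 1:-1] & m[1:-1, :-2] & m[1:-1, 2:]
    rows, cols = np.nonzero(core & ~interior)
    return np.column_stack([cols, rows]).astype(float)

def length_to_width(mask):
    """L2W = eig_max(S) / eig_min(S), Eqs. (2)-(3), S = covariance of boundary coordinates."""
    s = np.cov(boundary_coordinates(mask), rowvar=False)  # [[Var(X), Cov], [Cov, Var(Y)]]
    eig_min, eig_max = np.linalg.eigvalsh(s)              # eigenvalues in ascending order
    return np.inf if eig_min <= 0 else eig_max / eig_min

def classify_objects(nir, red, slope, prob, masks):
    """Apply the three rulesets in sequence and return a landslide flag per object.

    nir, red, slope, prob: 1-D arrays with one mean value per object
    masks: one binary mask per object, used only by the L2W ruleset
    """
    keep = (ndvi(nir, red) < 0.1) & (slope > 15.0)  # ruleset 1: bare and steep
    keep &= prob > 0.5                               # ruleset 2: ResU-Net probability
    for i in np.flatnonzero(keep):                   # ruleset 3: drop elongated objects
        if length_to_width(masks[i]) > 100:
            keep[i] = False
    return keep

square = np.zeros((30, 30), dtype=bool)
square[5:25, 5:25] = True    # compact object
strip = np.zeros((30, 70), dtype=bool)
strip[10:13, 5:65] = True    # long, narrow, riverbed-like object

nir = np.array([0.30, 0.10, 0.10, 0.20])
red = np.array([0.10, 0.12, 0.12, 0.20])
slope = np.array([30.0, 25.0, 20.0, 20.0])
prob = np.array([0.9, 0.7, 0.65, 0.4])
keep = classify_objects(nir, red, slope, prob, [square, square, strip, square])
print(keep)
```

In this toy example only the second object survives: the first is vegetated, the third is a long, narrow strip (as a riverbed would be) removed by the L2W rule, and the fourth has a low landslide probability.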

Results and accuracy assessment

In this paper, we used the ResU-Net as a DL approach and a rule-based OBIA as a knowledge-based approach for landslide detection. In the integrated framework, the landslide probabilities from the ResU-Net, together with the original image data, were used as input features for our OBIA approach. Figure 4a shows the landslide heatmap produced by the ResU-Net, and Fig. 4b shows the results of each applied knowledge-based ruleset. The landslide detection maps from the ResU-Net and from our integrated approach are overlaid in Fig. 4c for visual comparison.

Fig. 4

Representation of a the resulting landslide probabilities from the ResU-Net, b merged segments after each applied OBIA ruleset, and c comparison of the ResU-Net result and that of our integrated approach

The resulting landslide detection maps are shown in Fig. 5. Since our inventory map labels only landslides, dividing the area into landslide and non-landslide areas, we classified the images into two categories: landslide and non-landslide.

Fig. 5

Representation of the landslide detection map resulting from a the ResU-Net, b OBIA, and c the ResU-Net-OBIA approaches

To evaluate the performance of our integrated framework, we compared its landslide detection results with those of the ResU-Net and the OBIA alone. A quantitative evaluation of the resulting maps requires a validation dataset. For this purpose, part of the study area that was not used in the training process was held out for testing, validation, and accuracy assessment of the resulting landslide detection maps, and all applied landslide detection approaches were validated against the inventory data of this testing area.

The landslide detection results were validated by counting the pixels classified as true positive (TP), false positive (FP), and false negative (FN). Thematic accuracy assessment metrics, namely precision, recall, and f1-score, were used to quantitatively assess landslide detection performance. Two areas were selected and enlarged in Fig. 6 to better illustrate the TP, FP, and FN pixels in the landslide detection maps resulting from the three methods: ResU-Net, OBIA, and ResU-Net-OBIA. Precision is the proportion of pixels detected as landslides that are actually landslides, and recall is the proportion of actual landslide pixels that are correctly detected. The f1-score is the harmonic mean of precision and recall. The metrics are derived from Eqs. (4)–(6), and the accuracy assessment results are presented in Table 2.

Fig. 6

Representation of two enlarged areas from the spatial overlapping of the inventory map and landslide areas obtained from three different methods that illustrate the true positive (TP), false positive (FP), and false negative (FN) areas

Table 2 Resulting validation values for each landslide detection approach
$$\mathrm{Precision}=\frac{TP}{TP+FP}$$
(4)
$$\mathrm{Recall}=\frac{TP}{TP+FN}$$
(5)
$$\mathrm{F1\text{-}score} = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
(6)
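Equations (4) to (6) can be computed directly from a reference (inventory) mask and a detection mask of the testing area. The sketch below also returns the intersection over union (IoU), which is not part of Eqs. (4) to (6) but underlies the mIOU values cited in the Introduction; for a single class it is related to the f1-score by IoU = F1/(2 − F1). Masks are assumed to be co-registered binary arrays, the function name is hypothetical, and the ratios are undefined (NumPy returns nan with a warning) when a map contains no detected or no reference landslide pixels.

```python
import numpy as np

def landslide_metrics(reference, detected):
    """Pixel-based precision, recall, and f1-score (Eqs. 4-6) plus IoU for binary masks."""
    ref = np.asarray(reference, dtype=bool)
    det = np.asarray(detected, dtype=bool)
    tp = np.sum(ref & det)    # landslide pixels detected as landslide
    fp = np.sum(~ref & det)   # non-landslide pixels detected as landslide
    fn = np.sum(ref & ~det)   # landslide pixels that were missed
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    iou = tp / (tp + fp + fn)
    return {"tp": int(tp), "fp": int(fp), "fn": int(fn),
            "precision": precision, "recall": recall, "f1": f1, "iou": iou}

reference = np.array([[1, 1, 0, 0],
                      [1, 1, 0, 0]])
detected = np.array([[1, 1, 1, 0],
                     [1, 0, 0, 0]])
print(landslide_metrics(reference, detected))
```

The returned values are fractions; multiplying by 100 expresses them as percentages.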

The thematic accuracy assessment metrics in Table 2 demonstrate that the ResU-Net achieves higher accuracy than our simple OBIA approach. However, applying the same OBIA approach to the ResU-Net landslide probabilities substantially increased precision, from 61.29% for the ResU-Net to 73.14%. This means that a larger proportion of the pixels detected as landslides by the integrated approach are actual landslides; in other words, the integrated approach reduced FPs by removing non-landslide pixels that the ResU-Net had detected as landslides. The improvement in recall is less pronounced than that in precision, with an increase of only three percentage points. In addition, the considerable difference between the accuracy assessment values of the OBIA and the ResU-Net-OBIA can be attributed to the substantial role of the ResU-Net landslide probabilities in the integrated approach.

Discussion

Riverbeds with a high landslide probability were the main reason for the low precision of the landslide detection map obtained with the ResU-Net. In the pixel-based ResU-Net, object-level spatial properties were not considered, and individual riverbed pixels had spectral and slope characteristics similar to those of landslides. In OBIA, in contrast, geometric object properties such as length and width could be used to filter out riverbeds easily, which substantially improved precision. This case shows that even complex algorithms, including DL models applied in the pixel-based domain, are limited in mapping complex objects such as landslides. OBIA makes it possible to use knowledge-based rulesets, such as one based on L2W, to mitigate some DL limitations in landslide detection.

Moreover, OBIA itself is highly dependent on expert knowledge and experience to obtain satisfactory results. When such knowledge is lacking, OBIA can fail in segmentation (Majd et al. 2019), as shown by our simple OBIA approach. Our experiments therefore illustrate the importance of integrating object-based classification with DL approaches in segmentation tasks such as landslide detection. Since, to our knowledge, no previous study has integrated a DL model or an FCN with an OBIA framework for landslide detection, we cannot compare our results with the literature. Moreover, almost all DL-based landslide detection studies have used VHR imagery (e.g., WorldView and GeoEye), and the spatial resolution of the imagery plays a leading role in landslide detection accuracy. In contrast, our experiments were based on freely available medium-resolution satellite imagery. Such imagery is well suited to extracting and interpreting information over large areas, for example, for landslide detection in regions affected by earthquakes or heavy rainfall.

We expect our proposed integration approach to work in other geographical, geological, and climatological settings, although the segmentation and rule-based classification parameters may differ. The defined parameters must therefore be adapted to each new location, considering factors such as landslide size, vegetation density, slope, and season.

Conclusions

The main focus of this study was to evaluate the feasibility of integrating two domains, a pixel-based DL model and OBIA, for the specific example of landslide detection. We have shown one possible solution that uses OBIA as a refinement process for the DL model. To this end, a new integration approach was proposed to detect landslides from Sentinel-2 imagery. Specifically, a rule-based OBIA was designed to add knowledge that refines the landslide detection results of the ResU-Net. We found the geometric feature L2W helpful in discriminating between landslides and riverbeds, which consequently increased landslide detection accuracy. The proposed approach improved the f1-score of the landslide detection maps by more than 8 and 22 percentage points compared with the ResU-Net and the OBIA approaches, respectively.

Although many DL models have achieved state-of-the-art results in various object detection and instance segmentation tasks, there remains considerable room for improvement, especially by using object-based rather than pixel-based classification approaches. OBIA approaches use homogeneous sets of image pixels for landslide detection, similar to the way an expert identifies landslide areas as individual entities. Therefore, given the difficulty of distinguishing landslides with pixel-based DL models such as the ResU-Net, combining the heatmaps of such models with OBIA offers a promising way to add prior knowledge to landslide mapping at regional scales. This combination reduces the adverse impacts of some DL limitations (e.g., fuzzy borders in the classification results). Our future work will evaluate such integrations for other use cases, such as building extraction.