1 Introduction

Landslides are among the world's most common geological hazards and can be triggered by heavy rainfall, earthquakes, snowstorms, and human activities such as deforestation, construction, and excavation (Keefer 1984; Iverson 2000; Bozzano et al. 2004; Guerriero et al. 2021; Picarelli et al. 2022). In urban areas, landslides can be devastating due to the high concentration of structures and people, leading to significant property damage, injury, and loss of life and disrupting essential services such as transportation and communication (Miele et al. 2021; Liang et al. 2022). Landslide risk in urban areas can be reduced by properly assessing slope stability, regulating development activities, and implementing comprehensive early warning systems (Bozzano et al. 2011; Gao et al. 2022; Tsironi et al. 2022). The accurate and reliable prediction of cumulative deformation in urban areas, which is the purpose of this paper, is another essential means of reducing landslide risk (Chen et al. 2017; Confuorto et al. 2022).

The southern Italian Apennines are among the areas exhibiting the highest density of landslides globally (Guerriero et al. 2019; Di Carlo et al. 2021). In the town of Moio della Civitella (Salerno Province), for example, landslides have continuously damaged the urban settlement (Di Martire et al. 2015; Infante et al. 2019).

Predicting the cumulative deformation of landslides is challenging because it involves transferring historical slide information forward in time. Moreover, slow movement beneath settlement cover further complicates the task, and human intervention can prevent the formation and identification of the typical morphologies of an evolving landslide. Tracking landslides that affect settlements is therefore important for accurately identifying landslide areas and understanding their evolution, and this information can aid the assessment of future activity. Mapping, predicting, and classifying the geomorphological and geospatial factors that contribute to landslide formation and occurrence supports this goal (Del Soldato et al. 2017; Rosi et al. 2018).

Analysis and prediction of geological hazards using data from various sources serve as an important basis for mitigating the severe effects of landslides, such as loss of life, damage to property and infrastructure, environmental degradation, and the displacement of communities (Dai et al. 2002; Gutiérrez et al. 2008; Kjekstad and Highland 2009; Lacasse et al. 2009). The deformation of landslides over time is an effective dataset for understanding their characteristics and predicting their future development (Jiang et al. 2021).

Using satellite remote sensing, urban landslides can be identified and monitored at high spatial resolutions (Scaioni et al. 2014; Nolesini et al. 2016; Amitrano et al. 2021; Khalili et al. 2023d). Because these techniques can record small deformation rates over broad areas, costs and computation times are minimized (Tofani et al. 2013; Di Traglia et al. 2021; Macchiarulo et al. 2022). Synthetic aperture radar (SAR) imagery-based methods (Solari et al. 2020) have been widely applied in this context, providing multi-temporal deformation rate distribution maps that support landslide identification under settlement cover and both retrospective and operational monitoring (Foumelis et al. 2016). Permanent scatterer interferometry (PSI) (Ferretti et al. 2001) is a SAR image processing technique that measures ground movement over time with high accuracy. It uses stable points (PSs) in SAR images as reference points and provides accurate, long-term monitoring of subsidence, volcanic activity, tectonic processes, slow-moving landslides, and other causes of ground deformation (Zhou et al. 2009; Lu et al. 2012; D'Aranno et al. 2021; Khalili et al. 2023d, b). X-band imagery acquired in the COSMO-SkyMed (CSK) mission is an example of satellite products particularly suitable for retrieving deformation data of landslides under urban cover because of its high spatial resolution and short revisit time (Costantini et al. 2017; Di Martire et al. 2017; Khalili et al. 2023a). It can therefore measure surface deformation at high temporal resolution and with high accuracy and be used to diagnose the progression of landslide movement (Confuorto et al. 2022). Hence, processing the CSK images with the PSI technique generates a valuable dataset that serves as labels for training the deep learning algorithms (DLAs) to predict cumulated surface deformation over time.

Various types of machine learning algorithms (MLAs) have been implemented for accurate and timely landslide prediction (Zhou et al. 2017; Gan et al. 2019). These include Bayesian networks (Chen et al. 2017), logistic regression (Wang et al. 2017), decision trees, random forests (Hong et al. 2016), and support vector machines (Liu et al. 2021), all of which have been widely used to capture the occurrence of landslides. DLAs, in turn, have been shown to outperform conventional statistical models in most time-series prediction applications (Confuorto et al. 2022). Traditional approaches such as multivariate regression models (Krkač et al. 2020) and auto-regressive integrated moving average models (Zhang 2003) can forecast individual time series in a wide range of applications (Zhang 2003; Xu and Niu 2018) but cannot describe the behavior of multivariate time series. By integrating multiple processing layers, DLAs can instead analyze multi-dimensional datasets to extract learned features and nonlinear dependencies (Li et al. 2020; Ma and Mei 2021), and recent advances have shown that DLAs can serve as models for predicting deformation (Jiang and Chen 2016; Hajimoradlou et al. 2020; Hua et al. 2021). Two well-known types of DLAs are recurrent neural networks (RNNs) and convolutional neural networks (CNNs). RNNs, including long short-term memory (LSTM) and gated recurrent unit (GRU) networks, are architectures commonly used for sequential data processing tasks such as time-series prediction; they pass information from one time step to the next, allowing the network to maintain a memory of past inputs. In contrast, CNNs work by applying filters to local patches of the input data. While RNNs are suited to tasks where past inputs influence future predictions, CNNs are ideal for tasks where spatial relationships between input data are important. These DLAs have the potential to significantly enhance landslide prediction, providing valuable information for disaster risk management (Azarafza et al. 2021; Dao et al. 2020; Habumugisha et al. 2022; Huang et al. 2020; Orland et al. 2020; Saha et al. 2022; Shirzadi et al. 2018).

Another type of DLA is the graph neural network (GNN), a class of neural network algorithms designed to operate on graph-structured data. GNNs typically consist of multiple layers of computation, each of which aggregates information from neighboring nodes in the graph and updates the node representations. GNNs have proven efficient and effective in numerous applications, including traffic forecasting, human gesture detection, and urban flow prediction (Yu et al. 2018; Huang et al. 2020; Wang et al. 2020). Graph convolutional networks (GCNs) are a type of GNN that use graph convolutions to propagate information across nodes and edges (Khalili et al. 2023c). In contrast with traditional CNNs, which operate on regular grid-like data, GCNs generalize the convolution operation to graph-structured data. GCNs can learn representations of nodes in a graph that take the graph structure into account, enabling them to capture complex relationships between nodes and make predictions about them. In predicting cumulative deformation caused by landslides, GNNs model the interactions between predisposing factors such as geological, geomorphological, and geospatial factors. By taking the spatial relationships between these factors into account, GNNs can make more accurate predictions of deformation than traditional MLAs and DLAs that only consider single-node feature information (Kuang et al. 2022; Zhou et al. 2021; Zeng et al. 2022).

In this paper, an advanced and synergic version of GNNs named GCN-LSTM is applied to a time series of cumulative landslide deformations and predisposing factors to predict future cumulative deformation in the study area, considering the spatio-temporal dependencies between features. The proposed modeling strategy is divided into two parts: first, a single GCN was implemented to detect the spatial interdependency between paired data points in the initial timeframe; then, the GCN-LSTM was used to detect the spatio-temporal dependency between paired data points and predict cumulative deformation caused by landslides, using the time series of PS points as labels and the predisposing factors as features. In a comparison with different DLAs, the presented algorithm (GCN-LSTM) outperformed all the traditional DLAs.

The following sections will describe the study area in greater detail, and the datasets utilized for developing the suggested algorithm will be analyzed and discussed. Afterward, the outcomes and conclusions will be presented.

2 Case study

The Moio della Civitella landslides are located in the Cilento, Vallo di Diano and Alburni National Park in the Salerno Province of southern Italy. They involve the slope on which the Moio della Civitella village is built, from 600 to 200 m a.s.l., which is characterized by typical hilly morphology and a low gradient. The landslides involve the Crete Nere of the Saraceno Formation, cropping out in this sector of the Apennine Mountains. This formation is mainly represented by argillites with carbonate intercalations and weathered siliciclastic arenites. The structural characteristics of the landslide area are similar to those of the southern sector of the Italian Apennines, with diffuse and pervasive discontinuities and extremely variable bedding characterizing the described rocks. The Saraceno Formation is locally overlain by Quaternary rocks comprising heterogeneous debris encased in a silty-clayey matrix (Di Martire et al. 2015).

The heterogeneity in lithology and the consequent complex hydrogeological behavior of the rocks forming the slope contribute to the instability at Moio della Civitella. Following Cruden and Varnes (1996), such landslides are flows and rotational and translational slides (Fig. 1). The main slope instabilities are believed to be the result of ancient landslides affecting most of the slope. According to the map in Fig. 1, most of the identified landslides directly impacted settlements, including lifelines and significant communication routes (Infante et al. 2019; Miano et al. 2021; Mele et al. 2022).

Fig. 1

Landslide inventory map of the Moio della Civitella area. Landslide boundaries are derived from the database of the Hydro-geomorphological Setting Plan, Southern Apennines Hydrographic District (2015). Legend: AFD, diffuse landslide deformation; CLR, debris flows; CLT, earth flows; CRP, creep; ESP, lateral spreading; SCR, rotational/translational slides; and graticule polygon: urban area

As a result of the presence of such landslides, the area of Moio della Civitella has been extensively investigated using topographic measurements, inclinometers, and GPS networks (Di Martire et al. 2015). These investigations indicated that the landslides actively moved, with displacements ranging between nearly −2 cm and +1.5 cm.

3 Material

3.1 SAR data

SAR data have been widely used in landslide research due to their wide range of applications, high spatial and, in some cases, temporal resolution, and ability to operate under any weather conditions (Herrera et al. 2011; Scaioni et al. 2014; Sellers et al. 2023). In this study, X-band imagery from the CSK mission was processed with the PSI technique to monitor landslide-related surface deformation and to produce labels for the algorithms proposed here. Due to their high spatial resolution and short revisit periods, these satellite products are particularly suitable for locating landslides in urban areas.

For the COSMO-SkyMed image stacks, 66 descending images acquired during 2012–2016 were analyzed (Infante et al. 2019), an excellent starting point for the first step of the proposed approach (GCN). Furthermore, 65 descending images were collected for the period 2015–2019 (Mele et al. 2022); these were used in the second step to implement several DLAs, including simple RNN, GRU, and LSTM, and to predict cumulative deformation with the proposed model (GCN-LSTM). As a result of the analysis, maps of mean deformation rates (Fig. 2) and time series of deformations were generated. Table 1 provides an overview of the images acquired throughout the experiment.

Fig. 2

Mean deformation rate maps for the period: a 2012–2016 and b 2015–2019

Table 1 Details of the COSMO-SkyMed acquisitions (Infante et al. 2019)

3.2 Predisposing factors

Geological, geomorphological, and geospatial data were used as predisposing factors. These include elevation, slope, aspect, Topographic Wetness Index (TWI), Stream Power Index (SPI), geology, flow direction, total curvature, plan curvature, profile curvature, and geospatial data such as the Normalized Difference Vegetation Index (NDVI) and land use, all of which contribute to the formation of landslides in this case study (Chen et al. 2018, 2019; Achour et al. 2018). To train the GCN models, relationships and connections were created between the predisposing factors listed above, which are briefly discussed below.

The primary geological, geospatial, and geomorphological data used in this study include: i) a 1:50,000 geological map of Moio della Civitella, used to study the geological background; ii) a 10-m pixel resolution Digital Elevation Model (DEM) of the case study, mainly used to investigate the topographic and geomorphological features of Moio della Civitella and to derive elevation, slope, flow direction, aspect, total, plan, and profile curvature, as well as the TWI and SPI; iii) Landsat 7 ETM+ remote sensing images with a resolution of 30 m for bands 1–7 (time: 2012 to 2015, PATH: 188, ROW: 032), mainly used to characterize the climatic and environmental conditions of Moio della Civitella and to derive the NDVI and land-use type.

The TWI is used in hydrological analysis to measure water accumulation in an area. It indicates steady-state moisture and quantifies the effect of topography on hydrological processes. The TWI is based on the local slope and the upstream contributing area per unit contour width and was designed for hillslope catenas. It has been found to be correlated with several soil attributes, including horizon depth, silt percentage, organic matter content, and phosphorus content. The TWI is calculated differently depending on how the upstream contributing area is determined, and it is not applicable in flat areas with vast water accumulations (Novellino et al. 2021).

A slope's strength is significantly affected by the scouring and infiltration of flowing water. The SPI measures the stream power and erosive power of flowing water. It estimates the capacity of streams to modify an area's geomorphology through gully erosion and sediment transport. To determine the erosive power of flowing water, the SPI considers the relationship between discharge and the specific catchment area (Di Napoli et al. 2021).
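As an illustration of how these two indices can be derived from DEM products, the sketch below uses the common formulations TWI = ln(a / tan β) and SPI = a · tan β, where a is the specific catchment area and β the local slope; the function and variable names are assumptions for illustration, since the paper does not report the exact equations used.

```python
import numpy as np

def twi_spi(flow_acc, slope_deg, cell_size=10.0):
    """TWI and SPI rasters from flow accumulation (cells) and slope (degrees).

    Assumes the common formulations TWI = ln(a / tan(beta)) and SPI = a * tan(beta),
    where `a` is the specific catchment area and `beta` the local slope.
    """
    a = (flow_acc + 1) * cell_size              # +1 avoids a zero catchment at ridge cells
    tan_beta = np.maximum(np.tan(np.radians(slope_deg)), 1e-6)  # avoid division by zero
    twi = np.log(a / tan_beta)
    spi = a * tan_beta
    return twi, spi

# Example with a synthetic 3x3 flow-accumulation grid and slope map (degrees):
acc = np.array([[0, 1, 0], [2, 5, 1], [0, 9, 2]], dtype=float)
slp = np.array([[12, 10, 14], [8, 6, 9], [5, 3, 7]], dtype=float)
twi, spi = twi_spi(acc, slp)
```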

Curvature analysis provides information on the location, depth, dip direction, and magnetic susceptibility of causative sources. In curvature analysis, a quadratic surface is fitted within a standard 3 × 3 moving window. The total curvature is decomposed into profile and plan curvatures: profile curvature along the maximum slope direction and plan curvature perpendicular to it. Each section of the curvature space provides crucial information for determining whether the source is 2D or 3D. This is an essential consideration in remote field mapping, since it allows the interpreter to distinguish between a contact, a dyke, and an extensive lithology without direct observation (Di Napoli et al. 2021).
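For the terrain layers used here, the quadratic-surface fit over a 3 × 3 window reduces to finite differences of the elevation grid. The sketch below computes total curvature following a Zevenbergen and Thorne style formulation; it is a minimal sketch under that assumption, not the exact procedure used for the curvature layers in this study.

```python
import numpy as np

def total_curvature(dem, cell=10.0):
    """Total (general) curvature from a quadratic surface fitted in a 3x3 window.

    Zevenbergen & Thorne convention: D = ((z_left + z_right)/2 - z_center) / cell^2,
    E = ((z_up + z_down)/2 - z_center) / cell^2, and curvature = -2 * (D + E),
    i.e. the negative Laplacian of the fitted surface.
    """
    z = dem.astype(float)
    curv = np.full_like(z, np.nan)          # edges stay undefined
    d = ((z[1:-1, :-2] + z[1:-1, 2:]) / 2 - z[1:-1, 1:-1]) / cell**2  # x-direction
    e = ((z[:-2, 1:-1] + z[2:, 1:-1]) / 2 - z[1:-1, 1:-1]) / cell**2  # y-direction
    curv[1:-1, 1:-1] = -2.0 * (d + e)
    return curv
```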

The direction a slope faces, known as the aspect, can influence the distribution and flow of water, soil, and vegetation growth. Because of these impacts, the aspect is a significant consideration in geomorphological and ecological research. Typically, the aspect is measured with a compass, with angles ranging from 0 to 360 degrees, where the north is represented by 0/360, east by 90, south by 180, and west by 270. Including aspect data in geological analysis can enhance our understanding of geological processes and their interactions (Pyrcz and Deutsch 2014).

The NDVI measures vegetation's near-infrared reflectance and absorption of red light; it is computed from the near-infrared and red bands and ranges from −1 to +1. There is, however, no sharp boundary between land-cover types: negative NDVI values may indicate water, values close to +1 most likely correspond to dense green foliage, and values close to zero may indicate urban areas, where no green leaves are present. The NDVI is a standardized method of assessing vegetation health: higher values indicate healthier vegetation, and low values indicate a lack of vegetation (Ammirati et al. 2022).
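The NDVI is computed as (NIR − Red)/(NIR + Red); for the Landsat 7 ETM+ data used here, red is band 3 and near-infrared is band 4. The snippet below is a minimal sketch of that computation with placeholder array names.

```python
import numpy as np

def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red), ranging from -1 to +1."""
    nir = nir.astype(float)
    red = red.astype(float)
    denom = nir + red
    denom[denom == 0] = np.nan   # mark pixels with no signal
    return (nir - red) / denom

# Example with tiny ETM+ band extracts (red = band 3, NIR = band 4):
red = np.array([[30.0, 80.0], [55.0, 10.0]])
nir = np.array([[90.0, 85.0], [60.0, 70.0]])
print(ndvi(nir, red))  # dense vegetation -> values close to +1
```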

A geological map illustrates the distribution of lithologies on the Earth's surface. Geological maps show the distribution of different types of rock and deposits and the location of geological structures such as faults and folds. In general, rock types and unconsolidated materials are depicted in different colors according to their type, and manually collected data are displayed on the map (Di Napoli et al. 2022). The hydrological characteristics of a surface can be determined by computing the flow direction for every raster pixel: a surface is taken as input, and a raster is created that records the direction of flow from each pixel to its steepest downslope neighbor (Di Napoli et al. 2020b).
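A minimal sketch of the steepest-downslope-neighbor (D8) idea described above is shown below; the power-of-two direction codes follow a common GIS convention and are an assumption, as the paper does not state which implementation was used.

```python
import numpy as np

# D8 neighbor offsets with the usual power-of-two direction codes
# (E=1, SE=2, S=4, SW=8, W=16, NW=32, N=64, NE=128).
OFFSETS = [(0, 1, 1), (1, 1, 2), (1, 0, 4), (1, -1, 8),
           (0, -1, 16), (-1, -1, 32), (-1, 0, 64), (-1, 1, 128)]

def d8_flow_direction(dem, cell=10.0):
    """Assign each interior pixel the code of its steepest downslope neighbor."""
    rows, cols = dem.shape
    fdir = np.zeros((rows, cols), dtype=np.int32)
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            best_drop, best_code = 0.0, 0
            for dr, dc, code in OFFSETS:
                dist = cell * (2 ** 0.5 if dr and dc else 1.0)
                drop = (dem[r, c] - dem[r + dr, c + dc]) / dist
                if drop > best_drop:
                    best_drop, best_code = drop, code
            fdir[r, c] = best_code   # 0 marks a pit or flat cell
    return fdir
```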

With a spatial resolution ranging from 30 by 30 m (one arc second) to three arc seconds (approximately 90 by 90 m), the Shuttle Radar Topography Mission (SRTM) Digital Elevation Model covers about 80% of the globe between 60° north and 56° south. Elevation data are available in a variety of formats and are continually being generated (Di Napoli et al. 2020a).

This study uses a blend of geological, geospatial, and geomorphological features alongside the proposed algorithm (GCN-LSTM), which entails a critical data categorization phase (Soares et al. 2022; Nasir et al. 2022). This categorization is executed during the pre-processing of features in the QGIS software. We acknowledge that discretizing continuous variables may lead to a loss of information due to the introduced artificial strata, but this is an accepted trade-off given our methodology. The decision to categorize our continuous variables stems from the nature of the proposed model and the specific characteristics of our geospatial data. Models such as GCN and LSTM capture nonlinear and complex dependencies more effectively when dealing with categorized features (Cui et al. 2020; Chen et al. 2022). The categorized features can better represent our geospatial data's complex, multi-dimensional relationships, contributing to better model interpretability (Kshetrimayum et al. 2023).

Geospatial data are naturally complex and multi-dimensional and describe objects or events with a location on or near the Earth's surface. Representing such data efficiently requires effective categorization techniques, dealing with a mix of numerical and categorical data. Categorization permits sophisticated data analysis methods such as geospatial analytics, MLAs, and DLAs. These methods can uncover patterns and relationships that might be too complex to understand through raw data alone.

After categorization, data normalization is the subsequent step in our implementation. Normalization rescales the features to a common range so that each contributes equally to the analysis, preventing bias toward features with larger magnitudes. For deep learning models such as GCNs and LSTMs, normalizing the inputs can improve convergence during training and reduce the risk of vanishing or exploding gradients. We used min–max normalization to scale all features to the range [0, 1] (Ioffe and Szegedy 2015; Borkin et al. 2019).

We acknowledge potential concerns about normalizing inherently categorical variables, but normalization was essential given the architecture of the GCN-LSTM model. Features with larger magnitudes may disproportionately influence the learning process; hence, normalization ensures that all features are on a similar scale and contribute equally to the model's learning. Our normalization process also respects the inherent properties of our categorical variables. For instance, we took special care with the "Aspect" feature, a circular variable normalized using a method that preserves its circular nature. Traditional normalization techniques that scale linear variables to a consistent range are unsuitable for circular variables, as they distort the relationships in the data. Therefore, we converted this circular variable into two variables using sine and cosine transformations, thus preserving the circular proximity of the data points. This transformation ensures that the circular nature of the "Aspect" feature is preserved during normalization, allowing accurate representation for subsequent analysis by the proposed algorithm (Goodfellow et al. 2016).
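The pre-processing steps described above can be sketched as follows: min–max scaling for ordinary features and a sine/cosine pair for the circular "Aspect" feature. The column names are hypothetical.

```python
import numpy as np
import pandas as pd

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Min-max scale ordinary features; encode circular aspect as (sin, cos)."""
    out = df.copy()
    # Circular encoding: 0 deg and 360 deg map to the same point on the unit circle.
    theta = np.radians(out.pop("aspect_deg"))
    out["aspect_sin"] = np.sin(theta)
    out["aspect_cos"] = np.cos(theta)
    # Min-max normalization of the remaining (non-circular) columns to [0, 1].
    for col in out.columns.difference(["aspect_sin", "aspect_cos"]):
        lo, hi = out[col].min(), out[col].max()
        out[col] = (out[col] - lo) / (hi - lo) if hi > lo else 0.0
    return out

# Hypothetical feature table for a few PS points:
df = pd.DataFrame({"elevation": [420.0, 515.0, 380.0],
                   "slope": [6.0, 11.0, 4.0],
                   "aspect_deg": [350.0, 10.0, 180.0]})
features = preprocess(df)
```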

Hence, the combined use of feature categorization and normalization ensures that all features are on a similar scale and that the GCN-LSTM model can learn from them without bias toward larger-magnitude features. This approach is critical for geospatial and remote sensing data, which carry significant noise and variability. By using these methods, we can create more relevant and informative features, reduce the input data size, and potentially improve the computational efficiency of our proposed deep learning algorithm.

4 Methodology

A spatio-temporal prediction strategy is required to forecast cumulative deformation caused by landslides, because the evolution of landslide movement reveals both spatial and temporal characteristics. The first step of our strategy (Fig. 3) involves pre-processing to obtain the spatial attributes derived with the PSI processing technique (velocity) and the time-series datasets used as inputs to the second step. The GCN module captures the spatially dependent variables, while the LSTM module captures the temporally dependent variables. The first step of the analysis determines the best interval of correlation distance; in the second step, cumulative deformation is predicted by the proposed model (GCN-LSTM), and the results are compared with the outputs of other DLAs (simple RNN, GRU, and LSTM, which work based on temporal dependency alone) and with another spatio-temporal algorithm, GCN-GRU.

Fig. 3

Flowchart

4.1 SAR data processing

Satellite-based differential synthetic aperture radar interferometry (DInSAR) (Gabriel et al. 1989) has proven helpful in detecting ground movements caused by subsidence, landslides, earthquakes, and volcanic activity, as well as in monitoring structures and infrastructure (Di Martire et al. 2014). However, DInSAR is susceptible to issues such as temporal and spatial decorrelation, signal delays due to atmospheric conditions, and errors in orbit or topography (Hooper et al. 2004a). Over time, improvements in these techniques have helped mitigate some of these limitations, one such technique being PSI. With the application of advanced DInSAR techniques, the accuracy of the rate maps and time series of deformations has been enhanced to around 1–2 mm/year and 5–10 mm, respectively (Manzo et al. 2012; Calò et al. 2014; Pulvirenti et al. 2014; Xing et al. 2019).

The PSI technique uses radar targets known as permanent scatterers to obtain high interferometric coherence (Hooper et al. 2004b; Hooper 2008). This is achieved by eliminating geometric and temporal interferometric effects thanks to the high stability of the targets over time. The DEM used in this technique has a cell resolution of 3 m × 3 m, with a multi-looking factor of 3 × 3 in range and azimuth. The coherent pixels technique (Blanco-Sànchez et al. 2008), implemented in the SUBSIDENCE software developed at the Universitat Politècnica de Catalunya, was used to apply the PSI method. The software processed the co-registered images and selected all possible interferogram pairs with spatial baselines lower than 300 m, using a temporal phase coherence threshold of 0.7. Finally, the deformation rate map along the line of sight (LoS) and the time series of cumulative deformation were calculated.

4.2 Recurrent neural networks (RNNs)

Predicting the deformation caused by landslides can be done using RNNs. In this approach, the RNN is trained on historical landslide deformation data to make predictions about future deformation. The RNN's ability to store information from previous time steps allows it to identify patterns and dependencies in the data, which it then uses to make informed predictions. The input to the network consists of geological, geospatial, and geomorphological data, while the label used for comparison with the proposed algorithm's output is the cumulative deformation caused by landslides. The model's output predicts the amount and distribution of future cumulative deformation caused by the landslide. Such a model is useful for disaster preparedness, risk assessment, and a deeper understanding of the processes behind landslides. RNNs are an ideal choice for temporal prediction because they are specifically designed to handle sequential data and can effectively process time series of deformation over time. The following references describe RNNs and their equations in detail (Liao et al. 2019; Sherstinsky 2020).

4.2.1 Long short-term memory (LSTM)

The LSTM algorithm is an advanced artificial neural network designed for time-sensitive information. It is beneficial for predicting cumulative deformation in landslides because it overcomes the limitations of standard RNNs. One of these limitations is the "vanishing gradients" problem: the gradients (derivatives) of the loss function with respect to the network weights become very small as they are backpropagated through time, making it difficult for the network to learn and retain information from earlier time steps in the sequence. The key innovation in LSTMs is the use of special neurons called LSTM cells, which have gates that control the flow of information into and out of the cell, allowing the network to store and use important information over extended periods.

The algorithm operates by utilizing memory cells that can retain information for extended periods and control gates that manage the flow of information into and out of these cells. By training the LSTM on previous landslide deformation data, it can identify patterns and accurately predict future deformation. LSTMs have been demonstrated to handle complex, nonlinear relationships in time-series data effectively, making them a reliable tool for landslide prediction (Hochreiter et al. 2001; Yuan et al. 2020).
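As a minimal illustration of how such a network can be set up for deformation time series, the PyTorch sketch below maps a window of past observations (deformation plus static predisposing features) to the next cumulative deformation value; the layer sizes and input dimensions are placeholders, not the tuned values reported later.

```python
import torch
import torch.nn as nn

class DeformationLSTM(nn.Module):
    """One-step-ahead prediction of cumulative deformation from a past window."""

    def __init__(self, n_features: int, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_features, hidden_size=hidden,
                            num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):               # x: (batch, time_steps, n_features)
        out, _ = self.lstm(x)           # out: (batch, time_steps, hidden)
        return self.head(out[:, -1])    # use the last time step -> (batch, 1)

# Example: 32 PS points, windows of 10 epochs, 13 inputs (deformation + 12 factors).
model = DeformationLSTM(n_features=13)
x = torch.randn(32, 10, 13)
y_hat = model(x)                        # shape (32, 1)
```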

4.2.2 Gated recurrent unit (GRU)

GRUs are RNNs designed explicitly for time-series prediction and can be applied in various scenarios, such as predicting cumulative deformation from landslides. GRUs are considered a simpler alternative to LSTM networks. The hidden state of a GRU (a vector of values that captures the memory of the network across time steps, updated at each step based on the input data and the previous hidden state) allows it to capture information from previous time steps in a sequence, which is then used to predict future values. The distinctive feature of GRUs is their use of "gates" that regulate the flow of information into and out of the hidden state, preventing the network from losing crucial information as it processes new data and making it easier to model long-term dependencies in a sequence (Cho et al. 2014; Chung et al. 2014). In this application, GRUs would be trained on historical geological, geospatial, and geomorphological data and on measurements of PS points taken over time at a landslide site to predict cumulative deformation due to landslides. The network then uses this information to identify patterns in the data and make future deformation predictions, which can support decisions to mitigate the risk posed by the landslide.

LSTMs and GRUs differ in the following ways (a short sketch comparing their parameter counts follows the list):

  • Compared to LSTMs, GRUs have fewer parameters and are computationally less expensive.

  • Both GRUs and LSTMs control the flow of information through a gating mechanism, but GRUs have two gates (update and reset), while LSTMs have three (input, output, and forget).

  • GRUs do not have a separate memory cell, whereas LSTMs have one to store information for long periods. Information is instead stored in the hidden state by GRUs.

  • With long sequences, RNNs face the problem of vanishing gradients. LSTMs were introduced to overcome this problem and handle it better than GRUs, although GRUs also address it to some extent.
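The sketch below makes the first difference concrete by counting the trainable parameters of equally sized LSTM and GRU layers in PyTorch (illustrative sizes only): the GRU uses three weight blocks per layer against the LSTM's four, so it comes out roughly 25% smaller.

```python
import torch.nn as nn

def n_params(module: nn.Module) -> int:
    return sum(p.numel() for p in module.parameters())

lstm = nn.LSTM(input_size=13, hidden_size=400, num_layers=2, batch_first=True)
gru = nn.GRU(input_size=13, hidden_size=400, num_layers=2, batch_first=True)

print(f"LSTM parameters: {n_params(lstm):,}")  # 4 gate blocks per layer
print(f"GRU parameters:  {n_params(gru):,}")   # 3 gate blocks per layer -> ~25% fewer
```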

4.3 Graph neural networks (GNNs)

GNNs have proven effective in modeling graph data in recent years. These DLAs process graph-structured data using an update function that considers the interdependence of nodes. This processing results in a feature space that provides information about the relationships between the nodes of the graph. For example, GNNs can be used to analyze the connections between factors such as geology, geomorphology, and geospatial data in predicting landslide deformation. These factors are represented as nodes in a graph, with their relationships shown as edges. The GNN then processes this graph by updating the representation of each node and uses the final representation to make predictions about cumulative deformation.

For comparison, RNNs (such as LSTMs and GRUs) are specifically designed to process sequential data and identify dependencies between time steps (temporal dependency). Meanwhile, GNNs are optimized for processing graph-structured data and understanding node relationships (spatial dependency) (Wu et al. 2021; Behrouz and Hashemi 2022).

4.3.1 Graph convolutional networks (GCNs)

GCNs work based on filter parameters that are shared across all nodes of a graph (Cortes et al. 2015), and they extract new features on a graph \(G = (V, E)\) that has a feature vector \(x_{i}\) for every node i. The feature space is an N × D matrix X, where N is the number of nodes and D is the number of features; moreover, the interdependency between each pair of nodes is represented by an adjacency matrix A, an N × N zero–one matrix of the form:

$$A = \begin{bmatrix} 0 & 1 & \cdots & 0 \\ 1 & 1 & \cdots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \cdots & 0 \end{bmatrix}$$

The value \(A_{ij}\) is either 1 or 0, depending on whether there is an edge (interdependency) between nodes i and j: if there is an edge between i and j, \(A_{ij}\) is 1; otherwise, it is 0. In this study, the correlation distance was used to create the adjacency matrix for the first modeling approach (GCN), since a zero–one adjacency matrix was used to encode the interdependency between each pair of points. More specifically, the correlation distance was first computed for each pair of points within the area; the adjacency entry was then set to 1 for pairs with a correlation distance in the interval \([0, a)\), where \(a < 2\), and to 0 for the remaining elements. The hyperparameter "a" is therefore obtained by hyperparameter tuning of the GCN model applied in the first step of the proposed modeling approach. Hyperparameter tuning involves adjusting the various parameters of a GCN algorithm to optimize its performance for a specific dataset; for example, the number of layers, the number of nodes in each layer, and the type of activation function can all affect performance. By adjusting these parameters, a GCN can be made to work better for a particular graph.

In this study, the Pearson correlation was used to determine the amount of non-Euclidean interdependency in the GCN. The Pearson correlation measures the linear dependence between two variables; by analyzing it between nodes, the GCN can quantify the non-Euclidean interdependency in the graph.

$$d(X, Y) = 1 - \frac{\sum\limits_{i = 1}^{n} \left( x_{i} - \overline{x} \right)\left( y_{i} - \overline{y} \right)}{\sqrt{\sum\limits_{i = 1}^{n} \left( x_{i} - \overline{x} \right)^{2} \sum\limits_{i = 1}^{n} \left( y_{i} - \overline{y} \right)^{2}}},$$
(1)
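Equation (1) can be turned into the zero–one adjacency matrix described above in a few lines; the sketch below uses SciPy's "correlation" metric, which implements exactly 1 minus the Pearson correlation, and treats the threshold as the hyperparameter "a".

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def correlation_adjacency(X: np.ndarray, a: float = 0.1) -> np.ndarray:
    """Zero-one adjacency from pairwise Pearson correlation distances (Eq. 1).

    X is an (N, D) feature matrix; pairs with distance in [0, a) become edges.
    """
    d = squareform(pdist(X, metric="correlation"))  # d_ij = 1 - Pearson corr.
    A = (d < a).astype(np.int8)
    np.fill_diagonal(A, 0)   # self-loops are added later as A_hat = A + I
    return A

# Example: 5 points with 12 predisposing-factor values each.
rng = np.random.default_rng(0)
A = correlation_adjacency(rng.normal(size=(5, 12)), a=0.1)
```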

The output of this learning process is a node-level output matrix Z of size N × F, where F is the number of output features per node. A pooling operation is required for modeling graph-level outputs (Cortes et al. 2015). The hidden state of the network at layer \((l + 1)\) can be represented as follows:

$$H^{(l + 1)} = f(H^{(l)}, A),$$
(2)

where \(H^{(0)} = X\) and \(H^{(L)} = Z\), with L the number of layers. Models then differ only in the choice of the function \(f(\cdot, \cdot)\) and its activation function (a mathematical function applied to a neuron's output to introduce nonlinearity and enable the network to model complex relationships in the input data). For example, the forward propagation rule can be written as follows:

$$f(H^{(l)}, A) = \varphi\left( A\, H^{(l)}\, W^{(l)} \right),$$
(3)

where \(W^{(l)}\) is the trainable weight matrix for the l-th layer, and \(\varphi(\cdot)\) is a nonlinear activation function such as tanh. The presence of A in the above function means that, for each target node, the feature vectors of all its neighbors are summed while the node itself is excluded; adding the identity matrix to A solves this issue. Furthermore, the adjacency matrix is normalized symmetrically as \(D^{-\frac{1}{2}} A D^{-\frac{1}{2}}\), where D is the diagonal degree matrix, so that the algorithm works smoothly. The forward propagation equation can therefore be written as follows (Gordon et al. 2021):

$$f(H^{(l)}, A) = \sigma\left( \hat{D}^{-\frac{1}{2}}\, \hat{A}\, \hat{D}^{-\frac{1}{2}}\, H^{(l)}\, W^{(l)} \right),$$
(4)

with \(\hat{A} = A + I\), where I is the identity matrix and \(\hat{D}\) is the degree matrix of \(\hat{A}\). The node embeddings can be fed into any loss function, and stochastic gradient descent with backpropagation (a strategy that adjusts the weights of the connections between neurons based on the error between the network's predicted output and the actual output) can be used to train the weight parameters.
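A minimal NumPy sketch of one propagation step of Eq. (4), for illustration only:

```python
import numpy as np

def gcn_layer(H, A, W, activation=np.tanh):
    """One GCN layer: H' = phi(D_hat^-1/2 A_hat D_hat^-1/2 H W), as in Eq. (4)."""
    A_hat = A + np.eye(A.shape[0])             # add self-loops (A_hat = A + I)
    d = A_hat.sum(axis=1)                      # degrees of A_hat (always >= 1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))     # D_hat^{-1/2}
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt   # symmetric normalization
    return activation(A_norm @ H @ W)

# Example: N = 4 nodes, D = 12 input features, 8 hidden units.
rng = np.random.default_rng(1)
A = (rng.random((4, 4)) < 0.5).astype(float)
A = np.triu(A, 1); A = A + A.T                 # symmetric zero-one adjacency
H = rng.normal(size=(4, 12))
W = rng.normal(size=(12, 8))
H1 = gcn_layer(H, A, W)                        # shape (4, 8)
```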

In this study, the GCN is used in the first stage of our proposed modeling approach to create a regression model that predicts the velocity from the twelve predisposing factors, including elevation, slope, aspect, total curvature, NDVI, TWI, SPI, geology, land use, flow direction, plan curvature, and profile curvature; the velocity itself was obtained from spatial and temporal landslide processing from 2012 to 2016. The main goal of this step is to identify, through hyperparameter tuning, the best adjacency matrix, which is built upon the concept of Pearson correlation distance.

4.4 The proposed algorithm (GCN-LSTM)

Combining GCN with LSTM can enhance the ability to model complex relationships between nodes in graph-structured data. By doing so, both spatial and temporal information can be captured more effectively. In other words, combining GCN and LSTM provides a better approach to handling graph data's intricacies, including the relationships between nodes and changes over time.

Figure 4 shows the network structure of the GCN-LSTM model proposed in this paper. The model consists mainly of an encoder and a decoder. The graph network encoder uses multiple parallel GCN modules to extract the key features of the graph networks at different time steps. To resolve the long-term and short-term dependencies in the time series, these features are passed to the LSTM, which analyzes them and extracts further features from the sequence data. The encoder then generates an encoded vector, which is sent to the decoder for decoding as the last step. A multilayer feedforward neural network is used as part of the decoding process to further analyze the features of the coding vector, after which the processed data are sent to the GCN network to produce the predicted values. The number of layers and nodes in both the GCN and LSTM parts of the model depends on the nature of the data and is obtained through hyperparameter tuning.

Fig. 4

The overall structure of the GCN-LSTM model

In this paper, the GCN-LSTM was implemented as the most effective model for detecting the spatio-temporal behavior of cumulative landslide deformation for 4085 data points, using the 2012–2016 dataset in the first step and a time series of 65 cumulative deformations from 2015 to 2019 in the second step.
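To make the encoder idea concrete, the PyTorch sketch below applies one GCN layer to each epoch's node features and then runs an LSTM over the resulting per-node sequences. It is a simplified reading of Fig. 4 with placeholder sizes, not the tuned configuration reported in Table 3.

```python
import torch
import torch.nn as nn

class GCNLSTM(nn.Module):
    """Sketch: spatial GCN encoding per epoch, temporal LSTM across epochs."""

    def __init__(self, n_features, gcn_hidden=20, lstm_hidden=400):
        super().__init__()
        self.W = nn.Linear(n_features, gcn_hidden, bias=False)  # GCN weights
        self.lstm = nn.LSTM(gcn_hidden, lstm_hidden, batch_first=True)
        self.head = nn.Linear(lstm_hidden, 1)

    def forward(self, X, A_norm):
        # X: (time_steps, n_nodes, n_features); A_norm: normalized adjacency (n, n)
        spatial = [torch.tanh(A_norm @ self.W(x_t)) for x_t in X]
        seq = torch.stack(spatial, dim=1)          # (n_nodes, time_steps, gcn_hidden)
        out, _ = self.lstm(seq)
        return self.head(out[:, -1]).squeeze(-1)   # one prediction per node

# Example: 10 epochs, 200 nodes (a small stand-in for the 4085 PS points), 12 factors.
T, N, F = 10, 200, 12
model = GCNLSTM(n_features=F)
X = torch.randn(T, N, F)
A_norm = torch.eye(N)        # stands in for D_hat^-1/2 A_hat D_hat^-1/2
y_hat = model(X, A_norm)     # shape (N,)
```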

5 Results and discussion

5.1 Model hyperparameters

In the initial step of the proposed algorithm, a GCN was used to determine the amount of non-Euclidean interdependency (spatial dependency) between any pair of points within the area; specifically, the velocity of landslides, obtained through spatial and temporal processing from 2012 to 2016, was used as a label in a regression model. Using twelve predisposing factors, GCN was modeled to determine the best correlation distance between data points, which was deemed the most critical hyperparameter for the prediction task in the second step.

The correlation distance measures how similar each pair of data points is based on the values of their features, not the Euclidean distance between them. The optimal range of correlation distance for creating the adjacency matrix was the interval [0, 0.1), since it provided the best evaluation metrics among the tested scenarios after hyperparameter tuning. The more dependent two data points were, the closer their correlation distance was to zero; accordingly, in the adjacency matrix, 1 was selected for pairs with a correlation distance of less than 0.1 and 0 for the remaining pairs.

This interval can thus be used to predict future cumulative landslide deformations, which was this study's second objective. Table 2 provides the optimal hyperparameters for the first modeling approach.

Table 2 Optimal hyperparameters for the first modeling approach

5.2 Overall performance comparison

In the second step, a mixture model consisting of GCN and LSTM and a rival model consisting of GCN and GRU were implemented on a time series of 65 cumulative landslide deformations for 4085 PS data points from 2015 to 2019 and used to predict future deformation in the study area. Since the velocity in the first step was obtained from spatial landslide processing from 2012 to 2016, the best interval of correlation distance found in the first step was used to compute the adjacency matrix for the second step; in other words, this interval was treated as the information transferred from the previous timeframe to the new one when predicting cumulative deformation caused by landslides. After hyperparameter tuning over different scenarios, two channels were selected for the GCN part, with 20 and 12 nodes, respectively; two channels were also designed for the LSTM part of the proposed model, with 400 nodes each. The results of hyperparameter tuning for the GCN-LSTM approach are presented in Table 3.

Table 3 The best hyperparameters for the GCN-LSTM model

Table 4 reports the prediction performance of all models in terms of four evaluation metrics: mean squared error (MSE), mean absolute error (MAE), root-mean-squared error (RMSE), and R-squared (R²) (Chicco et al. 2021). As the table shows, the proposed GCN-LSTM model consistently outperforms the other methods on all evaluation metrics. This result demonstrates that long- and short-term memories are essential in predicting cumulative deformation caused by landslides in this dataset. According to the literature, the significant difference between GRU and LSTM is that the GRU has two gates (reset and update) while the LSTM has three (input, output, and forget); the GRU is therefore less complex, and it is often preferred when the dataset is small, whereas the LSTM tends to perform better otherwise. This is consistent with our conclusion, given the 4085 data points used in this study. In addition, traditional DLAs such as simple RNN, GRU, and LSTM perform poorly due to their inability to detect the spatial dependency between data points. By incorporating the GCN into the LSTM, GCN-LSTM can effectively incorporate the graph structure information into the sequential data. GCN-LSTM also generalizes better to unseen data than RNNs, LSTM, and GRU, as it models the graph structure information, which is more expressive and helps the model capture the underlying patterns in the data more effectively. Another reason is the better handling of non-Euclidean relationships: the GCN can effectively handle non-Euclidean dependencies between nodes in a graph, which results in better performance than RNNs, LSTM, and GRU, which are limited to Euclidean and temporal relationships.
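For reference, the four metrics of Table 4 can be computed with scikit-learn as follows (a generic sketch; y_true and y_pred stand for the observed and predicted cumulative deformations of the test set):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def report(y_true, y_pred):
    mse = mean_squared_error(y_true, y_pred)
    return {"MSE": mse,
            "MAE": mean_absolute_error(y_true, y_pred),
            "RMSE": np.sqrt(mse),
            "R2": r2_score(y_true, y_pred)}

# Example with dummy values:
y_true = np.array([0.10, -0.25, 0.03, -1.10])
y_pred = np.array([0.12, -0.20, 0.00, -1.00])
print(report(y_true, y_pred))
```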

Table 4 Evaluation metrics for the second modeling approach

Therefore, the GCN-LSTM model is excellent at modeling long- and short-term dependencies in time series and significantly improves prediction performance over the traditional methods. As shown in Table 4, in the second step (the prediction task), GCN-LSTM outperformed the three RNN-based DLAs (simple RNN, GRU, and LSTM) and also worked better than its spatio-temporal counterpart GCN-GRU on all evaluation metrics in the test set.

5.3 Visualizations

To explore and better understand how the proposed prediction model (GCN-LSTM) works, the image on the right in Fig. 5 shows the cumulative deformations on March 19, 2019, placed for comparison beside the image of the last predicted epoch of cumulative deformations for the same date. The figure illustrates how the proposed model correctly predicted both positive and negative deformation amounts and locations, with a mean absolute error of less than 0.02. It also shows that the proposed model performed well in areas with deformations over 2 cm (red and blue boxes), areas with a mixture of positive and negative deformations (black box), and areas with deformations lower than 2 cm (white box). As can be seen, our model provides an excellent fit to the observed data, demonstrating its ability to predict cumulative deformation, and thus the risk of landslides in the studied area, accurately and reliably. The close match between the predicted and observed cumulative deformation indicates that the GCN-LSTM model effectively captures the complex relationships between the various factors contributing to cumulative deformation, such as the geological, geomorphological, and geospatial data types. This highlights the potential of the GCN-LSTM model to serve as a valuable tool for predicting cumulative deformation and the risk of landslides, which can inform decision-making and disaster response efforts.

Fig. 5

Location and map of deformation by a DInSAR technique and b GCN-LSTM (Unit: cm)

The absolute error was used as the evaluation index to display the error map of predicted and real deformation values. It was then divided into eleven intervals to classify the estimation error with a suitable color. As shown in Fig. 6, more than 92% of all PS points fall within an error of less than 4 mm in prediction, which shows the high accuracy of the results.

Fig. 6

Location and map of error for predicted deformation (Unit: cm)

It is also necessary to compare the real deformation and the predicted values to evaluate the quality of the GCN-LSTM's performance. The first step was to set aside 25% of the total cumulative deformation epochs (16 epochs, from July 22, 2018, to March 19, 2019) as a test set, in order to determine the trend and similarity between the output of the proposed models and the real deformation in the case study. As shown in Fig. 7, the intervals mentioned earlier were defined based on absolute error, and a point was randomly selected as a representative of each interval in the case study. The GCN-LSTM predictions follow the real deformation trend, indicating that the proposed model can be used to predict cumulative deformation caused by landslides in real time. Based on the calculated absolute error between predicted and real deformation, eleven intervals were determined for the 4085 PS points in the study area. Table 5 shows that 3766 points (approximately 92.2%) of the studied area were predicted with an error of less than 4 mm, of which the majority, 2528 points, have an error of less than 2 mm.

Fig. 7

Location and map of representative points for each mean absolute error interval (Unit: cm)

Table 5 Number of PS points according to absolute error

Based on Fig. 8, the results obtained in this paper using GCN-LSTM for predicting cumulative deformation caused by landslides are promising. The absolute error is below 4 mm for 92% of the points (represented by graphs P1, P2, P3, and P4), and 67% of these have an error of less than 2 mm (P1 and P2), indicating that the algorithm effectively captured the underlying patterns in the data: recognizing the complex relationships between the nodes in the graph helped achieve a high level of accuracy in the prediction. The graphs of points P5, P6, P7, P8, P9, P10, and P11 represent less than 8% of all PS points, with errors greater than 4 mm. Even so, as these graphs show, the trend of the predicted deformation followed the real deformation trend, confirming that the GCN-LSTM model worked correctly.

Fig. 8

Deformation trends of representative points for each mean absolute error (MAE) interval

6 Conclusions

Considering the spatio-temporal relationship between each pair of points in a particular area (Moio della Civitella), landslide cumulative deformation forecasting was investigated in this paper based on a combined DLA, GCN-LSTM. By implementing non-spatial DLAs such as RNN, GRU, and LSTM, we concluded that cumulative landslide deformation is highly affected by the spatial interdependency between data points, since the non-spatial implementations resulted in weaker evaluation metrics than the spatio-temporal approach. The proposed DLA (GCN-LSTM) effectively addresses the challenge of incorporating the non-Euclidean spatial dependency between paired points. It also considers the time-series characteristics of landslides by monitoring the temporal behavior of data points over time. The core findings of this study can be summarized as follows:

  (1) By implementing a GCN regression algorithm, the interdependency between each pair of points was captured by the non-Euclidean correlation distance over a 4-year timeframe from 2012 to 2016.

  (2) A new hyperparameter was defined in the GCN approach mentioned above to create the best adjacency matrix, tuned jointly with all the other hyperparameters.

  (3) The proposed hyperparameter, obtained from the GCN approach, was used to extract the interdependency between each pair of points for two spatio-temporal models, GCN-LSTM and GCN-GRU, on a time series of cumulative landslide deformations from 2015 to 2019.

  (4) Experimental studies were conducted on the Moio della Civitella landslide in Italy. The GCN-LSTM model successfully captured both the spatial and temporal behavior of the landslide datasets, providing promising results for the spatio-temporal forecasting of cumulative landslide deformation. The GCN-LSTM model is more effective at predicting landslides than the RNN models (simple RNN, GRU, and LSTM) and than another spatio-temporal algorithm, GCN-GRU.

In summary, our work provides a valuable contribution to the field of landslide prediction by demonstrating the effectiveness of GCN-LSTM on this type of problem. The results suggest that GCN-LSTM is a promising algorithm for predicting cumulative deformation caused by landslides and highlight the potential benefits of incorporating graph structure information into sequential data.

Spatio-temporal models that incorporate various types of features have the potential to make more accurate predictions in a range of applications. Here are a few suggestions for future work with spatio-temporal models:

  • Incorporation of more diverse features: The performance of a spatio-temporal model can often be improved by incorporating more diverse features, such as time series of rainfall data. These features could provide additional context and help the model better capture the underlying patterns and relationships in the data.

  • Advanced deep learning architectures: Developing more advanced deep learning architectures, such as transformer or attention-based models, could help improve the performance of spatio-temporal models. These models could capture more complex relationships between the features and the target variable.

  • Multi-modal integration: Another avenue for improvement could be to integrate data from multiple sources, such as satellite imagery, weather data, and ground-based sensors, to gain a more complete understanding of the conditions leading to a particular event.