Machine learning for landslides prevention: a survey

Landslides are one of the most critical categories of natural disasters worldwide and induce severely destructive outcomes to human life and the overall economic system. To reduce their negative effects, landslides prevention has become an urgent task, which includes investigating landslide-related information and predicting potential landslides. Machine learning is a state-of-the-art analytics tool that has been widely used in landslides prevention. This paper presents a comprehensive survey of relevant research on machine learning applied in landslides prevention, mainly focusing on (1) landslides detection based on images, (2) landslides susceptibility assessment, and (3) the development of landslide warning systems. Moreover, this paper discusses the current challenges and potential opportunities in the application of machine learning algorithms for landslides prevention.


Introduction
Landslides are one of the most critical categories of natural disasters worldwide and induce severely destructive outcomes to human life and the overall economic system [42]. Their existence is ascribed to the geological environment and meteorological processes on earth. Some factors, including lithology, slope morphology, and unplanned urban expansions, can predispose slopes to landslides [28,73]. Extreme events caused by climate change, including heavy rainfall and rapid snowmelt, could also trigger landslide occurrences [167]. As climate change intensifies, the frequency and intensity of landslides are expected to increase rapidly as a consequence.
It is urgent to understand landslides in order to predict their occurrence and behavior, and then to adopt appropriate prevention policies and methodologies. The prevention of an incipient or potential landslide first requires the recognition of the landslide and investigation of landslide-related information. Then, the region where a landslide is prone to occur in the future is predicted. Finally, the character and magnitude of the movement are anticipated [142]. Therefore, common landslides prevention techniques can be divided into two categories: detection and prediction.
Please note that a preprint version of this paper has been posted on TechRxiv at: http://dx.doi.org/10.36227/techrxiv.12546098.v1
To map the spatial extent of landslides, an inventory collected by means of detection is often required. As a common strategy, detection can overcome the limitations imposed by the scale and location of landslide events and produce detailed inventories, which not only provide a better understanding of important information about the landslides but also establish relationships between different factors and the landslides. An accurate landslide inventory can play a part in most stages of landslides prevention, especially in landslide susceptibility assessment.
Landslide susceptibility assessment evaluates the likelihood of landslide occurrence in the spatial dimension. The principle of landslide susceptibility assessment is that a region prone to landslides can be predicted based on available data, including conditional factors and historical landslides. These data are extracted from the landslide inventory. As a static instrument, landslide susceptibility assessment has shown its value in spatial analysis [181]; however, it lacks information on the temporal landslide probability [175].
Reliable early warning systems can be used to predict the short-term behavior of landslides in order to prevent sudden events. Once emergency warnings are issued, people can take action before the disaster occurs. An effective approach for achieving early warnings of landslides is to establish quantitative models of landslide evolution processes. The modeling of landslides is based on continuous monitoring of landslide-related variables [188].
Most landslides are triggered by extreme precipitation events [151]. Thus, the rainfall threshold should be regarded as a critical parameter to predict the occurrence of a landslide in the temporal domain. On the other hand, an adequate understanding of landslide deformation mechanisms is essential to develop a reliable early warning system [99]. Landslide displacement is a crucial parameter for judging the condition of the landslide, and rapid changes are generally considered a direct sign of upcoming disasters [127].
The analysis of the aforementioned stages in landslides prevention remains a challenge due to complex geodynamic and microphysical processes. Recently, as an analytics tool, machine learning methods, which can provide predictions, perform clustering, extract association features, and make decisions from given information, are coming to the fore. Various domains have successfully utilized machine learning methods to complete some demanding tasks. Likewise, the landslides prevention domain has begun to apply most major machine learning methods to accurately, efficiently, and effectively solve problems.
It is often said that at least 80% of machine learning is data preprocessing, which reflects how strongly the performance of machine learning methods depends on data quality. With advances in a variety of location-aware sensors and model simulations, available data volumes in the landslides prevention domain are increasing exponentially, with increasing spatial, temporal, and spectral resolutions. These continuously accumulating datasets provide more opportunities for machine learning applications.
Overall, in the context of the spatiotemporal complexity and uncertainties, landslides prevention provides novel opportunities, challenges, and methodological demands for machine learning, which has been a hot research topic in recent years. In the following sections, this paper reviews the development of machine learning methods in landslides prevention. It mainly concentrates on various applications of machine learning in several different landslides prevention stages.
The main contributions of this survey can be summarized as follows.
(1) The paper starts by investigating the applications of machine learning in landslides prevention, which comprise landslide detection and prediction. Landslide detection provides inventory data for the other prevention stages. Landslide prediction involves static and dynamic methods to predict landslide occurrences from spatial and temporal perspectives, respectively.
(2) This paper points out the potential challenges and limitations of machine learning in landslides prevention and proffers several strategies that have been utilized in other research domains to overcome or circumvent them. Moreover, this paper discusses the opportunities that arise from these challenges, summarizes and recommends a few of the most promising deep learning methods that have been applied in other domains, and envisages their possible applications in landslides prevention.
(3) This paper advises combining data-driven machine learning with knowledge-driven landslide mechanisms to interpret machine learning results.
The rest of this survey is organized as follows. Section 2 briefly introduces machine learning and landslides prevention. Section 3 surveys machine learning applications in landslide detection. Section 4 surveys machine learning applications in landslides susceptibility assessment. Section 5 surveys machine learning applications in landslide warning systems. Section 6 outlines major challenges and opportunities for using machine learning for landslides prevention. Finally, Sect. 7 concludes this survey.

Background
For the sake of clarity, this section will briefly introduce the background concepts of machine learning and landslides prevention.

Conventional machine learning methods
As an analytics tool, one of the major objectives of machine learning is to build a model that represents complex, unknown, or incompletely understood relationships between data and target variables [79]. Although the types of machine learning algorithms vary slightly, they can be roughly divided into two major categories according to their purpose: supervised learning and unsupervised learning algorithms. Supervised learning refers to building a model that connects known inputs to unknown outputs. Consequently, the output values for new data can be predicted based on the relationships learned from previously labeled training data [16]. Supervised learning can be divided into classification and regression problems. In classification problems, the intended output is a semantic label or class. For example, to identify potential landslides, a classifier would label each pixel in an image as "landslide" or "non-landslide". Regression problems aim to predict a continuous variable.
Common supervised learning algorithms include logistic regression (LR), decision tree (DT), support vector machine (SVM), Naive Bayes (NB), and artificial neural networks (ANN). Each single learning algorithm can be considered a base learner.
Although base learners behave well, it is often necessary to further improve the performance (e.g., in classification, prediction, or function approximation) of a machine learning model. Because more powerful learners can be constructed from a set of base learners, multiple-learner systems (i.e., ensemble learning) have gradually gained much attention. Three representative ensemble methods are bagging, which can decrease the variance; boosting, which can decrease the bias; and stacking, which improves predictions [30,141].
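To make the idea of bagging concrete, the following minimal sketch trains several base learners on bootstrap resamples and combines them by majority vote. The base learner (a one-feature threshold "stump"), the toy slope-angle data, and all function names are illustrative assumptions, not taken from the surveyed studies.

```python
import random
from collections import Counter

def fit_stump(samples):
    """Base learner: choose the threshold with the fewest training errors.

    samples is a list of (feature, label) pairs with label in {0, 1};
    the stump predicts 1 when feature >= threshold.
    """
    best_threshold, best_errors = None, None
    for threshold, _ in samples:
        errors = sum((x >= threshold) != bool(y) for x, y in samples)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold

def bagging_fit(samples, n_learners, rng):
    """Train each stump on a bootstrap resample (drawn with replacement)."""
    stumps = []
    for _ in range(n_learners):
        bootstrap = [rng.choice(samples) for _ in samples]
        stumps.append(fit_stump(bootstrap))
    return stumps

def bagging_predict(stumps, x):
    """Combine the base learners by majority vote."""
    votes = Counter(int(x >= threshold) for threshold in stumps)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: (slope angle in degrees, 1 = landslide observed).
data = [(5, 0), (8, 0), (12, 0), (15, 0), (30, 1), (35, 1), (40, 1), (45, 1)]
rng = random.Random(0)  # fixed seed so the sketch is reproducible
stumps = bagging_fit(data, n_learners=25, rng=rng)
steep = bagging_predict(stumps, 50)
gentle = bagging_predict(stumps, 10)
```

Because every stump sees a slightly different resample, the individual thresholds differ, and the vote averages out that variability, which is the variance reduction mentioned above.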
Unsupervised learning methods try to find patterns in unlabeled data. One of the most common unsupervised learning tasks is clustering, where samples are grouped based on similarity. Another typical task is dimensionality reduction, which aims to reduce the number of variables in a dataset while suppressing redundancy and noise.
Brief introductions to these methods are listed as follows.
- LR: Logistic regression is a supervised learning algorithm that uses a logistic function to map the input variables to a categorical dependent variable.
- DT: A decision tree is a supervised learning algorithm commonly used in classification problems. Its structure resembles a tree: each branch node represents a choice among several alternatives, and each leaf node represents a decision.
- SVM: A support vector machine is a supervised learning algorithm, also commonly used in classification problems, that constructs a separating hyperplane to distinguish between objects in a multidimensional space.
- NB: Naive Bayes is a supervised learning algorithm based on Bayes' theorem and widely used in classification problems; it assumes that the features are independent and uncorrelated.
- ANN: An artificial neural network consists of a set of connected processing units that work together to find associations between input and output patterns.
- ELM: An extreme learning machine is a feedforward neural network that can be used for classification, regression, clustering, sparse approximation, compression, and feature learning. The parameters of its hidden nodes need not be tuned.
- kNN: The k-nearest neighbors algorithm is a supervised learning algorithm that uses "feature similarity" to predict the values of new data points: a new data point is assigned a value based on how closely it matches the points in the training set.
- K-means clustering: An unsupervised learning algorithm that divides all input data into k clusters such that data in the same cluster are as similar to each other as possible.
- Boosting: An ensemble method that trains weak learners sequentially, each striving to correct its predecessor.
- Bagging: An ensemble method that applies the bootstrap statistical method to a high-variance machine learning algorithm. As a typical bagging method, random forest (RF) is built from multiple decision trees.
- Stacking: An ensemble method that combines models of different types.

Deep learning methods
As a subdiscipline of machine learning, deep learning is an extension of ANN. Deep learning uses multilevel deep neural networks to extract features from the raw input progressively. The scale and complexity of the networks are the major differences between deep learning and traditional ANN.
A multilayer deep neural network consists of an input layer, several hidden layers, and an output layer. After the input data are loaded into the input layer, each hidden layer receives a group of weighted inputs, applies a nonlinear transformation, and provides its output through an activation function.
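The layer-by-layer computation described above can be sketched as a plain forward pass. This is a minimal illustration under assumed values: the 3-2-1 layer sizes, the weights, and the choice of tanh and sigmoid activations are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases, activation):
    """One layer: weighted sum of the inputs plus a bias, then a nonlinearity."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def forward(x, layers):
    """Pass the input through every layer in order."""
    for weights, biases, activation in layers:
        x = dense(x, weights, biases, activation)
    return x

# Hypothetical network: 3 inputs -> 2 hidden units (tanh) -> 1 output (sigmoid).
layers = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1], math.tanh),
    ([[1.0, -1.0]], [0.0], sigmoid),
]
output = forward([1.0, 2.0, 3.0], layers)
```

In a trained network the weights and biases would be learned from data rather than set by hand as they are here.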
A deep learning architecture is a multilayer stack of simple modules, all or most of which are subject to learning, and many of which compute nonlinear input-output mappings. Each module in the stack transforms its input to increase both the selectivity and the invariance of the representation. With multiple nonlinear layers, a system can implement extremely intricate functions [93].
Similar to conventional machine learning methods, common deep learning methods can be predominantly classified into two categories: supervised and unsupervised learning methods. Different categories have different architectures, which makes them highly flexible.
A convolutional neural network (CNN) is a typical supervised deep learning method that achieves state-of-the-art predictive performance in areas such as speech and image recognition by hierarchically composing simple local features into complex models. CNN can extract and classify features from high-dimensional data. As a variation of the multilayer perceptron, a common CNN consists of one or more convolutional, max pooling, and fully connected layers [154].
The input layer is an m × n matrix in which every cell has a feature value. Each convolutional layer consists of several convolutional units, and the parameters of every unit are optimized by a back-propagation algorithm. The purpose of a convolutional manipulation is to extract different features from the input layer [138]. The first convolutional layer may only extract some low-level features such as lines, edges, and corners. Additional convolutional layers can iteratively learn more intricate representations from low-level features. Pooling is a critical manipulation technique in CNN [162]. It is a form of downsampling to reduce the dimensionality of feature maps, without altering the depth of these maps.
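A small sketch can illustrate how a convolution extracts a low-level feature and how pooling then downsamples the resulting feature map. The 5 × 5 input and the hand-set vertical-edge kernel are assumptions for illustration; in a real CNN the kernel values are learned by back-propagation.

```python
def conv2d(image, kernel):
    """Valid (no padding), stride-1 sliding-window product, as used in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(feature_map[i * size + a][j * size + b]
                for a in range(size) for b in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

# Hypothetical 5 x 5 input whose right part is bright (a vertical edge).
image = [[0, 0, 1, 1, 1] for _ in range(5)]
kernel = [[-1, 1], [-1, 1]]       # responds to dark-to-bright vertical edges
features = conv2d(image, kernel)  # 4 x 4 feature map
pooled = max_pool(features, 2)    # 2 x 2 map after downsampling
```

The feature map is nonzero only where the edge lies, and pooling halves each spatial dimension while keeping that response.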
Since the initial development of CNN, multiple CNN architectures have been created. Some notable examples include: VGGNet [155], ResNet [54], Inception [163], and DenseNet [66]. Each of these networks employs the same structure of convolutional layers and feature extraction but may vary in the number of layers they have, feature mapping, and efficiency [62].
A recurrent neural network (RNN) is mainly viewed as a supervised learning method and can be used for processing sequential data. An RNN remembers the past, and its decisions are influenced by what it has learned in the past. An RNN is made up of nodes; after being fed data, it feeds its output back into itself, and this process is repeated at every step. This recurrence allows the analysis of dynamic changes over time, where persistent information is needed [174].
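The feedback loop can be written as a recurrence in which the hidden state at each step depends on the current input and on the previous hidden state. The sketch below assumes a basic (Elman-style) RNN with a tanh activation; the weight names W_xh (input to hidden), U (hidden to hidden), and W_hy (hidden to output) and all numeric values are illustrative assumptions.

```python
import math

def matvec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def rnn_forward(sequence, W_xh, U, W_hy, h0):
    """h_t = tanh(W_xh x_t + U h_{t-1});  y_t = W_hy h_t."""
    h = h0
    outputs = []
    for x in sequence:
        pre = [a + b for a, b in zip(matvec(W_xh, x), matvec(U, h))]
        h = [math.tanh(p) for p in pre]  # new hidden state carries the memory
        outputs.append(matvec(W_hy, h))
    return outputs, h

# Hypothetical sizes: 1 input feature, 2 hidden units, 1 output.
W_xh = [[0.5], [-0.3]]
U = [[0.1, 0.0], [0.0, 0.1]]
W_hy = [[1.0, 1.0]]
outputs, final_h = rnn_forward([[1.0], [1.0], [1.0]], W_xh, U, W_hy, [0.0, 0.0])
```

Although the same input is fed at every step, the outputs differ from step to step because the hidden state accumulates information about earlier inputs.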
Long short-term memory (LSTM) is a special RNN architecture that inherits RNN's advantages of sequence learning and is able to learn time-series data with long temporal dependency [144]. With its memory block structure, an LSTM model can judge whether the rules learned from the previous time step are useful and then determine whether they should be passed along to the next time step or abandoned. The prediction accuracy is thus not affected by the errors in some previous points.
Unsupervised learning algorithms are used to train each layer one at a time, independently, while using the previously trained layer as the input. After the pretraining step is performed on each layer, a fine-tuning step is performed on the whole network using supervised learning [62]. Common unsupervised networks include autoencoders and deep belief networks (DBN).
An autoencoder trains a neural network so that its output reproduces its input. In the same way as with general neural networks, the weights of the network are learned by stochastic gradient descent [192]. An autoencoder can extract features from unlabeled data using only a few layers. The network is symmetrical from the input to the output for dimensionality reduction and feature extraction [140,193]. An autoencoder is capable of transforming raw data into sparse and nonlinear correlated features. Using a shallower hidden layer to obtain the optimal feature representation reduces not only the training loss but also the network complexity and network error, which reduces the amount of computation and thus speeds up the operation [65] (Figs. 1, 2, 3, 4, 5, 6, 7, 8).

Overview
A geohazard is a devastating phenomenon that is directly or indirectly caused by activity in the earth's interior or by changes in the geological environment, including those driven by human activity or climate change. As one type of global geohazard, landslides are geological phenomena related to ground movements such as rockfalls and debris flows and can refer to the movement of a mass of rock, debris, or earth down a slope under the influences of gravity, rainfall, and earthquakes [176]. Lithology, tectonics, climate change, and anthropogenic pressure may cause slope instability that could progress to landslides [44,73,159]. Heavy rainfall, rapid snowmelt, or earthquakes could also trigger a landslide occurrence. Landslides are ubiquitous in any terrestrial environment with slopes.
In most cases, the occurrence of a landslide has catastrophic results. Landslides cause massive destruction of infrastructure and thousands of fatalities every year [42]. From 2004 to 2010, 2620 fatal landslides were recorded, causing 32,322 fatalities [126]. At least 17% of all natural-hazard fatalities around the world can be attributed to landslides [131]. In the most affected areas, financial costs and countermeasures are on the order of billions of dollars [85].
Recently, as a consequence of human disturbance (e.g., deforestation, mineral mining, and intensive exploitation of land for construction) and extreme weather, the frequency and intensity of landslides have increased dramatically.
With the advent of extreme natural events, the prevention of landslides has become an urgent task. Landslides prevention involves an assessment of slope instability phenomena and the change in the occurrence of slopes by means of effective geological engineering principles and other existing and emerging technologies. Landslides prevention can provide valuable information for government agencies, planners, decision makers, and local landowners to make emergency plans that reduce the negative effects on the economy and on human life. Typically, the study of landslides prevention is divided into two aspects: detection and prediction.
Related datasets for landslides prevention are generally obtained from three sources: (i) remotely sensed data acquired by Earth-observing satellites, (ii) data collected by in situ sensors, and (iii) data collected during fieldwork.

Landslide detection
Fast and error-free detection of landslides is vital for rapid damage assessment and for supporting disaster management activities, and it simultaneously increases the efficiency of disaster mitigation.
In general, landslide detection refers to identifying potential landslides and understanding fine-scale landslide patterns. It is essential to quickly and accurately extract landslide information, especially in response to emergencies. Nevertheless, conducting field investigations of large landslides, especially for landslides that have just taken place, is rather dangerous and difficult [187]. Because of the risks in a field survey and the vastness of a disaster area, this type of investigation, which requires a large workforce and many material and financial resources, will be difficult to carry out. It is necessary to utilize some emergent techniques for automatically detecting landslides to avoid these disadvantages.
On the basis of detecting landslides, event-based landslide inventories should be generated as soon as possible; these inventories can provide baseline information (e.g., landslide types, location, magnitude, distribution, and boundaries) and depict the association between landslides and a single conditional factor [43]. In general, landslide inventories will be advantageous to understand the causal factors involved and to predict landslides.

Landslide prediction
Significant human and economic losses push worldwide research for predicting future landslide events. Most landslide predictions follow a simple principle: the past and the present are the keys to the future. The analysis of past and current landslides will help in estimating landslide behaviors, frequencies, extents, and consequences in the future under certain conditions, which means that the spatial and temporal occurrence probability of landslides must be quantified [8]. Landslide susceptibility assessments are a static approach used to assess where landslides are most likely to occur in the future. Landslide early warning systems focus on information on the temporal landslide probability.

Typical data source for landslides prevention
In recent decades, the development of satellite, airborne, and ground-based remote sensing techniques has improved the ability to collect data. The main sources include the Shuttle Radar Topography Mission (SRTM), the Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER), and Light Detection And Ranging (LiDAR) instruments, which characterize terrain morphology. Commonly collected data include visible imagery [123,160], LiDAR data [72,149], and Synthetic Aperture Radar (SAR) and Interferometric SAR (InSAR) data [17,52], which can be acquired from platforms such as Google Earth Engine (GEE), the United States Geological Survey (USGS) Earth Explorer, and Copernicus. GIS technology offers a platform to integrate spatial information from these disparate sources into a single framework and to use these data for related analyses of landslides. The performance of neighborhood operations with GIS allows the extraction of morphometrical and hydrological parameters from a digital elevation model (DEM) [25]. DEMs are a common type of data used in spatial analysis and are generally obtained by computing data derived from airborne or satellite sources.
(Figure caption, continued) The sigmoid function is applied to the last fully connected layer and outputs two real numbers between 0 and 1 indicating the probabilities of belonging to the landslide and non-landslide classes [75]
Fig. 4 Examples of machine learning applications in landslide detection tasks. A pixel (a) [15] could be classified accordingly as "landslide" or "non-landslide". The first step when using object-based approaches is segmenting images based on homogeneous features, such as shape, color (b) [7], and scale (c) [57]. Then objects are classified (d) [7] (e) [57]. In deep learning, CNN autonomously extracts the contextual features of an image dataset and learns to identify landslide features (f) [13] by looking at R, G, B channels (g) [145]
Neural Computing and Applications (2021) 33:10881-10907
In addition, mountainous regions have been monitored in various ways. A typical monitoring method is to embed different kinds of ground-based sensors, related to slope movement and water table level change, into the slope and to sense the dynamic changes in their signals. For example, local monitoring sites for landslides have been established by the USGS [2]. Rain gauges, slope movement sensors, and soil moisture probes can be used for monitoring. Wireless sensor networks are being used to achieve large-scale data collection and transmission. By employing different sensing and monitoring techniques, multidimensional and multiscale temporal and spatial data can be collected.

Overview
Landslide inventory maps produced by image detection are usually the first step for training and testing in landslide forecasting studies [29], which means that fast and accurate landslide detection is useful for understanding landslides in a large area and predicting them in the future [100].
In detection, timely information about landslide positions, areas, and destruction levels is extremely important. With the development of satellite remote sensing technology, available datasets gradually accumulate, which can be mined for crucial information.
Traditional landslide image detection methods include pixel-based approaches [29] and Object-Based Image Analysis (OBIA) [115]. To handle immense datasets and enhance detection performance, many machine learning algorithms have gradually been introduced. Conventional algorithms such as SVM [18,63,122], ANN [74,105], and RF [51] were relatively widespread in the early years.
Recently, deep learning methods have begun to be used for solving image classification problems. For example, CNN can complete the classification of satellite imagery without requiring any preprocessing or feature extraction process [111].

Image data source
Remote sensing technology has proven its effectiveness in generating landslide inventories. Inventories are crucial for detecting landslides after triggering events, especially for remote or barely accessible areas [70]. Multitemporal satellite imagery has been used to qualitatively analyze the temporal effects of phenomena and quantify the changes. The available data have become a major data source for landslide detection because of their high temporal frequency, a digital format suitable for computation, synoptic view, and a wide selection of spatial and spectral resolutions [114].
In addition to satellite images, other available datasets for detecting landslides include bitemporal aerial photographs [98,100], LiDAR [22,47], and InSAR. These datasets are commonly used for landslide detection as follows. (i) Optical image data (e.g., spaceborne or airborne remote sensing data) can be used for analyzing the image features of a landslide and then for recognizing the extent and location of the landslide by visual interpretation or automatic extraction methods. (ii) Radar data (e.g., SAR, InSAR, LiDAR) can be used for detecting surface deformations and deposition resulting from landslides.

Common methods for detection
The methods for detection based on image classification mainly include two categories: (1) pixel-based and (2) object-based methods.

Pixel-based methods
A pixel is the basic analytical unit of an image and is generally assigned a value based on the detected electromagnetic energy. Pixel-based methods exploit these spectral characteristics to detect and measure changes without considering the spatial context [68].
By defining thresholds on pixel values, the pixels can be categorized into classes based on the spectral signature. Then, individual objects (pixels in a pixel-based approach) are allocated to the most likely class. For instance, a pixel could be classified accordingly as "landslide" or "non-landslide" by defining thresholds for pixel values in the green and red bands [71].
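As a minimal sketch of threshold-based pixel classification, the snippet below labels each pixel from its red and green band values. The decision rule (high red and low green indicates exposed ground), the threshold values, and the tiny 2 × 2 bands are assumptions made for illustration; the cited study's actual thresholds are not reproduced here.

```python
def classify_pixels(red, green, red_min, green_max):
    """Label each pixel from its red and green band values.

    Assumed rule (hypothetical): a pixel that is bright in the red band and
    dark in the green band is treated as exposed ground, i.e. "landslide".
    """
    labels = []
    for red_row, green_row in zip(red, green):
        labels.append([
            "landslide" if r >= red_min and g <= green_max else "non-landslide"
            for r, g in zip(red_row, green_row)
        ])
    return labels

# Hypothetical 2 x 2 image given as separate red and green bands (0-255).
red = [[180, 60], [200, 90]]
green = [[90, 150], [100, 160]]
labels = classify_pixels(red, green, red_min=150, green_max=120)
```

Each pixel is judged in isolation, which is why such rules ignore spatial context and are sensitive to noise in individual pixel values.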
In the context of land cover classification and landslide detection, a variety of pixel-based studies have been conducted using different machine learning methods and Earth-observation datasets. Pixel-based image classification can be treated as a typical binary classification problem, or a post-classification comparison can be performed to measure changes.
The ANN approach can provide better classification results than other methods when the classes are not normally distributed [107]. When applied to stacked multitemporal images, the SVM algorithm learns from training data and automatically finds threshold values from the spectral features [15]. For some fine-resolution images (e.g., 1 or 2 m), a feedforward neural network (FFNN) with one hidden layer and a sigmoid transfer function can perform the classification [125].
Pixel-based methods often require extensive parametric tuning and precise geometrical correction or coregistration. Noise and outliers have significant effects on the accuracy of these methods [145].

Object-based methods
OBIA is another detection method that groups neighboring pixels into regions before conducting the classification, which addresses the limitations of pixel-based methods in high-resolution imagery. In contrast to pixel-based approaches, OBIA allows the integration of several landslide diagnostic features, such as spectral (e.g., pixel values, tones, and colors), spatial (e.g., object sizes, shapes, and patterns), and contextual features [116]. Since spectral, spatial, contextual, and morphological parameters can all be taken into account, OBIA has been employed in landslide detection from airborne and satellite data, including multitemporal data [115,161]. Similar to other fields, machine learning methods can be used in conjunction with OBIA to select more appropriate features from very-high-resolution (VHR) imagery. Some applied features include slope, local surface roughness [37], plan curvature [37,157], and slope and curvature together [91]. The most widely used algorithms include kNN [39], RF [135], and SVM [36].
(Figure caption) An autoencoder consists of an encoder and a decoder. The encoder compresses the input data into fewer dimensions (two shown here) in the so-called bottleneck layer. The decoder tries to reconstruct the original input from the compressed data in the bottleneck layer [38]. Driven by data rather than prior knowledge, the autoencoder is capable of transforming raw data into sparse and nonlinear correlated features [65]
Selecting an appropriate machine learning classification method is important and requires taking into account many factors, including the spatial resolution of the data used, diverse data sources, and data types. Commonly, most machine learning steps are applied in OBIA as follows.
(i) The image is segmented into homogeneous regions composed of similar pixels.
(ii) Objects are classified using sets of features related to their spectral, spatial, and contextual properties [135].
For example, Stumpf et al. [161] utilized an RF-based method to evaluate the capability of a broad set of object metrics (color, texture, shape, topography, and their sensitivity to changing scales of the image segmentation) and consequently removed nonrelevant features. Furthermore, they used RF-based methods to evaluate the consequences of class imbalance at each test site with different segmentation scales.
(Figure caption) In CNN architectures, the landslide inventory is the input data. It can be regarded as a picture. Here, each pixel integrates information on several influencing-factor attributes. The map layer converted from each factor represents one channel [179].
(Figure caption) In RNN architectures, W_xh refers to the weights for the connection of the input layer to the RNN layer (i.e., the hidden layer). U refers to the weights of the connections within the RNN layer. W_hy refers to the weights for the connection of the hidden layer to the output layer [180]
Pukar et al. [7] first used K-means clustering to obtain the Normalized Difference Vegetation Index (NDVI) threshold. Then, landslide candidate objects were separated from other vegetated surfaces. Finally, multitemporal landslide occurrence inventories were created. These inventories can be used to develop a landslide susceptibility model.
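The idea of deriving an NDVI threshold by clustering can be sketched as follows. NDVI is computed as (NIR − Red) / (NIR + Red); the two-cluster one-dimensional K-means, the use of the midpoint between the two centroids as the threshold, and the reflectance values are assumptions for illustration and do not reproduce the cited study's procedure.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index for one pixel."""
    total = nir + red
    return (nir - red) / total if total else 0.0

def kmeans_two_clusters(values, iterations=50):
    """One-dimensional K-means with k = 2; returns (low centroid, high centroid)."""
    low, high = min(values), max(values)
    for _ in range(iterations):
        low_group = [v for v in values if abs(v - low) <= abs(v - high)]
        high_group = [v for v in values if abs(v - low) > abs(v - high)]
        new_low = sum(low_group) / len(low_group)
        new_high = sum(high_group) / len(high_group) if high_group else high
        if new_low == low and new_high == high:
            break  # assignments no longer change
        low, high = new_low, new_high
    return low, high

# Hypothetical (NIR, red) reflectances: three bare-soil and three vegetated pixels.
pixels = [(0.30, 0.25), (0.28, 0.26), (0.32, 0.27),
          (0.50, 0.08), (0.55, 0.10), (0.60, 0.09)]
values = [ndvi(nir, red) for nir, red in pixels]
low, high = kmeans_two_clusters(values)
threshold = (low + high) / 2.0  # assumed rule: midpoint between centroids
candidates = [i for i, v in enumerate(values) if v < threshold]
```

Pixels whose NDVI falls below the threshold are treated as non-vegetated and therefore as landslide candidates, which still need to be separated from other bare surfaces by further analysis.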
Both pixel-based and object-based methods use the spectral information of neighboring pixels or morphological properties as features, which usually require empirical parameters. Therefore, the detected spatial features strongly depend on experience and the parameter settings.

Deep learning for landslides detection
Owing to their hierarchical learning framework, deep learning methods are capable of extracting robust spatial and spectral features. They can learn the combination of satellite imagery and topographical features automatically and generalize high-level features from low-level features layer by layer to best discriminate landslides from other objects [145,196].
In the field of remote sensing, CNN is currently revolutionizing object detection and pattern recognition [112]. By taking multispectral channels (e.g., red, green, blue, red-edge, and near-infrared bands) as input, CNN enables the efficient analysis of topography, which in turn can reveal the unique spectral signatures and shapes that characterize landslides. CNN autonomously extracts the contextual features of an image dataset and learns which landslide features are relevant for assigning the observations to the specified categories [80].
Using post-landslide images, Yu et al. [191] designed a CNN model that automatically learns and portrays the image data by extracting the discriminant area, boundary, and the center of the landslides as features. Thus, the landslide images can be retrieved from the test image set to achieve the objectives of feature detection and classification. The results demonstrate that the method does not need to manually design features and can achieve the intelligent detection of landslide disasters.
In other cases, both pre-landslide and post-landslide images have been utilized to detect texture changes. Ding et al. [31] removed landslide-irrelevant areas, including vegetation, water, and buildings, from post-landslide images and then built a CNN model that extracts the areas with a high degree of change, where landslides are concentrated, and thereby detects changes in image patches. The change degree is calculated as the Euclidean distance between patches covering the same area in the pre- and post-landslide images. Finally, the unique texture features of the landslides were used to detect changes and accurately extract the landslide areas. These methods perform better at reducing the search areas and making the time period clearer than others.
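The change degree described above can be illustrated with a minimal sketch. It assumes that each patch is a small grid of single-band pixel values and that the distance is taken directly on those values; the cited work may compute it on other representations, and the function names and threshold are hypothetical.

```python
import math


def change_degree(pre_patch, post_patch):
    """Euclidean distance between two equally sized patches (lists of rows)."""
    if len(pre_patch) != len(post_patch):
        raise ValueError("patches must have the same number of rows")
    total = 0.0
    for row_pre, row_post in zip(pre_patch, post_patch):
        if len(row_pre) != len(row_post):
            raise ValueError("patches must have the same number of columns")
        for a, b in zip(row_pre, row_post):
            total += (a - b) ** 2
    return math.sqrt(total)


def changed_patches(pre_patches, post_patches, threshold):
    """Indices of patch pairs whose change degree exceeds a (hypothetical) threshold."""
    return [
        index
        for index, (pre, post) in enumerate(zip(pre_patches, post_patches))
        if change_degree(pre, post) > threshold
    ]
```

Patches with a large change degree would then be passed on as landslide candidates, while unchanged patches are discarded to reduce the search area.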
The selection method of the sample patches plays a critical role in patch extraction by CNN. Based on the optical data from the RapidEye satellite, Ghorbanzadeh et al. [45] trained the same CNN architecture using different sample patches extracted from both random and central approaches and compared the detection results against a precise inventory dataset of landslide polygons through the mean Intersection-Over-Union (mIOU). The patches selected based on only the central areas of any landslide show an improvement in the resulting accuracy for landslide detection.
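As a minimal sketch of the evaluation metric, the mean Intersection-Over-Union can be computed on rasterized masks as follows. The sketch assumes binary label grids (1 for landslide, 0 for background) and averages the IoU over the classes that occur; the cited study compares against landslide polygons, so this raster form is an illustrative simplification.

```python
def iou(pred, truth, label):
    """Intersection-over-union of one class between two label grids."""
    intersection = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            in_pred = p == label
            in_truth = t == label
            intersection += in_pred and in_truth
            union += in_pred or in_truth
    # A class absent from both grids has no defined IoU.
    return intersection / union if union else None


def mean_iou(pred, truth, labels=(0, 1)):
    """Average IoU over the classes that appear in either grid."""
    scores = [iou(pred, truth, label) for label in labels]
    scores = [score for score in scores if score is not None]
    return sum(scores) / len(scores)
```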
The aforementioned CNN architectures are simple and generic. The networks are shallow, generally with 5 to 8 layers. Other variants of classic networks, including adjustments to the width or depth and the introduction of depthwise separable convolutions or group convolutions, achieve only marginal improvements and thus show little performance difference in landslide detection.
Ji et al. [75] examined different strategies, reduction ratios in the multilayer perceptron, and placements of the attention module, and designed a novel attention module that produces a 3D spatial and channel attention feature map. The novel 3D module emphasizes some parts of the input feature maps and suppresses the other parts in a backbone CNN; in other words, it emphasizes foregrounds (i.e., landslides) and suppresses noisy backgrounds (i.e., non-landslides). The experiment indicated that the attention module considerably strengthened the CNN structures, especially the ResNet-50 architecture.
In contrast to other networks, the ResNet architecture allows networks to be made deeper.
Maher et al. [145] detected landslides using ResNet architectures based on spectral bands and topographical variables. The results indicated that the architectures perform better by adding more convolutional or dense layers to the network.
In some cases, CNN employs global pooling and presumably misses some spatial information of images, thus resulting in low recognition accuracy. In general, landslide areas include serious spatial uncertainty, resulting in difficulty extracting effective landslide features. A model can be produced that combines data from various modalities. Such models can learn abstract representations and merge them from different features.
Tao et al. [98] designed an asymmetric Fully Convolutional Network within Pyramid Pooling (FCN-PP) that balances the use of context and localization accuracy by taking into account the combined features from multiple convolutional layers. After five pairs of bitemporal images were captured with an aerial survey camera system, five areas including different types of landslides were obtained through cropping. Using these images, the proposed pyramid pooling model combined multiscale features to provide a suitable feature representation of landslide areas. The final output distinguishes changed and unchanged areas, achieving a binary classification.
4 Machine learning for landslides susceptibility assessment

Overview
Landslide susceptibility expresses the likelihood of a landslide event occurring in a given area based on local terrain conditions or climate conditions. It usually partitions the geographical surface into zones of varying grades of stability based on the landslide inventory [60]. The resulting output is a solely spatial distribution of the predicted categorized hazard probabilities across grid cells [120].
Machine learning methods applied to landslide susceptibility assessment gather the available information extracted from landslide inventories in a structured way, process and model that information, and form a judgment about it. This workflow unfolds through stages of preprocessing, implementation or modeling, and postprocessing, wherein modeling plays an essential role.

Workflow of machine learning in landslides susceptibility assessment
Supervised learning is by far the most widespread form of machine learning applied in landslide susceptibility assessment. The following are details about the workflow of supervised learning applied in landslide susceptibility assessment.
Initially, high-quality spatial data are collected from remotely sensed images or real-time monitoring for a landslide to produce landslide inventories [32]. A landslide inventory includes historical landslide data and other related information, such as geological data, meteorological conditions, and topographical data, which can roughly clarify the relationships between predisposing factors and landslide occurrences [109]. Based on these data, the predictive models for landslide susceptibility zonation can construct the relationships between the input and output variables [50]. Prior to any prediction modeling, these two types of variables should be identified. Commonly, the output consists of landslides and non-landslides. The input relates to conditioning factors of landslides.
Redundant or irrelevant factors may create noise and decrease the overall predictive capability of the models, so it is essential to choose suitable factors in landslide susceptibility assessment. The optimum conditioning factors for a study area are therefore determined next. This step requires prior knowledge, namely an analysis of the characteristics and geo-environmental conditions of the study area in relation to past landslide occurrences.
To date, no universal guidelines have been agreed upon for the determination of case-specific conditioning factors [82]. Landslide conditioning factors show variation with respect to the study area and its geographical locations. Every study area has its own particular set of factors that cause landslides [182]. According to numerous studies, common landslide causal factors can be divided into two categories: (i) internal factors, which are related to geology and topography, such as the elevation, profile curvature, slope, plan curvature, distance to faults, aspect, distance from rivers, landform and lithology; and (ii) external factors, which usually cause landslides, such as rainfall, distance from roads and the seismic intensity.
To further select the appropriate input factors, one effective method involves ranking the importance of the input variables. Popular algorithms include ReliefF [109], Genetic Algorithms (GA) [33], Information Gain Ratio (IGR) [168], and symmetrical uncertainty analysis [137]. Through calculating a score for each factor, these algorithms can evaluate and rank the contributions of landslide causal factors, and the factors with lower contributions are sequentially removed. Furthermore, machine learning methods can rank these factors by their weights. The most frequently used supervised learning methods include LR [5], ANN [129], SVM [61], NB [171], and DT [170]. Unsupervised learning methods such as cluster sampling can evaluate factors by weighting the relative importance of each conditioning factor [118].
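The score-and-rank idea can be sketched with the information gain ratio. The example below assumes that each conditioning factor has already been discretized into classes and that labels are 1 (landslide) or 0 (non-landslide); the factor names and helper functions are hypothetical and do not reproduce any cited implementation.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(factor, labels):
    """Information gain of a factor divided by its split information."""
    n = len(labels)
    conditional = 0.0
    for value in set(factor):
        subset = [y for x, y in zip(factor, labels) if x == value]
        conditional += len(subset) / n * entropy(subset)
    info_gain = entropy(labels) - conditional
    split_info = entropy(factor)
    # A factor with a single class carries no information.
    return info_gain / split_info if split_info > 0 else 0.0


def rank_factors(factors, labels):
    """Return (name, score) pairs sorted from most to least informative."""
    scores = {name: gain_ratio(values, labels) for name, values in factors.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Factors at the bottom of the ranking are the candidates for removal before the predictive model is trained.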
Next, the predictive model is trained. The performance of the model is usually measured through a cost function. It is also important to optimize model performance, which entails adjusting the hyperparameters that control the training process, structure, and properties of the model [148]. For this purpose, a validation dataset is separated from the test and training sets using sampling strategies. A common approach is to select the training set by randomly sampling 70% of all instances from the available data; the remaining part is reserved for testing the model.
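A minimal sketch of the random 70/30 partition is given below. The fixed seed and the function name are assumptions made for reproducibility of the illustration; in practice, the split is often stratified so that landslide and non-landslide samples stay balanced.

```python
import random


def train_test_split(samples, train_fraction=0.7, seed=0):
    """Randomly partition samples into a training set and a test set."""
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]
```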

Conventional machine learning methods for landslides susceptibility assessment
Conventional machine learning algorithms have been applied to landslide susceptibility assessment and achieve outstanding performance. They are mainly classified into single base learning algorithms and ensemble learning algorithms.

Single base learning algorithms
The traditional single algorithms most frequently applied to landslide susceptibility assessment include (1) LR, (2) SVM, (3) DT, and (4) ANN [26,77,81,143]. LR has a long tradition of application in landslide susceptibility assessment [89,95]. One study showed that the complexity of the predictive model and the size of the training dataset influence the accuracy and predictive power of LR models for landslide susceptibility [56].
SVM can identify the optimal boundary between the training data from two classes [27]. Compared with other algorithms, the SVM algorithm achieves slightly better accuracies in shallower landslide assessment applications [113,134,189]. The quality of the predictive results produced by the algorithms is correlated with the input data quality.
With its tree-like structure, DT can reveal some important relations between causal factors and landslides but yields landslide susceptibility models with lower accuracy than other methods. For instance, given the slope and altitude as input variables, a DT can reveal that slope is more important than altitude [69].
The standard ANN model comprises three layers, namely an input layer (i.e., landslide conditioning factors), hidden layers, and an output layer (i.e., landslide susceptibility) [96]. A case study showed that ANN applied in landslide susceptibility assessment achieved fairly precise models [97].
In summary, several drawbacks are usually identified when utilizing the aforementioned single base learning algorithms, such as overfitting and unstable performance.

Ensemble learning algorithms
Generally, ensemble learning algorithms can enhance the performance of the single base learning algorithms and improve the robustness and generalizability.
A commonly used ensemble algorithm in landslide susceptibility assessment is RF [19,24,169,190]. Usually, an RF model has a greater predictive capability to identify landslide susceptibility zones than other models [46]. Since the random selection of the training dataset may affect the results of the model, a set of many trees helps to ensure the stability of the model [24].
For example, Hong et al. [59] indicated that three ensemble models (i.e., AdaBoost, bagging, and rotation forest) could significantly improve the performance of J48 DT as the base learner, and rotation forest can be considered a promising method for landslide susceptibility mapping in similar cases with better accuracy than other methods.
Other ensemble methods have been developed for landslide susceptibility assessment, including GBDT [21,158], Random Subspace [109], Multiboost [128], and Regularized Greedy Forests [146]. Compared with the base classifiers, these ensemble methods can reduce both the bias and variance and avoid overfitting problems, which improves predictive capability. The main advantage of numerous ensemble models is that model construction is not complicated and the training time is short with the data at hand.
Dou et al. [34] produced four classes of ensemble learning models (SVM-stacking, SVM, SVM-bagging, and SVM-boosting) using SVM as the base learner. The study suggests that an ensemble machine learning model does not necessarily guarantee excellent performance. It is better to prudently select the proper model or base learner for the predictive models, and each learner should be carefully considered before it is grouped into an ensemble model. In addition, the interpretation of the ensemble methods is more complicated than that of the base models [59,86,128,146].
In conventional machine learning applications to landslide susceptibility assessment, the correlations between factors should be first eliminated to reduce model noise [23]. Furthermore, conventional feature engineering involves a substantial amount of prior knowledge in the process of seeking the proper parameters and thresholds of each feature. These empirical methods mean that the inherent and deep features of input datasets are challenging to extract [10,194].

Deep learning methods in landslides susceptibility assessment
Recently, with the rapid development of deep learning, state-of-the-art learning approaches have been successfully applied in landslide susceptibility assessment. Deep learning has also been commonly applied to feature extraction [62]. It can find optimal features and handle indirect relationships between features and goals, and can thus simplify the feature engineering and data preprocessing steps.

Autoencoder
When applying an autoencoder, landslide/non-landslide labels and linear/nonlinear correlation assumptions are not needed [40]. In general, an autoencoder neural network for landslide susceptibility assessment consists of input layers for raw feature dropout, hidden layers for sparse feature encoding, output layers for sparse feature extraction, and classification for prediction. The autoencoder is trained to reconstruct the input of the landslide-influencing factors onto the output layer for feature extraction and dimensionality reduction. The methods prevent the simple copying of the data and the network [121]. Maher et al. [146] used an autoencoder as an optimized factor to learn features from a dataset in an unsupervised manner [58]. They indicated that this factor optimization strategy based on unsupervised learning improves the performance of tree-based landslide susceptibility models by reducing the dimensionality. However, the strategy requires additional experiments and statistical analysis.

CNN
When a CNN, a typical deep learning method, is applied in landslide susceptibility assessment, whole landslide inventories can be regarded as an input matrix in which each pixel has several landslide-influencing attributes. That is, each conditioning factor layer can be a channel.
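The "factor layer as channel" view can be sketched as follows, assuming that every conditioning factor has been rasterized onto the same grid. The two helpers are hypothetical: one reads the per-pixel factor vector, and the other cuts a zero-padded local window from every channel, as a network that also uses spatial context would require.

```python
def pixel_vector(layers, row, col):
    """Factor values of one pixel, one entry per channel (factor layer)."""
    return [layer[row][col] for layer in layers]


def pixel_window(layers, row, col, half=1, pad=0.0):
    """Local (2*half+1) x (2*half+1) window per channel, padded at the borders."""
    window = []
    for layer in layers:
        n_rows, n_cols = len(layer), len(layer[0])
        channel = []
        for r in range(row - half, row + half + 1):
            channel.append([
                layer[r][c] if 0 <= r < n_rows and 0 <= c < n_cols else pad
                for c in range(col - half, col + half + 1)
            ])
        window.append(channel)
    return window
```

For two 2 x 2 layers `slope = [[1, 2], [3, 4]]` and `rain = [[5, 6], [7, 8]]`, `pixel_vector([slope, rain], 0, 1)` returns `[2, 6]`.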
After selecting landslide-influencing factors, Wang et al. [179] constructed three CNN architectures (i.e., CNN-1D, CNN-2D, CNN-3D) to produce landslide susceptibility maps. More detailed results of the three architectures are as follows.
(i) CNN-1D can exploit the local correlation and gradually learn more intricate representations from factor vectors to directly extract the information from landslide-influencing factors for landslide susceptibility analysis. (ii) After converting a one-dimensional input grid cell (vector) comprising different attribute features into a two-dimensional matrix, CNN-2D can extract the valuable hidden features to reflect the probability of a landslide occurring. The output is divided into two classes: landslides and non-landslides. (iii) CNN-3D not only learns factor representations but also extracts local spatial information. Specifically, the CNN-3D architecture extracts the influencing factor information and spatial relations to predict the probability of landslide occurrence.
The results indicated that CNNs can effectively extract spatial information using local connections and can significantly reduce the number of network parameters by sharing weights; they can be used to produce reliable landslide susceptibility maps. Other related studies are scarce. Therefore, more research to verify different CNN architectures for landslide susceptibility assessment is needed. Similar to other deep learning methods, CNNs confront a generalization problem: both underfitting and overfitting result in poor performance of classification models. As an optimization algorithm, gradient descent is acceptable and has been commonly used to tune CNNs to minimize the cost function.
Pham et al. [130] selected a Moth-Flame Optimization (MFO) algorithm as a replacement. The regularization technique was applied by defining the search boundary (lower bound and upper bound) of the MFO algorithm to prevent the model from taking extreme values for connecting weights. In nature, moths move in groups in a spiral path towards a light source. Their positions can be modeled in dimensional space. In CNN applications, the hyperparameters of filters and weights of the fully connected layer were considered the dimensions of the moths, and the model searched for the best moth (i.e., the one with the smallest RMSE). Finally, the performance of the CNN model applied in landslide susceptibility assessment has been improved to some extent.
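The spiral movement and the bounded search can be sketched as below. The update follows the logarithmic spiral of the original MFO formulation, which is not spelled out in the text above; the shape constant `b`, the parameter `t`, and the bounds are illustrative assumptions rather than the settings of the cited study.

```python
import math


def spiral_update(moth, flame, t, b=1.0, lower=-1.0, upper=1.0):
    """Move a moth along a logarithmic spiral around a flame, then clamp it.

    Each dimension stands for one tunable value (e.g., a connecting weight);
    t is normally drawn at random from [-1, 1].
    """
    new_position = []
    for m, f in zip(moth, flame):
        distance = abs(f - m)
        value = distance * math.exp(b * t) * math.cos(2 * math.pi * t) + f
        # Clamping to the search boundary keeps weights from taking extreme values.
        new_position.append(min(max(value, lower), upper))
    return new_position
```

In a full optimizer, this update is repeated for every moth, the candidate positions are scored (e.g., by RMSE), and the best positions are kept as flames for the next iteration.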

RNN
Another type of deep learning is RNN, which can use internal memory units to process arbitrary sequences of inputs. As a complicated evolution process, the formation and occurrence of landslides respond to the interaction of multiple instability factors. The recurrent structure of the RNN helps retain the most critical information involved in a landslide occurrence and pass it to the next hidden state.
Wang et al. [180] sorted each landslide-influencing factor layer in descending order of importance. Landslide influencing factors are viewed as a single-band image, and each pixel is converted into a sequential sample according to importance. Accordingly, the most important factors are sent to the RNN structure first, and the least important factors are sent last.
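A minimal sketch of this sequential input is given below, assuming one scalar factor value per time step, a tanh hidden activation, and a sigmoid output. The parameter names mirror the input-to-hidden, hidden-to-hidden, and hidden-to-output weights of a plain RNN; the activations and the importance scores are illustrative assumptions, not the configuration of the cited study.

```python
import math


def order_by_importance(factor_values, importance):
    """Arrange one pixel's factor values from most to least important."""
    names = sorted(factor_values, key=lambda name: importance[name], reverse=True)
    return [factor_values[name] for name in names]


def rnn_forward(sequence, w_xh, u, w_hy):
    """Run a plain RNN over a scalar sequence and return a probability.

    w_xh: input-to-hidden weights (one per hidden unit)
    u:    hidden-to-hidden weights (square matrix)
    w_hy: hidden-to-output weights (one per hidden unit)
    """
    hidden = [0.0] * len(u)
    for x in sequence:
        hidden = [
            math.tanh(w_xh[i] * x + sum(u[i][j] * hidden[j] for j in range(len(hidden))))
            for i in range(len(u))
        ]
    score = sum(w * h for w, h in zip(w_hy, hidden))
    return 1.0 / (1.0 + math.exp(-score))
```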
To express the complex relationships between landslide occurrences and continually changing factors, Xiao et al. [184] proposed a novel and dynamic model that can remember historical data using memory blocks. After collecting data and extracting features from the DEM, high-resolution remote sensing images, geologic maps, and meteorological data from January 2015 to December 2016, they built an LSTM model to solve the landslide susceptibility classification problem.

Overview
Reliable early warning systems are a reasonable approach for risk reduction and can significantly reduce economic losses and casualties [88]. These systems are designed to predict the short-term behavior of single landslides according to information, including the potential triggers and their thresholds for issuing emergency warnings and the sliding mechanism [35,49]. Rainfall is the most widespread and frequent trigger of landslides around the world [84]. Generally, landslide early warning systems are often based on the assumption that a critical rainfall amount exists and, when reached or exceeded, triggers shallow landslides. For estimating the minimum critical rainfall amount, a widely used method is the determination of a rainfall threshold. The rainfall threshold identifies the boundary that divides rainfall data into two categories, namely rainfall that induced a landslide and rainfall that did not induce a landslide and thus can be mainly implemented to predict the temporal occurrence of landslides [108,150].
In the case of a deep-seated landslide, when rainfall is intense, the infiltration rate exceeds the deep drainage rate, triggering instability conditions. An increase in the level of groundwater often induces landslides [49], which can be another important factor. It is necessary to evaluate fluctuations in the groundwater level accurately based on previous rainfall to predict the occurrence of deep-seated landslides [108]. The groundwater level is directly related to landslide displacement, which is a complex nonlinear dynamic process.
Predicting landslide displacement also plays an important role in operational early warning systems, as it offers more detailed information for early warning systems than triggers [64,101,153]. Commonly, the sliding mechanism of landslides results from a combination of local geological conditions and other external influencing factors. The dominant factors are precipitation and fluctuations in the groundwater level [156]. As a major result of the sliding mechanism of landslides, displacement is a key parameter to capture the interaction between landslide deformation and hydrometeorological conditions [11]. Rapid changes in displacement are generally considered a direct sign of an upcoming landslide. Thus, to produce a new model for the prediction of landslide displacement, both the deformation mechanisms and the influencing factors should be taken into account.
Machine learning has been applied to the various aspects of landslide warning systems mentioned above, including classifiers for the analysis of rainfall thresholds themselves, cooperation between rainfall thresholds and landslide susceptibility assessments, determination of rainfall thresholds in deep-seated landslides as an inverse analysis, and prediction of landslide displacement.

Analysis of rainfall threshold
Common early warning systems need to take into account both landslides triggered by short and exceptionally intense rainstorms (e.g., shallow landslides) and landslides triggered by exceptionally prolonged rainfalls (e.g., deep-seated landslides). The decisional algorithm at the core of the warning system is based on the comparison between the thresholds and the rainfall data (recorded and forecasted) [150]. Rainfall data are collected from rain gauges. Most predictive models define rainfall thresholds based on the analysis of past rainfall events that have resulted in landslides.

Analysis of rainfall threshold in shallow landslides
Defining rainfall thresholds requires defining a linear frontier between two categories. As a widely used two-class linear classifier, SVM has been used to determine rainfall thresholds for shallow landslides [124,136,173]. However, a single threshold cannot predict the exact localization of the landslides. For enhancing the spatiotemporal forecasting capability of a regional-scale landslide early warning system, a reasonable approach is to integrate rainfall thresholds into landslide susceptibility assessments.
Pradhan et al. [133] estimated the combined probability of landslides and triggering rainfall thresholds using a hazard matrix. This study is described in detail as follows.
(i) Based on information on critical rainfall intensities and durations extracted from historical landslide data, rainfall threshold warning levels were prepared. (ii) A backpropagation ANN was used for landslide susceptibility assessment. (iii) Rainfall threshold warning levels and the classified shallow slide susceptibility were combined using a matrix table. After application to a practical example, the model provided reasonable results.
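The matrix combination in step (iii) can be sketched as a simple lookup. All class names and matrix entries below are hypothetical placeholders chosen for illustration; they are not the values used in the cited study.

```python
# Hypothetical ordered classes for the two inputs.
SUSCEPTIBILITY_CLASSES = ["low", "moderate", "high"]
RAINFALL_LEVELS = ["none", "watch", "warning"]

# Rows: susceptibility class; columns: rainfall threshold warning level.
# Every entry is an illustrative assumption.
HAZARD_MATRIX = [
    ["low", "low", "moderate"],
    ["low", "moderate", "high"],
    ["moderate", "high", "very high"],
]


def combined_hazard(susceptibility, rainfall_level):
    """Look up the combined hazard class for one grid cell."""
    row = SUSCEPTIBILITY_CLASSES.index(susceptibility)
    col = RAINFALL_LEVELS.index(rainfall_level)
    return HAZARD_MATRIX[row][col]
```

With these placeholder entries, a highly susceptible cell under a rainfall warning maps to the highest hazard class, whereas the same rainfall over a low-susceptibility cell maps only to a moderate class.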
After completing a landslide susceptibility assessment using an RF model, Segno et al. [150] integrated the results into statistical rainfall thresholds and assessed regional-scale forecasting of landslides. This study showed that the coupling of the two methodologies enhanced the forecasting effectiveness of the warning system.
Kirschbaum et al. [84] developed a near-global Landslide Hazard Assessment Model (LHASA) that combines surface susceptibility and satellite rainfall data to provide a qualitative landslide prediction. According to the DT structure, if the susceptibility values are moderate to high or very high, nowcasts are issued.
On the other hand, deep learning methods can extract features from raw rainfall data to predict a landslide. For example, Huang et al. [67] built a DBN model that is trained on a large amount of unlabeled data in an unsupervised way. The rainfall data comprised the average annual rainfall, daily rainfall, and cumulative precipitation from the previous seven days. The DBN was built by stacking a series of RBMs, and a softmax classifier was added to the top layer of the DBN. A dropout mechanism was introduced in the RBM hidden layer structure to sample the node weights of the hidden layer with a probability of 50% to prevent overfitting. The data samples carry four labels (minor landslides, medium-sized landslides, large-scale landslides, and huge landslides) according to the disaster degrees and scales of the landslides.
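The three rainfall inputs listed above can be assembled from a daily rainfall series as in the following sketch. It assumes that the average annual rainfall is supplied separately and that "previous seven days" excludes the current day; the function and key names are hypothetical.

```python
def rainfall_features(daily_rainfall, day, annual_mean):
    """Build the rainfall inputs for one day of a daily rainfall series (mm)."""
    if day < 7:
        raise ValueError("at least seven preceding days are required")
    return {
        "annual_mean": annual_mean,
        "daily": daily_rainfall[day],
        # Cumulative precipitation over the seven days before `day`.
        "previous_7_days": sum(daily_rainfall[day - 7:day]),
    }
```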

Analysis of rainfall threshold in deep-seated landslides
Generally, the mechanisms that induce shallow landslides and deep-seated landslides are different. Rainfall can result in rising groundwater levels. Elevated pore water pressure, induced by the change in the groundwater level, is one of the main triggers of deep-seated landslides [14]. Their interactions are complicated. For predicting deep-seated landslides, the groundwater level should be considered a relevant parameter. To explore the relationship between rainfall and groundwater level, Qing et al. [108] developed a rainfall threshold model using a genetic algorithm SVM and a backpropagation neural network based on the antecedent rainfall data for forecasting variations in the groundwater level caused by rainfall. Then, to determine the critical threshold of the groundwater level that could reactivate the landslide, numerical computations were conducted for different monitored and predicted groundwater levels. The rainfall threshold was then determined in reverse. In this study, the factor of safety acquired from numerical simulations offers a threshold to divide stable and unstable landslides.
Moreover, Vallet et al. [172] used a velocity criterion method based on displacement velocity time-series data to distinguish acceleration crises (peaks) and periods of rest (troughs) to account for unstable and stable landslides, respectively. An SVM model was used to obtain the best coefficient of determination between the cumulative groundwater recharge and the landslide velocity and then to determine the rainfall threshold.

Prediction of landslide displacement
Forecasting landslide displacement is an important part of operational early warning systems. Landslide displacement time-series data can directly reflect landslide deformation and stability characteristics [99]. Therefore, they have been used to develop landslide displacement prediction models [186]. Generally, these time-series data are collected from landslide monitoring systems (e.g., GPS, InSAR, or GNSS). In addition, data on triggering factors, including precipitation and the groundwater level, are commonly used.
Under geological conditions such as geomorphology and geological structures, a variation in the displacement over time can be viewed as an approximately monotonically increasing function on a large time scale. Under external influencing factors such as the groundwater level and rainfall intensity, the variation can be correspondingly viewed as a periodic function on a small time scale. These available sequence data provide critical input data to predict landslide displacement.
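The separation into a long-term and a short-term component can be sketched as follows. The trailing moving average used for the trend and the window length are assumptions made for illustration; the text above does not prescribe a particular decomposition method.

```python
def decompose(displacement, window=3):
    """Split cumulative displacement into a trend term and a periodic term.

    The trend is a trailing moving average (shorter at the start of the
    series); the periodic term is the remainder.
    """
    trend = []
    for i in range(len(displacement)):
        start = max(0, i - window + 1)
        chunk = displacement[start:i + 1]
        trend.append(sum(chunk) / len(chunk))
    periodic = [d - t for d, t in zip(displacement, trend)]
    return trend, periodic
```

By construction, adding the two terms recovers the measured series, so separate predictions of the trend and periodic terms can be summed to obtain the total displacement.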

Conventional machine learning methods for predicting landslide displacement
Recently, conventional machine learning methods, including the ANN [35], SVM [197], Gaussian process [106], and ELM [102], have been applied to produce models for landslide displacement prediction. Here, the input is the landslide displacement and the triggering factor. The output is the predicted landslide displacement. For example, Krkač et al. [88] presented an RF-based methodology for the prediction of landslide movements using the landslide velocity and displacements from precipitation data. The groundwater level change rate was first modeled from the daily and historical precipitation data, and the landslide velocity was then modeled from the predicted daily groundwater level depths (calculated from the groundwater level change rates). Ultimately, the trained model was used to predict the landslide velocity for nine periods (10 to 90 days).
To optimize the time-series data used as input, Li et al. [99] introduced a chaos theory-based Wavelet Analysis-Volterra filter model (chaotic WA-Volterra model) into SVM for cumulative landslide displacement prediction. The WA-Volterra model aims to decompose the cumulative displacement data into different low- and high-frequency components. Chaos theory was used to reconstruct the phase space of each frequency component. The reconstructed phase spaces were selected as the input-output data to train the SVM models. The predictive results (i.e., the predicted cumulative displacements) were obtained by summing the predicted displacements of each frequency component. This study indicates the potential for chaos characteristic identification of landslide displacements to be applied in machine learning, and it achieves a certain optimization in feature processing.
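Phase space reconstruction is commonly realized by time-delay embedding, which the following sketch illustrates for one frequency component. The embedding dimension and delay are assumed to be given (their selection is not shown), and pairing each delay vector with the next value of the series is one plausible way to form input-output samples, not necessarily the scheme of the cited study.

```python
def delay_embedding(series, dimension, delay):
    """Delay vectors [x(i), x(i + delay), ..., x(i + (dimension - 1) * delay)]."""
    n_vectors = len(series) - (dimension - 1) * delay
    if n_vectors <= 0:
        raise ValueError("series too short for this dimension and delay")
    return [
        [series[i + k * delay] for k in range(dimension)]
        for i in range(n_vectors)
    ]


def embedding_pairs(series, dimension, delay):
    """Pair each delay vector with the value one step after its last entry."""
    pairs = []
    for i, vector in enumerate(delay_embedding(series, dimension, delay)):
        target_index = i + (dimension - 1) * delay + 1
        if target_index < len(series):
            pairs.append((vector, series[target_index]))
    return pairs
```

For the series `[1, 2, 3, 4, 5]` with dimension 2 and delay 2, the delay vectors are `[1, 3]`, `[2, 4]`, and `[3, 5]`, and the first two are paired with the targets 4 and 5.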

Deep learning methods for predicting landslide displacement
The aforementioned approaches regard landslide displacement prediction as a static regression problem. However, a landslide is a dynamic system in which the displacement continues to change: the influencing factors and displacement conditions at one moment affect the displacement and stability conditions at the next moment. LSTM is an appropriate method for investigating this dynamic process since it is suitable for learning the temporal dynamics of sequential data. The general workflow for the application of LSTM in landslide displacement prediction is as follows. The measured accumulated displacement of the landslide is first divided into a trend term (i.e., a static component) and a periodic term (i.e., a dynamic component). Selected controlling factors and periodic terms are considered as input. Generally, LSTM adds loops to the architecture, receives these inputs, and outputs a predicted result. Finally, the LSTM model is validated and evaluated by comparing the predicted total displacement with the monitored total displacement.
For example, Yang et al. [186] first analyzed the relationship between landslide displacement and key influencing factors (i.e., rainfall and reservoir water level). Then, an LSTM model was produced for predicting the periodic term displacement. The LSTM model can establish connections between landslide conditions at different times and learn rules from previous deformation time steps. The results indicated that the LSTM model achieved a more satisfactory performance than static SVM methods.
Xie et al. [185] adopted an LSTM model to predict dynamic landslide displacement by evaluating the dynamic characteristics in the time domain. The prediction result indicated that the rainfall intensity and the excavation-induced stress redistribution affected the periodic displacement. Moreover, the measured and predicted deformation results showed good consistency.

Discussion
Predictive models developed by machine learning for landslides prevention can be underconstrained. For instance, models that perform well in datasets and are consequently viewed as high quality probably deviate strongly for situations and data outside their valid local areas because of the complex physical earth system. The challenges and opportunities in the applications of machine learning for landslides prevention will be discussed below.

Challenges from uncertain and complex datasets

Noise in datasets
As mentioned above, selecting an appropriate machine learning model for landslides prevention depends largely on the input dataset. These datasets involve complex, nonlinear, physical earth systems that act across a vast range of spatial and temporal scales, predominately consisting of geological and meteorological environments. As one of the critical triggering factors, rainfall is strongly nonstationary under climate change. The related data may yield a great deal of noise. Additionally, sensor interference and instrument malfunctions could also induce noise/uncertainty. Sequences of (multisensor) satellite observations also have diverse noise sources, uncertainty levels, missing data, and (often systematic) gaps (e.g., acquisition, storage, and transmission distortions) [139]. In the case of remote sensors, atmospheric (clouds and other aerosols) and surface (snow and ice) interference are constantly encountered [79].
For instance, aerial and satellite data with varied reflectance characteristics (e.g., multispectral, hyperspectral and optical images, LiDAR and drone point clouds) are commonly used as input datasets. Such datasets can be noisy, have coarse spatial resolution and complex object geometry, and offer few samples, which are often very large and irregular in size.

Dataset heterogeneity
Climate and ecosystem processes reveal a high level of heterogeneity due to differences in geography, topography, and climatic conditions in diverse areas of the earth. For example, some regions are mountainous, some regions are dry and experience severe, long-term droughts, and some regions are quite wet and covered with dense forests. The patterns, mechanisms, and driving forces of landslides vary among these regions.
Currently, this heterogeneity means that landslides prevention models primarily apply to local or regional zones, since a given set of factors corresponds only to a homogeneous group of locations. Developing a universal model that can be applied globally remains a challenge. Moreover, another source of heterogeneity is the use of multiple sensors, which exhibit different imaging geometries, spatial and temporal resolutions, physical meanings, contents, and statistics.

Deluge of datasets
Another important issue that needs careful consideration is the data deluge. With technological development, airborne LiDAR surveys, SAR satellites, stereophotogrammetry, and mobile mapping systems are increasingly used and produce data volumes that raise computational challenges in the spectral, spatial, and temporal dimensions.
Capturing the effects of these multiple variables at fine spatial and temporal resolutions renders these data inherently high dimensional, where the number of dimensions can easily reach millions. For example, a dataset with a comparatively coarse resolution can easily produce over 10,000 spatial grid points, where every grid point has multiple observations in time [78]. Furthermore, unlike classic computer vision applications that deal with photos with three channels (red, green, blue), hyperspectral satellite images extend to hundreds of spectral channels well beyond the visible range.
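To make the scale concrete, the short Python sketch below multiplies out the dimensions of a space-time-spectral data cube. Only the 10,000 grid points come from the text; the daily sampling over one year and the band count of 200 are assumptions chosen purely for illustration.

```python
def feature_count(grid_points, time_steps, channels):
    """Number of scalar values in a space-time-spectral data cube."""
    return grid_points * time_steps * channels


grid_points = 10_000   # figure quoted in the text
time_steps = 365       # assumption: one observation per day for a year
rgb = feature_count(grid_points, time_steps, channels=3)
# assumption: 200 spectral bands stand in for "hundreds of channels"
hyperspectral = feature_count(grid_points, time_steps, channels=200)
print(f"RGB: {rgb:,} values; hyperspectral: {hyperspectral:,} values")
```

Even under these modest assumptions the count is in the millions for three channels and grows by roughly two orders of magnitude for hyperspectral imagery.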
Overall, entirely data-driven approaches can present a challenge. There are several strategies for mitigating these problems.
The Persistent Scatterer InSAR (PS-InSAR) can select pixels with high coherence throughout the interferometric phase history. These pixels are less affected by spatial and temporal decorrelation noise and possess a highly stable phase history [90,177].
To a certain extent, the utilization of deep learning techniques can mitigate data-related problems [12]. Using simulations to generate supplemental synthetic training data can reduce reliance on large datasets. This approach uses domain adaptation to adjust for divergence between the distributions of real and synthetic data. Domain adaptation can be implemented with novel architectures, such as the mixed-reality GAN [198].
Since complex and dynamic physical processes govern landslides, conventional machine learning methods applied to landslides prevention still need to analyze scientific datasets with some expert knowledge. Indeed, deep learning models are important for mitigating potential data problems, but the predictive result may also be physically untenable. Providing strong theoretical constraints on top of the data-driven model is needed, which can be achieved by integrating professional knowledge.

Challenges from class imbalance
A common hindrance in machine learning for landslides prevention is the class imbalance problem. All machine learning-based methods need to extract certain features from landslide and non-landslide data samples for analysis and then find a classification boundary that divides the training areas into two classes (i.e., landslides and non-landslides).
In the training regions, landslides appear in fewer areas than non-landslides. Such imbalances can bias a model towards classifying susceptible areas as safe since there is a larger number of non-landslide samples. After investigating the mapping of landslides with an imbalanced training sample (i.e., the sample contained more examples of non-landslide areas than landslide areas), a study indicated that the RF method underestimated landslide occurrences [161]. To overcome this problem, typical solutions can be divided into three categories: data-level techniques, algorithm-level methods, and hybrid approaches [87].

Data-level techniques
Data-level techniques mitigate imbalance problems through diverse sampling methods. One study obtained a 1:1 ratio of landslide data points to non-landslide data points by random selection [6]. Beyond random resampling, there are more sophisticated resampling strategies, including the Synthetic Minority Oversampling Technique (SMOTE) [20] and the Synthetic Minority Oversampling Technique-Iterative Partitioning Filter (SMOTE-IPF) [164]. Both approaches use the kNN algorithm to synthesize new instances. However, balancing by discarding majority samples, as random undersampling does, often results in drastically smaller datasets.
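The kNN-based synthesis used by SMOTE can be sketched as follows: a new minority sample is placed at a random position on the line segment between an existing minority sample and one of its k nearest minority neighbours. This is a simplified illustration in pure Python, not a reference implementation; the function name smote_like, the two-feature samples, and the parameter values are assumptions, and the filtering step of SMOTE-IPF is not included.

```python
import math
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority samples by kNN interpolation."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base_idx = rng.randrange(len(minority))
        base = minority[base_idx]
        # k nearest minority neighbours of the base sample (itself excluded)
        others = [p for i, p in enumerate(minority) if i != base_idx]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        partner = rng.choice(neighbours)
        u = rng.random()  # interpolation position in [0, 1)
        synthetic.append(tuple(b + u * (q - b) for b, q in zip(base, partner)))
    return synthetic


# Hypothetical landslide samples described by two normalized features
landslide_samples = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
synthetic = smote_like(landslide_samples, n_new=5, k=2, seed=1)
print(synthetic)
```

Because every synthetic point lies between two real minority samples, the minority class grows without exact duplication, which is the main difference from random oversampling.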
In addition, more strategies addressing class imbalance are needed for deep learning-based models. Random oversampling can improve the classification of imbalanced image data [152]. Random undersampling can decrease the degree of class imbalance when pretraining a deep CNN. Unlike plain random undersampling, which completely removes potentially useful information from the training set, the two-phase learning strategy [94] removes some non-landslide areas (the majority group) only during the pretraining phase. This removal allows landslide areas (the minority group) to contribute more to the gradient during pretraining while still allowing the model to see all of the available data during the fine-tuning phase. Arguably, the two-phase learning procedure is critical for dealing with imbalanced distributions in image data [53].
Another strategy, called the dynamic sampling technique, oversamples the low-performing classes and undersamples the high-performing classes [132]. In a study that addressed the class imbalance problem in a landslide prediction task, almost all instances, both synthetic and original, were correctly classified by the RF classifier after the data had been preprocessed with SMOTE-IPF [3].

Algorithm-level techniques
Algorithm-level techniques for class imbalance problems are termed cost-sensitive learning approaches. They modify the algorithms themselves by introducing a class penalty or weight and assigning different cost values to different misclassifications. For instance, the cost of misclassifying a landslide area as a non-landslide area would be much higher than the opposite. The former may put human property and life at stake, while the latter merely overprotects some areas wrongly identified as landslides. Commonly, the Mean Squared False Error (MSFE) can be selected as the loss so that errors on both classes are captured equally.
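As a minimal sketch of how MSFE treats the two classes equally, the Python code below assumes the usual definition MSFE = FPE² + FNE², where FPE and FNE are the mean errors on the non-landslide and landslide samples, respectively. This definition is stated here as an assumption because the text does not spell it out, and the labels and predictions are made up for illustration.

```python
def mse(labels, preds):
    """Plain mean squared error over all samples (with the 1/2 factor)."""
    return sum(0.5 * (y - p) ** 2 for y, p in zip(labels, preds)) / len(labels)


def msfe(labels, preds):
    """Mean Squared False Error for binary labels (1 = landslide)."""
    def mean_error(target_class):
        errors = [0.5 * (y - p) ** 2
                  for y, p in zip(labels, preds) if y == target_class]
        return sum(errors) / len(errors) if errors else 0.0

    fne = mean_error(1)  # mean error on landslide (minority) samples
    fpe = mean_error(0)  # mean error on non-landslide (majority) samples
    return fpe ** 2 + fne ** 2


# One landslide among ten samples; the model predicts 0.1 everywhere.
labels = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
preds = [0.1] * 10
print("MSE :", mse(labels, preds))
print("MSFE:", msfe(labels, preds))
```

In the plain error, the single missed landslide is averaged together with nine nearly correct non-landslide predictions, whereas MSFE averages each class separately first, so the landslide class is not diluted by the majority.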
To combat class imbalances in datasets for deep learning, Lin et al. [104] presented the Focal Loss (FL), which reshapes the Cross-Entropy (CE) loss to reduce the impact that easily classified samples have on the loss. In landslide mapping, this can reduce the weight of easily classified non-landslide areas. Similar to the MSFE, the FL is relatively easy to integrate into existing models and hardly impacts the training time.
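The reshaping performed by the focal loss can be illustrated with its binary form, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), where p_t is the predicted probability of the true class. The following pure-Python sketch is for illustration only; the example probabilities are made up, and the optional alpha weighting is left off by default as an assumption of this sketch.

```python
import math


def cross_entropy(p, y):
    """Binary cross-entropy for one sample; p is the predicted P(landslide)."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)


def focal_loss(p, y, gamma=2.0, alpha=None):
    """Binary focal loss for one sample.

    gamma down-weights well-classified samples; alpha (optional) weights
    the landslide class by alpha and the non-landslide class by 1 - alpha.
    """
    p_t = p if y == 1 else 1.0 - p
    weight = (1.0 - p_t) ** gamma
    if alpha is not None:
        weight *= alpha if y == 1 else 1.0 - alpha
    return -weight * math.log(p_t)


easy_non_landslide = (0.05, 0)  # confidently and correctly rejected
hard_landslide = (0.20, 1)      # landslide pixel the model nearly misses
for p, y in (easy_non_landslide, hard_landslide):
    print(y, round(cross_entropy(p, y), 4), round(focal_loss(p, y), 4))
```

With gamma set to 0 and no alpha, the expression reduces to the cross-entropy; with larger gamma, the easy non-landslide sample contributes far less while the hard landslide sample keeps most of its loss.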
Gradually, some cost-sensitive deep learning methods are emerging, which learn network weight parameters and class misclassification costs during training and thus give higher importance to samples with a higher cost, such as landslide areas. Examples include Cost-Sensitive (CoSen) learning [83] and the Cost-Sensitive DBN with Differential Evolution (CSDBN-DE) [195].

Hybrid techniques
Hybrid systems strategically combine both sampling and algorithmic methods [76]. One strategy is to perform data sampling to reduce class noise and imbalance and then to apply cost-sensitive learning or thresholding to further reduce the bias towards the majority group [87].
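The thresholding step of such a hybrid strategy can be sketched as choosing the decision threshold that minimizes the total misclassification cost on held-out data. The sketch below covers only this step (the preceding sampling step is omitted); the scores, labels, and cost values are hypothetical, and the function name pick_threshold is introduced here for illustration.

```python
def pick_threshold(scores, labels, cost_fn, cost_fp):
    """Return (threshold, total_cost) minimizing the misclassification cost.

    A sample is predicted as landslide when its score >= threshold.
    cost_fn: cost of missing a landslide; cost_fp: cost of a false alarm.
    """
    best = None
    for t in sorted(set(scores)):
        fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < t)
        fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= t)
        cost = cost_fn * fn + cost_fp * fp
        if best is None or cost < best[1]:
            best = (t, cost)
    return best


# Hypothetical validation scores from a susceptibility model (1 = landslide)
scores = [0.1, 0.2, 0.35, 0.4, 0.6, 0.8]
labels = [0, 0, 1, 0, 0, 1]
print("equal costs      :", pick_threshold(scores, labels, cost_fn=1, cost_fp=1))
print("missed slide x10 :", pick_threshold(scores, labels, cost_fn=10, cost_fp=1))
```

When a missed landslide is made more expensive than a false alarm, the selected threshold moves down, so more areas are flagged as susceptible and the bias towards the majority class is reduced.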

Challenges from lack of interpretability
Well-trained neural networks still have the typical issue of the lack of interpretability. In other words, it is difficult to explain how a machine learning model derives its results from the input data. Interpretability is a useful debugging tool for detecting bias in machine learning models. Machine learning models can only be debugged and audited when they can be interpreted [119]. For example, landslide susceptibility maps produced by machine learning need to earn the trust of government agencies, planners, decision-makers, and local landowners so that they act on them when making emergency plans to reduce the negative effects of landslides.
Interpretability includes the visualization of the results for analysis by humans [139]. Obviously, relying on visualizations of neuron activations is not enough. Further research is needed to understand the patterns that models use to link to the training datasets.
Given their complexity, landslides prevention models in the earth system are often not easily traceable back to their assumptions, limiting their interpretability [139]. The analysis of inverse problems in the landslide application is an attempt to achieve interpretability [48]. Fusion approaches that integrate machine learning and landslide-related knowledge have attracted much attention in recent years. These methods might initially use deep learning methods for feature processing with subsequent reliance on handcrafted machine learning approaches for prediction, thereby leveraging domain knowledge to ensure interpretability [75,174].
Machine learning results will predominantly be viewed as hypotheses and inspiration for further studies. Complementary machine learning-based predictive results will help geohazard management agencies build trust in these approaches and outputs.
Moreover, another noticeable issue is that developed machine learning models are difficult to troubleshoot when they encounter unseen inputs.

Opportunities by using more machine learning methods
With the development of machine learning methods, some state-of-the-art techniques can provide more opportunities for landslides prevention by overcoming current limitations.

GAN
A GAN is an unsupervised learning architecture for generative modeling that uses deep learning methods, such as CNNs, and can automatically learn the regularities or patterns in the input data. Training can be framed as a supervised learning problem with two submodels: the generator model and the discriminator model. The generator model aims to capture the data distribution. The discriminator model aims to estimate the probability that a sample belongs to the training data rather than being produced by the generator. The two models are trained adversarially in alternating iterations.
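The opposing objectives of the two submodels can be made concrete with their loss functions. The sketch below evaluates the standard discriminator loss and a commonly used non-saturating generator loss on given discriminator outputs; it contains no networks or training loop, and the probability values are made up for illustration.

```python
import math


def discriminator_loss(d_real, d_fake):
    """Penalize low D(x) on real samples and high D(G(z)) on generated ones."""
    real_term = -sum(math.log(d) for d in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - d) for d in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake):
    """Non-saturating generator loss: reward fooling the discriminator."""
    return -sum(math.log(d) for d in d_fake) / len(d_fake)


# Discriminator outputs = estimated probability that a sample is real.
confident = discriminator_loss(d_real=[0.9, 0.8], d_fake=[0.2, 0.1])
undecided = discriminator_loss(d_real=[0.5, 0.5], d_fake=[0.5, 0.5])
print("discriminator, confident:", round(confident, 4))
print("discriminator, undecided:", round(undecided, 4))
print("generator when caught   :", round(generator_loss([0.2, 0.1]), 4))
print("generator when D unsure :", round(generator_loss([0.5, 0.5]), 4))
```

A discriminator that separates real from generated samples has a low loss while the generator's loss is high, and the reverse holds as the generator improves; alternating updates on these two losses constitute the adversarial training described above.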
The power of abstraction in these models allows their higher levels to learn concepts and categories far more rapidly than their lower levels, owing to strong inductive biases and exposure to more data. GANs can substantially reduce the computational cost and mitigate overfitting [166].
Common analyses in landslides prevention (e.g., landslide susceptibility assessment) primarily acquire data from remote sensing images. The tremendous volume of remote sensing images can make labeling all the data prohibitively time consuming and expensive. GANs offer an excellent opportunity here since their generator can create training data.
Recently, some research has leveraged GAN models to learn the representation of remote sensing images using unlabeled data [55,103,110]. Lin et al. [103] designed a Multiple-layer Feature-Matching GAN (MARTA GAN) model. Remote sensing images with a resolution better than 256 × 256 were produced by adding two deconvolutional layers in the generator. The classification accuracy of remote sensing images has been improved. The model can learn interpretable representations even from challenging remote sensing datasets.
Beyond improving the quality of remote sensing images, including the temporal and spatial resolution and the signal-to-noise ratio, GAN models may be applicable to landslide detection. Some prior studies demonstrate that GANs have the potential to solve anomaly detection problems [147].

GNN
While deep learning effectively captures hidden patterns in Euclidean data, there is an increasing number of applications where the data are represented in the form of graphs [183]. GNNs are becoming increasingly popular because they enjoy the major advantage of incorporating a sparse and discrete dependency structure between data points. A graph allows the representation of a multitude of associations through link orientation and data points [9].
GNNs are gradually being applied in domains including scene graph generation, point cloud classification, and segmentation. This growth presents an opportunity for analyzing unstructured spatial vector data and for landslide detection/prediction [4].
GNN can be used to learn discriminative features from input graph-structured data. After the acquisition by LiDAR scans, datasets such as point clouds can be classified and segmented. GNN explores the topological structure based on the data [92,165].
As one of the predisposing factors for landslides, topography-related data can also be considered graph- and manifold-structured data, in which the features of each point in the point cloud are set as a graph signal. From this perspective, GNN models offer further prospects for landslide prediction.
Previously, landslide events were assumed to be independent; that is, interdependence among events was taken to have almost no effect on prevention. With modern networks growing exponentially in size, variety, and complexity, various emerging types of communication networks, such as the Internet of Things, wireless sensors, and cloud-based networks, are becoming more useful for landslides prevention [117]. Complex patterns are emerging that reveal interdependence among diverse events [41]. Incorporating the effect of interdependence into landslides prevention in some regions might present an opportunity for applying GNN. For example, within each network, a landslide location as a node is connected, via weighted links, to a number of other landslide locations.
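The idea of landslide locations as nodes connected by weighted links can be sketched with a single round of neighbourhood aggregation, the basic operation on which GNN layers build. The code below is a simplified illustration without learned parameters; the site names, the scalar node feature, the link weights, and the normalization rule are all assumptions of this sketch.

```python
def propagate(features, edges):
    """One round of weighted neighbourhood averaging on an undirected graph.

    features: {node: scalar feature}, e.g., a displacement rate per site.
    edges:    list of (node_a, node_b, weight) links between sites.
    Each node keeps its own value (weight 1) and mixes in its neighbours'.
    """
    neighbours = {node: [] for node in features}
    for a, b, w in edges:
        neighbours[a].append((b, w))
        neighbours[b].append((a, w))
    updated = {}
    for node, own in features.items():
        total = own + sum(w * features[other] for other, w in neighbours[node])
        norm = 1.0 + sum(w for _, w in neighbours[node])
        updated[node] = total / norm
    return updated


# Hypothetical sites with a scalar feature and link weights
features = {"site_A": 1.0, "site_B": 0.0, "site_C": 2.0}
edges = [("site_A", "site_B", 1.0), ("site_B", "site_C", 3.0)]
updated = propagate(features, edges)
print(updated)
```

After one round, each site's value reflects the state of the sites it is linked to, which is how interdependence among locations enters the representation; an actual GNN would additionally apply learned transformations and stack several such rounds.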

Automated approaches
Regardless of the model, it is necessary to select the proper parameters and thresholds of each feature. Automated approaches might simplify and even skip this process.
Deep learning methods applied in landslide detection can be fully automated to design stable and efficient networks using small training datasets. For example, CNN-based architectures are composed of a stack of layers or a stack of modules. Automated approaches can determine the optimal number of layers or modules required for a given application task and optimize the internal structure of a module [178].
Although the application of novel deep learning methods to landslides prevention research is still in its infancy, current applications in other domains already demonstrate that these methods have enormous potential for application in landslides prevention in the future.

Conclusions
A survey of machine learning algorithms applied in landslides prevention has been presented in this paper, which focuses on (1) landslides detection based on images, (2) landslides susceptibility assessment, and (3) the development of landslide warning systems. The survey shows that machine learning methods have been widely used in landslides prevention and can achieve satisfactory performance. However, there are still several challenges and limitations. First, professional knowledge is needed, which can facilitate the selection of more appropriate variables and datasets when facing increasingly complex and massive data. Second, interpretability is also a critical component in landslides prevention. The majority of established scientific theories on landslide occurrence mechanisms struggle to explain the results of machine learning models. Analyses involving machine learning results should be interpreted in combination with landslide mechanisms. Therefore, a potential research trend is to combine data-driven machine learning with expert knowledge of landslides. Gradually increasing the application of machine learning in landslides prevention will enable benefits for both the machine learning and landslide research domains.

Compliance with ethical standards
Conflict of interest The authors declare that they have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.