1 Introduction

Landslides are regarded as complex geological phenomena, and they pose a significant risk to human life, property and living conditions (Froude and Petley 2018; Haque et al. 2019; Wu et al. 2022). High-precision landslide susceptibility maps (LSMs) are helpful for the rational allocation of land resources and the reduction of pertinent decision-making risks.

Landslide susceptibility prediction (LSP) remains a significant research topic worldwide (Xu et al. 2012). Numerous uncertainty issues, including study area scales (Thi Ngo et al. 2021), mapping unit scales (such as grid units and slope units) (Huang et al. 2021a), the quantity of landslide samples (Huang et al. 2022), the quality of original data sources (Chen et al. 2020b; Schlögel et al. 2018), the choice and combination of environmental factors (Huang et al. 2021b) and the selection of LSP models (such as data-driven models) (Cavazzi et al. 2013), all affect the LSP modeling process (Zhao et al. 2024). These uncertainty issues can be viewed as errors in a broad sense, and they limit our understanding of LSP modeling.

Among the above uncertainty issues, the spatial patterns of landslides differ markedly under different study area scales and mapping unit scales, and these differences carry over to the LSP results, although they have received little attention (Miller et al. 2015; Yu and Gao 2020). LSP modeling requires the preliminary selection of a suitable study area scale, such as county scale, watershed scale and/or large regional scale. The study scale also includes the mapping unit scale, which should maximize within-unit homogeneity and between-unit heterogeneity (Calvello et al. 2013). Both scale characteristics are present in remote sensing images, digital elevation models (DEMs) and other data sources used to describe topographic, geological and land surface characteristics. Meanwhile, the combinations of environmental factors and LSP models have varying degrees of adaptability under various study area scales and mapping unit scales. The reliability and accuracy of LSP results may be reduced if the uncertainty issues related to both scales are poorly accounted for and the global and local study areas are treated alike (Miller et al. 2015; Zhu et al. 2018a), because there is then no "adaptation" to local conditions at different scales (Huang et al. 2023b). Furthermore, few studies have explored how to jointly optimize study area scales, landslide mapping unit scales and their spatial resolutions. Grid units, for instance, are suitable for LSP under various spatial resolution conditions, while slope units, based on high-resolution DEM data, are more suitable for both LSP and risk warning in a variety of study areas (Hodasová and Bednarik 2021).

In summary, at what study area scale and/or mapping unit scale should LSP modeling be done? And do both scales affect how environmental factors are mapped for landslide evolution? These two questions suggest that joint consideration of study area scale and mapping unit scale is crucial for LSP modeling. This study aims to investigate the effects of different study area scales and mapping unit scales on the uncertainty rules of LSP. Ganzhou City, Ganzhou East, and Ruijin County in China were selected to represent different study area scales. Different mapping unit scales are represented by grid units with spatial resolutions of 30 m and 60 m, as well as slope units extracted by the multi-scale segmentation method, so that modeling uncertainty can be analyzed under combined conditions of different scales.

2 Review of related studies

The identification of uncertainty issues is crucial to LSP modeling. This paper primarily examines these uncertainty issues, including the determination of study area scales and mapping unit scales, as well as the selection of environmental factors.

2.1 Influence of environmental factors selection on LSP

In the past 30 years, numerous environmental factors, such as slope, lithology, aspect, hydrology, rivers, and curvature, have been proposed based on mapping units in various geological and climatic settings (Yang et al. 2023). By grouping synonyms and environmental factors with similar descriptors but not necessarily identical meanings, Reichenbach et al. (2018) reclassified a large number of environmental factors into 23 categories. The 23 categories were then divided into five thematic groups: geology, hydrology, land cover, landform, and other variables. According to an analysis of the literature database, researchers prefer "simple" (direct) measures of landform, such as DEM, relief, slope, aspect, and curvature, and the majority of articles used environmental factors primarily related to landform (Gaidzik and Ramirez-Herrera 2021). Slope has consistently been shown to be the most useful environmental factor for LSP (Loche et al. 2022). Individual LSP models used between 5 and 22 environmental factors, with an average of 9, which may reflect the constraints of data sources and spatial analysis techniques. Most studies include only slightly more than ten types of easily measurable environmental factors. For instance, Huang et al. (2020a) chose 13 categories of environmental factors, including lithology, topographic wetness, slope, and aspect. Hong et al. (2018) chose 16 environmental factors for LSP, including lithology, topography, hydrology, and average annual rainfall. Using so few categories of environmental factors makes it difficult to reflect landslide characteristics accurately and inevitably affects the accuracy of LSP modeling (Huang et al. 2022).

Furthermore, it can be challenging to assess the significance of different environmental factors for LSP modeling in a study area (Huang et al. 2020b; Huang et al. 2023c). Therefore, to represent the development characteristics of landslides accurately from various angles, a relatively rich set of environmental factors should be gathered in the early stages of modeling (Youssef and Pourghasemi 2021). Based on the evolution characteristics of rainfall-induced landslides in Ganzhou City, China, slope stability under heavy rainfall, engineering slope cutting, land cover, slope body structure, and soil mechanical properties can be taken into consideration. Appropriate environmental factors include annual rainfall, rainstorm frequency, topography, highway density, vegetation cover, soil clay/sand content, accumulation layer thickness, and rock weathering intensity (Li et al. 2021). In Ganzhou City, 21 environmental factors corresponding to four thematic clusters are considered: landform and lithology environment factors (DEM, slope, aspect, plan curvature, profile curvature, slope forms, topography relief, DEM variation, cutting depth, surface roughness, lithology and fault), hydrological and land cover environment factors (terrain wetness index, modified normalized difference water index, drainage density, average rainfall, normalized difference vegetation index, normalized difference building index, road density and population density), and other environmental factors.

2.2 Review of influences of study area scales on LSP modelling

Most studies do not take into account the spatial correlation of the study area with its surroundings, which introduces considerable subjectivity and uncertainty (Shirzadi et al. 2019). In general, soil types are similar between regions with similar landforms, climates, lithologies, and other conditions. Even when such locations are spatially discontinuous, their LSP results ought to be similar because they typically have comparable environmental factors or combinations of environmental factors. The more similar two geographical environments are, the more similar their geographical features tend to be (Zhu et al. 2018b). Landslide disasters are environmental geological problems with specific spatiotemporal properties: they are natural phenomena that occur at specific times and locations, and their properties are relative to the scale of observation (Shou and Lin 2016; Zhu et al. 2015a). Additionally, because landslide occurrence depends on regional natural conditions, the study area scale affects how environmental factors are characterized and how they evolve (Fressard et al. 2014; Kuan-Tsung et al. 2019; Zhu et al. 2015b). Because geographical environments, climatic conditions, and human activities differ across scales within a study area, the dominant environmental factors affecting landslide evolution also differ, which results in different spatial distributions of landslides (Miller et al. 2015).

Study areas with different scales can directly affect both the weights of spatially associated environmental factors and the sample quantities used for LSP model building. The first and most important step in LSP is the selection of environmental factors in the study area (Shi et al. 2018). In study areas with various scales, these landslide-related environmental factors, particularly terrain and hydrological factors, have a significant impact on LSP (Du et al. 2020; Zhu et al. 2018b). In a large study area, the actual characteristics of environmental factors are weakened or blurred (Yu and Gao 2020). In a small study area, environmental factors can reflect changes in landslides in micro-regional environments (Kang et al. 2016; Zhu et al. 2018a). Additionally, the primary environmental factors vary with study area and scale.

2.3 Review of influences of mapping unit scales on LSP modelling

A mapping unit is a section of the land surface that is distinguished from neighboring units by a unique set of ground conditions (Reichenbach et al. 2018). A mapping unit of landslides and associated environmental factors for LSP can be thought of as an observation window at the scale of the analysis, though it is difficult to choose the best observation window (Loche et al. 2022). All of the commonly employed mapping units for LSP suggested in the literature can be categorized into one of the following groups: (I) grid units, (II) slope units, (III) small watershed units, (IV) topographic units, and (V) political or administrative units (Alvioli et al. 2016; Drǎguţ et al. 2010). According to the related literature, grid units and slope units are the two primary types of mapping units used for LSP (Huang et al. 2021a).

Grid units, characterized by their constant shape and regular layout, are by far the most common mapping unit (Li et al. 2021). We also found that most LSP models used grid units with the same resolution for environmental factors and landslides, while very few used different resolutions for the mapping unit and the DEM (Gaidzik and Ramirez-Herrera 2021). Small grid units (5 m × 5 m or less) can capture the microtopographic morphometric signature in great detail, but this detail may have little or no geomorphological, geological, or climatic significance for large study areas, where a coarser-resolution DEM would be more effective (Salciarini et al. 2016). Furthermore, spatial resolution is in essence the size of the spatial sampling unit, which directly affects the number of samples required to build the model and the total number of grid units in the study area (Guo et al. 2022).

Slope units have received a lot of attention in the last 15 years because they accurately reflect the physical relationships between landslides and morphological features, but their extraction has long been a challenge for cartographers (Wang et al. 2018). The slope unit is derived from grid units and should meet the following three conditions: (1) it forms a continuous, closed geographical space; (2) it has uniform slope and aspect; (3) it has no abrupt change in surface slope. Based on these requirements, an object-oriented multi-scale image segmentation method is used to extract the slope units in the study area (Liu et al. 2023). This method is applicable to study areas with different landforms and is more efficient and accurate than other slope unit extraction methods (Chang et al. 2023; Ma et al. 2023). Through multi-scale segmentation, image pixels with the same spatial and spectral characteristics are grouped into "homogeneous and uniform" units, which are taken as slope units (Lim and Keles 2019). As with grid units, the size of the spatial sampling unit determines how many samples are needed to build the model and how many units are present in the study area. The mapping units for LSP model building in this study are slope units obtained by the multi-scale segmentation method and grid units with spatial resolutions of 30 m and 60 m.

3 Summary

A variety of methods have been put forward for LSP (Reichstein et al. 2019). Despite their differences, all methods face the reproducibility issue of "to what extent this conclusion can be applied to other regions" (Liu et al. 2022). Most geo-disaster studies lack a natural analytical scale, and the rules discovered in a study region depend on how the mapping units are divided, which leads to variable mapping unit issues. The mapping unit issue is especially apparent when the analysis is based on spatially extensive environmental factors. The model is also affected by ambiguous study area scales. Finding universal laws is made more difficult by the fact that different patterns can appear within the same study area (Liu 2022). Which rule is more accurate depends on the choice of the study area scale and the division of the mapping unit scale. The overall goal of this study is to conduct more in-depth analyses of LSP at various study area and mapping unit scales.

4 Methodologies

This study combines multi-source remote sensing technology (Lissak et al. 2020), geographic information systems (Chen and Chen 2021), and data-driven models (Chen et al. 2020a) to build LSP models and to examine the effects of study areas with various scales on LSP modeling. Three distinct study areas are chosen: Ganzhou City in China, the eastern region of Ganzhou City (Ganzhou East), and Ruijin County, a region within Ganzhou East. To represent various mapping unit scales, grid units with 30 m and 60 m spatial resolutions, as well as slope units extracted using the multi-scale segmentation method, are used. To create landslide and non-landslide spatial datasets, the Ganzhou City landslide inventory data are first determined, and 21 landslide environmental factors are extracted.

The above spatial datasets are then used to train and test SVM and RF models in order to obtain LSMs for Ganzhou City. To address LSP in Ganzhou East and Ruijin County, landslide information for these two areas is extracted from that of Ganzhou City, and the modeling process used for Ganzhou City is repeated for each. The LSMs of Ganzhou East and Ruijin County are then extracted using masks from those of Ganzhou City, and the LSMs of Ruijin County are extracted using masks from those of Ganzhou East. Based on the LSMs predicted and masked under the aforementioned 9 types of conditions, the prediction accuracy and the distribution rule of the landslide susceptibility indexes (LSIs) are used to analyze the uncertainty rules of LSP modeling under study areas with different scales. The flow chart of this study is shown in Fig. 1.
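The mask-based extraction of a sub-area LSM from a larger LSM can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that an LSM is held as a small nested list of susceptibility values and that the sub-area is a Boolean mask on the same grid. The function name `extract_by_mask`, the toy values, and the use of `None` as a no-data marker are hypothetical and are not taken from this study, which works with GIS rasters.

```python
def extract_by_mask(lsm, mask, nodata=None):
    """Keep susceptibility values where mask is True; mark the rest as nodata."""
    if len(lsm) != len(mask) or any(len(r) != len(m) for r, m in zip(lsm, mask)):
        raise ValueError("LSM and mask must share the same grid shape")
    return [
        [value if keep else nodata for value, keep in zip(row, mask_row)]
        for row, mask_row in zip(lsm, mask)
    ]


# Toy 3 x 3 LSM for the largest study area (hypothetical values).
city_lsm = [
    [0.10, 0.35, 0.80],
    [0.20, 0.55, 0.90],
    [0.05, 0.40, 0.70],
]
# Hypothetical mask of a nested sub-area (True = inside the sub-area).
sub_area_mask = [
    [False, True, True],
    [False, True, True],
    [False, False, False],
]

sub_area_lsm = extract_by_mask(city_lsm, sub_area_mask)
```

The masked map can then be compared cell by cell with an LSM predicted directly for the sub-area, which is the kind of comparison the workflow relies on.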

Fig. 1 Flow chart of LSP under different scales of study areas (AUC: area under the receiver operating characteristic curve; TWI: terrain wetness index; MNDWI: modified normalized difference water index; NDVI: normalized difference vegetation index; NDBI: normalized difference building index)

4.1 Multi‑scale segmentation method

In order to study regional LSP at various study area scales, grid units and slope units have been chosen as mapping units (Alvioli et al. 2020). Grid units are relatively easy to implement for LSP at various study area scales on the ArcGIS 10.1 platform. The slope units extracted by the multi-scale segmentation method are not single pixels but homogeneous objects with spectral, spatial, and shape features (Alvioli et al. 2022). Starting from individual pixels, a bottom-up region-growing segmentation algorithm iteratively merges adjacent pixels with similar spectral and shape features into slope units with homogeneous properties (Fig. 2). The heterogeneity index f between slope units is the weighted sum of spectral and shape heterogeneity (Eq. (1)) (Huang et al. 2021a).

$$f = w_{\text{color}} h_{\text{color}} + (1 - w_{\text{color}} )h_{\text{shape}}$$
(1)

where \(w_{\text{color}}\) is the spectral weight, \(h_{\text{color}}\) is the spectral heterogeneity of the object, which is determined by the scale and spectral standard deviation of the object, and \(h_{\text{shape}}\) is the shape heterogeneity of the object, which is determined by the weighted sum of compactness and smoothness.
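As a rough illustration of Eq. (1), the following Python sketch evaluates the heterogeneity index for one candidate merge. The compactness weight `w_compact` and all numerical values are hypothetical; the text only states that shape heterogeneity is a weighted sum of compactness and smoothness, so the convex weighting used here is an assumption.

```python
def shape_heterogeneity(h_compact, h_smooth, w_compact):
    """Weighted sum of compactness and smoothness heterogeneity (assumed form)."""
    if not 0.0 <= w_compact <= 1.0:
        raise ValueError("w_compact must lie in [0, 1]")
    return w_compact * h_compact + (1.0 - w_compact) * h_smooth


def heterogeneity_index(h_color, h_shape, w_color):
    """Eq. (1): f = w_color * h_color + (1 - w_color) * h_shape."""
    if not 0.0 <= w_color <= 1.0:
        raise ValueError("w_color must lie in [0, 1]")
    return w_color * h_color + (1.0 - w_color) * h_shape


# Hypothetical heterogeneity values for one candidate merge of two objects.
h_shape = shape_heterogeneity(h_compact=0.4, h_smooth=0.2, w_compact=0.5)
f = heterogeneity_index(h_color=0.6, h_shape=h_shape, w_color=0.7)
```

In a region-merging scheme, a merge would typically be accepted only while f stays below a threshold tied to the chosen segmentation scale; that threshold is not specified in the text.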

Fig. 2 Workflow of slope units extracted by multi-scale segmentation method

4.2 Machine learning models

LSP models are mainly divided into deterministic models, knowledge-driven models and machine learning models (Du et al. 2020; Huang et al. 2020a; Zheng et al. 2019). Data-driven models reduce, to some extent, the deterministic model’s reliance on many complex soil mechanical parameters and the knowledge-driven model’s reliance on human subjectivity, and they can improve LSP accuracy when the data have gaps or are of poor quality (Juliev et al. 2019; Pham et al. 2022; Zheng et al. 2023). To determine the characteristics and discriminant rules of landslide occurrence, data-driven models primarily exploit the internal relationships between known landslides and different environmental factors (Hu et al. 2020; Sun et al. 2020). At present, remote sensing and machine learning models are widely used in regional LSP, and a series of research results have been achieved (Pradhan et al. 2023). Typical machine learning models include logistic regression, artificial neural networks, support vector machine (SVM), boosting algorithms, decision trees, random forest (RF), and deep learning models (Huang et al. 2020a; Lombardo and Mai 2018; Lombardo and Tanyas 2020; Xiao and Zhang 2023; Zhao et al. 2019). Merghadi et al. (2020) reviewed and compared various machine learning algorithms for landslide susceptibility studies. Among them, SVM and RF models are among the most widely used, achieve high prediction accuracy, and are relatively insensitive to the multicollinearity of environmental factors.

4.2.1 Support vector machine

SVMs seek the optimal hyperplane that separates the classes and use the support vectors closest to this hyperplane to maximize the margin between classes. By mapping the input variables into a high-dimensional feature space with a kernel function, nonlinearly separable data can be made linearly separable. For a set of linearly separable training vectors \(m_{i} \left( {i = 1,2, \ldots ,n} \right)\), each \(m_{i}\) holds the environmental factor values of one sample, and the corresponding output categories \(y_{i} = \pm 1\) denote landslides and non-landslides, respectively. To classify landslides, the separating hyperplane with the maximum margin is sought, which is equivalent to minimizing \(\frac{1}{2}\left\| s \right\|^{2}\). For data that are not linearly separable, slack variables \(\xi_{i}\) are used to control the classification error, and the corresponding constraint condition is \(y_{i} ((s \cdot m_{i} ) + b) \ge 1 - \xi_{i}\). Furthermore, misclassification is taken into account by introducing \(\nu \in (0,1)\). Equation (2) gives the corresponding objective function, where b is the constant, \(\left\| s \right\|\) is the norm of the hyperplane normal vector, and \(\lambda_{i}\) are the Lagrange multipliers introduced when the optimization problem is solved. Commonly used SVM kernel functions include the linear, polynomial, radial basis, and sigmoid functions. This study uses the radial basis kernel function, which has been applied successfully in LSP (Xu et al. 2012).

$$L = \frac{1}{2}\left\| s \right\|^{2} - \frac{1}{\nu n}\sum\limits_{i = 1}^{n} {\xi_{i} }$$
(2)
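For reference, the radial basis kernel mentioned in this section is commonly written as \(K(x, z) = \exp ( - \gamma \left\| {x - z} \right\|^{2} )\). The sketch below evaluates this standard form in Python; the kernel width `gamma` and the sample vectors are hypothetical, since the kernel parameters used in this study are not given in the text.

```python
import math


def rbf_kernel(x, z, gamma):
    """Standard radial basis kernel: exp(-gamma * ||x - z||^2)."""
    if len(x) != len(z):
        raise ValueError("vectors must have the same length")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


# Two hypothetical, already normalized environmental factor vectors.
sample_a = [0.2, 0.7, 0.1]
sample_b = [0.3, 0.5, 0.4]
similarity = rbf_kernel(sample_a, sample_b, gamma=0.5)
```

The kernel equals 1 for identical samples and decays toward 0 as samples move apart, so `gamma` controls how local the resulting decision boundary is.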

4.2.2 Random forest

To diversify the generated classification trees, RF builds multiple training datasets by sampling with replacement (bootstrap sampling) and selects random subsets of features (Youssef et al. 2016). An ensemble of diverse classification trees represents the data more fully than a single tree, improves predictive performance, and reduces over-fitting. In addition, RF uses the out-of-bag error to obtain an unbiased estimate of the generalization error, which gradually converges as the number of trees grows. The out-of-bag error can also serve as a proxy for the importance of each factor variable: when the values of a single variable are perturbed in the out-of-bag data, the importance of that variable is given by the resulting change in out-of-bag error. Mean decrease accuracy and mean decrease Gini are also used to gauge the importance of the input variables. Additionally, the number of trees and the number of features in the model are key parameters that determine how well it predicts (Huang et al. 2022).
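The bootstrap sampling and out-of-bag split underlying RF can be sketched in a few lines of Python. This is only an illustration of the sampling step for a single tree, not the implementation used in this study; the function name `bootstrap_split`, the sample count, and the seed are hypothetical.

```python
import random


def bootstrap_split(n_samples, rng):
    """Draw one bootstrap sample (with replacement) and list the out-of-bag indices."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    in_bag_set = set(in_bag)
    out_of_bag = [i for i in range(n_samples) if i not in in_bag_set]
    return in_bag, out_of_bag


rng = random.Random(42)  # fixed seed so the sketch is repeatable
in_bag, out_of_bag = bootstrap_split(1000, rng)
oob_fraction = len(out_of_bag) / 1000
```

For large samples, about 36.8% of the data (a fraction of 1/e) is expected to be left out of each bootstrap sample, and it is on these left-out samples that each tree's out-of-bag error is evaluated.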

4.3 LSP results assessment

The AUC accuracy and the LSI distribution rule are the two measures that most clearly illustrate the uncertainties in LSP modeling. The AUC value is a metric used to quantitatively assess the overall performance of LSP (Garosi et al. 2019). The receiver operating characteristic (ROC) curve is constructed as follows: first, LSIs are calculated and the samples in the test dataset are ranked by LSI; second, successive cut-off points are chosen along this ranking; third, at each cut-off point the samples ranked above it are predicted as positive samples; and finally, the true positive rate and false positive rate obtained at each cut-off point are plotted on the vertical and horizontal axes of the ROC curve, respectively. A higher AUC indicates better LSP performance. According to Eq. (3), the AUC value represents the probability that a randomly selected positive sample will rank higher than a randomly selected negative sample.

$$\text{AUC} = \frac{\sum\nolimits_{i = 1}^{n_{0}} r_{i} - n_{0} \left( n_{0} + 1 \right)/2}{n_{0} \times n_{1}}$$
(3)

where \(n_{0}\) represents the number of negative samples, \(n_{1}\) represents the number of positive samples, and \(r_{i}\) represents the rank of the ith negative sample among all test samples. The mean value and standard deviation, on the other hand, primarily reflect the different distribution rules of LSIs.
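The rank-based form of Eq. (3) can be checked with a short Python sketch. It assumes that samples are ranked by descending LSI (rank 1 for the highest LSI) and that tied LSIs share their average rank; both conventions are assumptions, as the text does not state them. The function name and the toy scores are hypothetical.

```python
def auc_from_ranks(scores, labels):
    """AUC from Eq. (3), using the ranks of the negative samples.

    scores: predicted LSIs; labels: 1 = landslide, 0 = non-landslide.
    Assumption: rank 1 is the highest LSI; ties receive their average rank.
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1  # 1-based average rank of the tie group
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1
    n0 = sum(1 for y in labels if y == 0)  # negative samples
    n1 = len(labels) - n0                  # positive samples
    if n0 == 0 or n1 == 0:
        raise ValueError("both classes must be present")
    negative_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 0)
    return (negative_rank_sum - n0 * (n0 + 1) / 2) / (n0 * n1)


# Toy test set: three landslides and three non-landslides.
scores = [0.90, 0.80, 0.70, 0.60, 0.55, 0.40]
labels = [1, 1, 0, 1, 0, 0]
auc = auc_from_ranks(scores, labels)
```

In this toy case, 8 of the 9 positive-negative pairs are ordered correctly, so the rank formula gives 8/9, which matches the pairwise interpretation of the AUC stated above.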

Additionally, the mean value depicts the LSIs' average level of distribution, and the standard deviation shows how widely distributed they are. The distribution rule of LSIs as a whole is analyzed using the mean value and standard deviation, which offers theoretical direction for LSP in the research area (Huang et al. 2020a).

5 Study area and materials

5.1 Study area

Ganzhou City is situated where the Jiangnan Hills and Lingnan Mountains meet (Figs. 3b, c). Cities such as Ningdu, Shicheng, Ruijin, and Huichang are located in Ganzhou East (Fig. 3d). In the region, the exposed strata consist mainly of metamorphic rocks, such as Triassic strata, Cretaceous shale, siltstone, Cambrian strata, Sinian phyllite, and slate, together with secondary intrusive magmatic rocks from each era. A total of 2496 geological disasters have been recorded, with an average areal density of 17.59/100 km². Ruijin County, in eastern Ganzhou City, lies between latitudes 25°30' N and 26°20' N and longitudes 115°42' E and 116°22' E. Figures 3c, e show that the study areas are nested rather than adjacent, spanning from Ganzhou City down to Ruijin County.

Fig. 3 Landslide location in the study area

5.2 Landslide inventory information

Numerous environmental factors can cause landslides, and each environmental factor has a different weight and effect on the likelihood of a landslide (Abraham et al. 2020; Shi et al. 2018). The performance of the LSP in a study area is determined by the quality of the landslide data. Understanding the locations, movement types, triggering times, scales, and related evolution of geological environments of landslides from space is aided by information from landslide inventories (Franceschini et al. 2022; Segoni et al. 2018). Table 1 displays a series of data used for LSP.

Table 1 Data lists used for LSP

The information is primarily derived from field surveys and high-resolution image interpretation of local geological disasters at a scale of 1:100,000. According to the landslide inventory of Ganzhou City, there were 9555 disasters, including 3855 soil landslides with a landslide density of 78/100 km²; by the end of 2014, a significant number of collapses, debris flows, and ground collapses had also occurred in the study areas (Hungr et al. 2014). In Ganzhou East, there were 2041 disasters, of which 1519 were landslides, with a landslide density of 14.07/100 km²; they occurred primarily in southern and eastern Ningdu City, northern and northeastern Shicheng City, northwestern Ruijin County, and southern Huichang City. Ruijin County had 414 disasters, of which 370 (89% of the total) were small- and medium-sized landslides, with a landslide density of 15.18/100 km², mainly distributed in the northern mountainous area (Figs. 3c–e).

5.3 Landslide related environmental factors

Although the selection of environmental factors has a large impact on the quality of an LSM during the LSP process, there is no set standard for it (Kang et al. 2016). Real-time rainfall (Fustos et al. 2020) and earthquakes are typically the main external triggers of landslides. The slope itself and the basic environmental factors nearby, such as topography, geology, hydrology, and vegetation conditions, also strongly influence landslide stability, particularly in small areas with similar external factors. This study therefore chose the 21 landslide environmental factors listed in Table 2 based on previous LSP studies (Chen and Li 2020) and an analysis of environmental characteristics in the study areas.

Table 2 Environmental factor types

5.3.1 Topographic environmental factors

The influence of topography on landslide evolution is reflected in terrain factors. Ten types of terrain factors are derived from the DEM in this study (Figs. 4a–j). Elevation (Fig. 4a) influences climate, and low- and middle-altitude regions are frequently conducive to the formation of landslides, so elevation is frequently used as an environmental factor (Shahabi et al. 2014). Because it directly affects the shear stress that leads to landslide instability and failure, slope is a significant factor in promoting the occurrence of landslides (Hong et al. 2017a). According to statistics, most landslides in Ganzhou City have occurred on slopes between 15° and 45° (Fig. 4b). However, the slope range that generates landslides varies somewhat among locations with different lithologies. For instance, landslides typically occur on slopes between 15° and 25° in hilly areas, whereas they typically occur on slopes between 30° and 45° in middle and low mountainous areas. Aspect (Fig. 4c) reflects the variation in soil moisture content and vegetation cover with slope direction (Hong et al. 2017b). Together, the plan curvature and profile curvature (Figs. 4d, e) describe how the terrain shape affects flow convergence and flow velocity (Chen et al. 2017). Slope forms (Fig. 4f) describe the geometries of different slopes. Both the elevation variation coefficient (Fig. 4h) and the topographic relief (Fig. 4g) are macroscopic indices that reflect the degree of surface relief and fragmentation. Surface roughness (Fig. 4i) and surface cutting depth (Fig. 4j) are important reference indices for analyzing soil loss and the development of surface erosion.
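To make the derivation of a terrain factor from the DEM concrete, the Python sketch below computes slope for one cell from its 3 × 3 elevation window using Horn's finite-difference scheme, which is widely used in GIS software. The text does not state which slope algorithm was applied, so the choice of scheme, the function name, and the toy elevations are assumptions.

```python
import math


def slope_degrees(window, cell_size):
    """Slope of the centre cell of a 3 x 3 elevation window (Horn's scheme).

    window: three rows of three elevations, north row first, west column first.
    cell_size: grid resolution in the same length unit as the elevations.
    """
    (a, b, c), (d, _centre, f), (g, h, i) = window
    dz_dx = ((c + 2 * f + i) - (a + 2 * d + g)) / (8.0 * cell_size)
    dz_dy = ((g + 2 * h + i) - (a + 2 * b + c)) / (8.0 * cell_size)
    return math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))


# Hypothetical window on a 30 m grid: elevation rises 30 m per cell toward the east.
window = [
    [0.0, 30.0, 60.0],
    [0.0, 30.0, 60.0],
    [0.0, 30.0, 60.0],
]
slope = slope_degrees(window, cell_size=30.0)
```

A rise of 30 m over each 30 m cell corresponds to a 45° slope, at the upper end of the 15° to 45° range in which most landslides in the study area are reported.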

Fig. 4 Landform and lithology environment factor maps in Ganzhou City: a DEM, b Slope, c Aspect, d Plan curvature, e Profile curvature, f Slope forms, g Topography relief, h DEM variation, i Cutting depth, j Surface roughness, k Lithology, l Fault

5.3.2 Geology factors

Differences in lithology and physical properties are reflected in differences in the permeability, matric suction, and shear strength of rock and soil (Hong et al. 2017b). The primary lithology types in Ganzhou City are clastic and carbonate rocks, which are sporadically distributed across counties, metamorphic rocks, which are primarily found in the western and southeastern regions of the study area, and metamorphic rocks, which are widely distributed near rivers and other bodies of water. Geological maps of Ganzhou City at a scale of 1:100,000 are used to generate the lithologies in the various subareas (Fig. 4k). Metamorphic strata have the highest incidence of geological disasters, accounting for 1623 landslides (42% of the total number of landslides), followed by clastic, magmatic, and carbonate strata.

The geological structure (Fig. 4l) also frequently has a negative effect on landslide stability: tectonic movement is accompanied by differential uplift along faults and folds, which frequently creates structurally weak zones and lowers the stability of the rock and soil mass. The deep Anyuan–Yingtan, Xunwu–Ruijin, Dayu–Nancheng and Quannan–Anyuan Faults are the main faults in Ganzhou City. The effect of geology on the evolution of landslides is represented in this study by lithology and the distance to faults.

5.3.3 Hydrological environmental factors

Hydrological environmental factors can be divided into topographic hydrological factors and remote sensing hydrological factors. The terrain wetness index (Fig. 5a) indicates the soil moisture content and groundwater distribution (Xu et al. 2012). In the visible and near-infrared bands, the spectral reflectances of water and vegetation are clearly distinct. Like the modified normalized difference water index (Fig. 5b), which is primarily used to identify surface water, the terrain wetness index can effectively extract comprehensive water information, such as surface runoff and groundwater. Because scour and erosion by rivers have significant detrimental effects on landslide stability, drainage density (Fig. 5c) is used to represent the impact of rivers on landslide occurrence.

Fig. 5 Hydrological and land cover environment factor maps in Ganzhou City: a TWI, b MNDWI, c Drainage density, d Average rainfall, e NDVI, f NDBI, g Road density, h Population density

Total radiation, which has a significant impact on surface vegetation, soil humidity, and atmospheric temperature, is the sum of direct solar radiation and sky radiation received on a horizontal surface, and it is the main energy source of atmospheric circulation and water circulation (Huang et al. 2020c). Specifically, total radiation is the main energy source for plant photosynthesis; its intensity and duration affect the evaporation and loss of soil water, and it changes the temperature by affecting the surface energy balance. Average annual rainfall (Fig. 5d) is one of the primary factors causing landslide evolution in Ganzhou City. Rainfall infiltration greatly increases the weight of the landslide mass and creates a weak zone. Additionally, as the pore water pressure in the landslide mass rises, the effective stress decreases, which in turn lowers the frictional resistance in the sliding zone (Bai et al. 2020). Rainfall-related factors typically include total precipitation, rainfall days, and days with daily precipitation of more than 50 mm. The simplest way to quantify the impact of rainfall on landslide evolution is to use the annual average rainfall (Abraham et al. 2020).

5.3.4 Land cover factors

Vegetation improves the physical properties of the soil and rock mass through root action and reduces the erosion and infiltration effects of rainfall, thereby affecting slope stability and restraining landslide evolution. The density and distribution of surface vegetation can be represented by the normalized difference vegetation index (NDVI; Fig. 5e). The normalized difference built-up index (NDBI; Fig. 5f) captures building information extracted from remote sensing images and is used to represent the distribution of residential building land and to identify dense residential areas (Chang et al. 2020). Both indices are calculated from Landsat 8 images (path/row 121/42) with a 30 m resolution acquired on October 3, 2013. The population of the study area is characterized by the population density (Fig. 5h), defined as the number of people per unit area. Road construction and operation adversely affect slope stability at all stages; artificially created high and steep slopes, excavation, blasting, and loading are common causes of landslides, particularly during the rainy season. Road density (Fig. 5g) is used to represent the impact of road and traffic facilities on landslide occurrence.
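The three remote-sensing indices mentioned in this section are all normalized band differences. The sketch below uses their standard definitions with per-pixel reflectance values; the function names and the sample reflectances are hypothetical, and the mapping to Landsat 8 OLI bands (green = B3, red = B4, NIR = B5, SWIR1 = B6) is noted only as the usual convention, not as the authors' confirmed workflow.

```python
def normalized_difference(band_a, band_b):
    """Return (a - b) / (a + b), or None where the denominator is zero."""
    total = band_a + band_b
    return None if total == 0 else (band_a - band_b) / total


def spectral_indices(green, red, nir, swir1):
    """Standard index definitions for one pixel of surface reflectance."""
    return {
        "NDVI": normalized_difference(nir, red),       # vegetation
        "NDBI": normalized_difference(swir1, nir),     # built-up land
        "MNDWI": normalized_difference(green, swir1),  # surface water
    }


# Hypothetical reflectances for a vegetated pixel (illustrative values only).
pixel = spectral_indices(green=0.08, red=0.06, nir=0.30, swir1=0.18)
print(pixel)
```

For a vegetated pixel such as this one, NDVI is strongly positive while NDBI and MNDWI are negative, which is the contrast the factor maps rely on.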

5.4 Establishment of LSP modelling spatial dataset

5.4.1 Establishment of grid-unit dataset

In this study, the data sources with various resolutions are resampled to grid units with 30 m and 60 m resolutions, which serve as mapping units for LSP. The study area in Ganzhou City covers 10,794.86 km². At a raster resolution of 30 m (60 m), Ganzhou City, Ganzhou East, and Ruijin County are divided into 45,525,924 (10,942,367), 9,310,512 (3,011,683), and 2,715,630 (687,633) grid units, respectively. A total of 3855 landslides were recorded in Ganzhou City, of which 1519 occurred in Ganzhou East and 366 in Ruijin County; these correspond to 53,078 (13,371), 17,115 (4499), and 5482 (1396) landslide grid units in the three study areas, respectively. In addition, a random number generator selects the same number of non-landslide grid units as landslide grid units in each study area. Selecting non-landslides randomly within the non-landslide area ensures that they are not spatially concentrated (Chang et al. 2023). Landslides and non-landslides are the output variables, assigned values of 1 and 0, respectively, and the samples are randomly partitioned into training and test datasets at a ratio of 7:3. The original 21 environmental factors are extracted as model input variables. To calculate the LSIs, the environmental factor values of all grid units in the study area are fed into the trained model, and the resulting LSIs are then divided into five classes using the natural break point method (Li et al. 2019).
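The sampling procedure described above (equal numbers of landslide and randomly drawn non-landslide units, labels 1 and 0, random 7:3 partition) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the integer unit identifiers, and the fixed seed are assumptions.

```python
import random


def build_dataset(landslide_ids, all_ids, train_fraction=0.7, seed=42):
    """Balance landslide units with random non-landslide units, then split."""
    rng = random.Random(seed)
    landslide_set = set(landslide_ids)
    # Candidate non-landslide units: every unit not mapped as a landslide.
    candidates = [i for i in all_ids if i not in landslide_set]
    non_landslide = rng.sample(candidates, len(landslide_set))
    # Label landslide units 1 and non-landslide units 0.
    samples = [(i, 1) for i in sorted(landslide_set)]
    samples += [(i, 0) for i in non_landslide]
    rng.shuffle(samples)
    cut = round(train_fraction * len(samples))
    return samples[:cut], samples[cut:]


# Toy example: 100 landslide units among 10,000 grid units.
train, test = build_dataset(range(100), range(10_000))
print(len(train), len(test))
```

Drawing the non-landslide units uniformly at random from the whole non-landslide area is what keeps them from clustering in one part of the study area.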

The landslide input datasets used for modelling are shown in Table 3, and all landslide polygons are converted into point data in ArcGIS 10.1. The environmental factors of each study area are obtained through resampling and correlation analysis of the 30 m resolution DEM data and the other relevant data sources. Figures 4 and 5 display the environmental factor maps of Ganzhou City.

Table 3 Training datasets and test datasets of landslides in the study

5.4.2 Establishment of slope-unit datasets

In this study, slope units are extracted using the multi-scale segmentation method and then used to build the slope-unit dataset. The multi-resolution segmentation algorithm in eCognition 8.7 performs object segmentation on the shaded relief and terrain aspect images. Three parameters must be set for multi-scale segmentation: the scale parameter, the shape weight, and the compactness weight. Considering the landslide morphology and scale characteristics, and following the trial-and-error method in the literature (Huang et al. 2020b), the scale, shape, and compactness parameters are set to 20, 0.8, and 0.8, respectively, to extract the optimal slope units.

Based on the multi-scale segmentation method, Ganzhou City, Ganzhou East, and Ruijin County are divided into 296,003, 78,565, and 17,989 slope units, respectively. The 3855 landslides in Ganzhou City, of which 1519 occurred in Ganzhou East and 366 in Ruijin County, serve as positive samples for the slope-unit analysis. Non-landslide samples are selected and the training and test datasets are divided in the same way as for grid units; the results are displayed in Table 3.

6 Results of LSP under different scales of study area and mapping unit

In this study, RF modelling of the spatial datasets at various study area scales is carried out using the random forest package of the R programming language (Youssef and Pourghasemi 2021). The accuracy of the RF is mainly controlled by the number of factor features and the number of trees, whose optimal values can be obtained by automatic parameter screening based on out-of-bag errors (Hong et al. 2019). The findings indicate that the optimal RF parameters in Ganzhou City are 5 factor features and 800 trees. Because the parameters for the other working conditions do not differ significantly from this group, only this representative set is presented. Using these parameters, the LSIs of slope units and of grid units with 30 m and 60 m resolutions are then predicted at the various study area scales. Finally, the LSIs are categorized into five classes of susceptibility zones using the natural break point method (Li et al. 2021).
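The five-class zoning step can be illustrated with a small sketch. It assumes that the natural break point method refers to the Jenks (Fisher) natural breaks classification, which minimizes the within-class sum of squared deviations; the function names and the toy LSI values are hypothetical, and the dynamic program below is intended for small samples only.

```python
from bisect import bisect_left


def jenks_upper_bounds(values, n_classes):
    """Upper bound of each class under Fisher-Jenks natural breaks."""
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and len(values)")
    # Prefix sums give the squared deviation of any slice in O(1).
    ps, ps2 = [0.0], [0.0]
    for v in data:
        ps.append(ps[-1] + v)
        ps2.append(ps2[-1] + v * v)

    def ssd(i, j):  # sum of squared deviations of data[i:j], with j > i
        s = ps[j] - ps[i]
        return (ps2[j] - ps2[i]) - s * s / (j - i)

    inf = float("inf")
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    back = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):  # last class is data[i:j]
                c = cost[k - 1][i] + ssd(i, j)
                if c < cost[k][j]:
                    cost[k][j], back[k][j] = c, i
    bounds, j = [], n
    for k in range(n_classes, 0, -1):  # walk the optimal split backwards
        bounds.append(data[j - 1])
        j = back[k][j]
    return bounds[::-1]


ZONES = ["very low", "low", "moderate", "high", "very high"]


def classify(lsi, bounds):
    """Map an LSI to its susceptibility zone given class upper bounds."""
    return ZONES[min(bisect_left(bounds, lsi), len(bounds) - 1)]


lsis = [0.02, 0.03, 0.20, 0.22, 0.45, 0.47, 0.70, 0.72, 0.95, 0.97]
bounds = jenks_upper_bounds(lsis, 5)
print(bounds, classify(0.46, bounds))
```

For rasters with millions of cells, GIS software typically applies the same criterion to a sample or a histogram of the values rather than to every cell.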

6.1 Landslide susceptibility results under different study area scales

To represent the various study area scales, Ganzhou City, Ganzhou East (the eastern portion of Ganzhou City), and Ruijin County (within Ganzhou East) were chosen. In addition, the LSMs of Ganzhou East and Ruijin County are extracted by mask from that of Ganzhou City, and the LSM of Ruijin County is extracted by mask from that of Ganzhou East. The landslide susceptibility results under these six conditions are compared to analyze the effect of the study area scale. The RF-based landslide susceptibility results for slope units and for grid units with a 30 m resolution are taken as examples. Figures 6 and 7 show that (1) the areas of the low- and very low-susceptibility zones shrink as the study area becomes smaller; (2) the LSMs of Ruijin County extracted by mask differ notably from those predicted directly; and (3) according to overlay analysis between the environmental factor maps and the LSMs obtained by the RF model, the high-susceptibility zones are mainly found in low- and medium-altitude regions below 400 m, with slopes of 8°–20°, topographic relief of 3–12, and profile curvature of 3–18.
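The distinction between an LSM "extracted by mask" and one "predicted directly" can be made concrete with a toy raster. In the sketch below the LSIs of the larger study area are simply clipped to the cells of the smaller one, so the clipped values still come from the model trained on the larger area. The nested-list grids and the function name are illustrative assumptions, not the GIS tool used in the study.

```python
def extract_by_mask(lsi_grid, mask_grid):
    """Keep LSI values where the mask is truthy; other cells become None."""
    same_shape = len(lsi_grid) == len(mask_grid) and all(
        len(row) == len(mrow) for row, mrow in zip(lsi_grid, mask_grid)
    )
    if not same_shape:
        raise ValueError("LSI grid and mask must have the same shape")
    return [
        [value if inside else None for value, inside in zip(row, mrow)]
        for row, mrow in zip(lsi_grid, mask_grid)
    ]


# LSIs predicted by a model trained on the larger study area (toy values).
city_lsi = [
    [0.10, 0.20, 0.80],
    [0.30, 0.60, 0.90],
]
# 1 marks cells that belong to the smaller study area.
county_mask = [
    [0, 1, 1],
    [0, 0, 1],
]
county_lsi = extract_by_mask(city_lsi, county_mask)
print(county_lsi)
```

A directly predicted LSM would instead come from a model retrained on samples from the smaller area only, which is why the two maps can differ.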

Fig. 6 LSMs under different study area scales: a Ganzhou City of slope units and RF model; b Ganzhou East extracted from Ganzhou City; c Ganzhou East of slope units and RF model; d Ruijin extracted from Ganzhou City; e Ruijin extracted from Ganzhou East; f Ruijin County of slope units and RF model

Fig. 7 LSMs under different study area scales: a Ganzhou City of 30 m grid units and RF model; b Ganzhou East extracted from Ganzhou City; c Ganzhou East of 30 m grid units and RF model; d Ruijin extracted from Ganzhou City; e Ruijin extracted from Ganzhou East; f Ruijin County of 30 m grid units and RF model

6.2 Landslide susceptibility results under different mapping unit scales

The different mapping unit scales are represented by grid units with 30 m and 60 m resolutions and by slope units. The landslide susceptibility results under the various mapping unit scales are analyzed using the RF-based results for Ruijin County, Ruijin County extracted from Ganzhou East, and Ruijin County extracted from Ganzhou City as examples. Figure 8 shows that (1) the areas of the low- and very low-susceptibility zones grow as the grid resolution becomes coarser; (2) in the same study area, the very low to moderate susceptibility zones based on slope units are smaller than those based on grid units; and (3) the LSMs of Ruijin County based on slope units and on grid units differ significantly.

Fig. 8 LSMs of Ruijin County based on different mapping units under RF: a–c Ruijin extracted from Ganzhou City, Ruijin extracted from Ganzhou East, and Ruijin County based on 30 m grid units; d–f Ruijin extracted from Ganzhou City, Ruijin extracted from Ganzhou East, and Ruijin County based on 60 m grid units; g–i Ruijin extracted from Ganzhou City, Ruijin extracted from Ganzhou East, and Ruijin County based on slope units

6.3 Uncertainties of LSP results under different scales

LSP uncertainties are expressed by the prediction accuracy and the distribution rule of the LSIs. As accuracy and efficiency increase, the mean value of the LSIs declines and their standard deviation increases, while the level of uncertainty in LSP decreases. The key to assessing the impact of uncertainties on LSP is therefore whether the LSIs have high accuracy, a low mean, and a high standard deviation.

The success of LSP depends on the evaluation of the modeling quality. AUC values are used for this purpose, and Table 4 displays the AUC values of LSP modeling for slope and grid units (30 m/60 m) in study areas of various scales. As the study area scale decreases from Ganzhou City to Ruijin County, the LSP accuracy of slope and grid (30 m/60 m) units tends to increase. For instance, the AUC values of the RF model for slope units are 0.936 in Ruijin County, 0.928 in Ganzhou East, and 0.947 in Ganzhou City. Grid units (30 m) in the RF model have AUC values of 0.918 in Ruijin County, 0.888 in Ganzhou East, and 0.915 in Ganzhou City. Grid units (60 m) in the RF model have AUC values of 0.884 in Ruijin County, 0.856 in Ganzhou East, and 0.848 in Ganzhou City. Additionally, the LSP accuracies of Ruijin extracted from Ganzhou City, Ruijin extracted from Ganzhou East, and Ruijin County show the same tendency for all mapping units under the RF model.
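As a reminder of what the AUC values measure, the sketch below computes the area under the ROC curve through its rank interpretation: the probability that a randomly chosen landslide unit receives a higher LSI than a randomly chosen non-landslide unit, with ties counted as one half. The function name and the toy labels and scores are hypothetical; the pairwise loop is meant for illustration rather than for large test sets.

```python
def auc(labels, scores):
    """AUC as the fraction of (landslide, non-landslide) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Toy test set: 1 = landslide unit, 0 = non-landslide unit.
labels = [1, 1, 1, 0, 0, 0]
lsi = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
print(round(auc(labels, lsi), 3))
```

In this toy case eight of the nine landslide/non-landslide pairs are ranked correctly, so the AUC is 8/9.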

Table 4 AUC accuracies of slope (grid) units in different study area scales based on RF (SVM) model

The mean value and standard deviation are used to characterize the distribution rule of the LSIs predicted by the RF model in study areas with various scales and mapping units (Li et al. 2020). The mean value measures the overall bias of the LSIs, and the standard deviation shows how widely they are dispersed. Table 5 displays the mean values and standard deviations of the LSIs for slope and grid units (30 m/60 m) in study areas of various scales.
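The two summary statistics can be computed as in the short sketch below. The function name and the toy LSIs are hypothetical, and the use of the population standard deviation (dividing by the number of units) is an assumption, since the source does not say which estimator was used.

```python
from statistics import fmean, pstdev


def lsi_summary(lsis):
    """Mean and (population) standard deviation of a set of LSIs."""
    return {"mean": fmean(lsis), "std": pstdev(lsis)}


summary = lsi_summary([0.1, 0.2, 0.3, 0.4])
print(summary)
```

With millions of mapping units, the difference between the population and sample estimators is negligible.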

Table 5 Mean and standard deviation values of mapping units in different study area scales based on RF and SVM

6.3.1 Accuracy assessment under different study area scales

The accuracy of the LSIs predicted by the RF model tends to increase as the study area scale decreases from Ganzhou City to Ruijin County (Fig. 9). According to Figs. 9c and e, the AUC values for grid units (30 m/60 m) in the RF model are 0.918 (0.884) in Ruijin County, 0.888 (0.856) in Ganzhou East, and 0.915 (0.848) in Ganzhou City. For Ganzhou East, the AUC values obtained by direct prediction (by masking from Ganzhou City) are 0.888 (0.860) for 30 m grid units and 0.856 (0.831) for 60 m grid units, so the LSP accuracy tends to decrease from the directly predicted Ganzhou East to that extracted by masking from Ganzhou City. Ruijin County has the highest AUC accuracy, followed by Ruijin extracted from Ganzhou East and Ruijin extracted from Ganzhou City (Figs. 9b, d and f). The LSP accuracies of Ruijin County and of those extracted by masking from Ganzhou East and Ganzhou City differ significantly.

Fig. 9 AUC values of different study area scales based on slope units and grid units with 30/60 m resolution under RF model: a, b Slope units; c, d Grid units with 30 m resolution; e, f Grid units with 60 m resolution

6.3.2 Accuracy assessment under different mapping unit scales

The effects of slope units and grid units with 30 m and 60 m resolution on the accuracy of LSP modeling are discussed. These are the findings:

(1) The LSP accuracies of slope units extracted by the multi-scale segmentation method are higher than those of grid units, and the efficiency of the LSIs increases significantly as the study area scale decreases from Ganzhou City to Ruijin County. The AUC values of the RF model for slope units (grid units with a 30 m resolution) are 0.936 (0.918) in Ruijin County, 0.928 (0.888) in Ganzhou East, and 0.947 (0.915) in Ganzhou City (Fig. 10). The LSP accuracy of slope units based on the RF model is higher than that of grid units in Ruijin County and in Ruijin extracted by masking from Ganzhou East and Ganzhou City (Fig. 10), and it is also higher in Ganzhou East and in Ganzhou East extracted by masking from Ganzhou City (Fig. 11).

(2) The LSP accuracies of grid units with 30 m and 60 m resolutions differ significantly in Ruijin County and in Ruijin extracted by masking from Ganzhou East and Ganzhou City. Grid units (30 m/60 m) in the RF model have AUC values of 0.918 (0.884) in Ruijin County, 0.888 (0.856) in Ganzhou East, and 0.915 (0.848) in Ganzhou City (Fig. 11). For Ganzhou East, the AUC values obtained by direct prediction (by masking from Ganzhou City) are 0.888 (0.860) for 30 m grid units and 0.856 (0.831) for 60 m grid units, so the LSP accuracy tends to decrease from the directly predicted Ganzhou East to that extracted by masking from Ganzhou City (Fig. 11). The LSP accuracy likewise tends to decline from the directly predicted Ruijin County to that extracted by masking from Ganzhou East and Ganzhou City.

Fig. 10 AUC values of different mapping units in different study area scales based on RF model: a Ganzhou City; b Ganzhou East; c Ruijin County

Fig. 11 AUC values of different mapping units in Ganzhou East and Ruijin County based on RF model: a, d Slope units; b, e Grid units with 30 m resolution; c, f Grid units with 60 m resolution

In conclusion, the LSP modeling shows that the LSP accuracy rises significantly as the size of the study area decreases from Ganzhou City to Ruijin County. The LSP accuracy of Ruijin County extracted by masking from Ganzhou City is lower than that extracted by masking from Ganzhou East, which in turn is lower than that directly predicted in Ruijin County. Similarly, the LSP accuracy of Ganzhou East extracted by masking from Ganzhou City is lower than that of Ganzhou East with direct prediction, a pattern comparable to that of Ruijin County. The RF and SVM models in Ruijin County have the highest LSP accuracy, and the LSP accuracy tends to increase as the study area scale is reduced. In addition, slope units extracted using the multi-scale segmentation method have higher LSP accuracy than grid units with a resolution of 30 m.

6.3.3 Distribution of LSIs under different study area scales

The distribution rule of the LSIs directly predicted in Ruijin County differs significantly from those extracted by masking from Ganzhou City and Ganzhou East. For slope units, for instance, the mean LSI values are 0.243 for Ruijin County extracted from Ganzhou City, 0.240 for Ruijin County extracted from Ganzhou East, and 0.268 for Ruijin County directly predicted (Figs. 12 and 13). Additionally, the standard deviation illustrates how far, under the same circumstances, the LSI distribution departs from the mean value. The extracted and directly predicted LSIs in Ruijin County differ significantly, and the distribution of the directly predicted LSIs is more consistent with the actual distribution of landslide susceptibility. Likewise, although the two distributions are consistent overall, the distribution rule of the directly predicted LSIs in Ganzhou East is more consistent with the actual distribution of landslide susceptibility than that extracted by masking from Ganzhou City.

Fig. 12 The distribution of LSIs with RF (SVM) model in Ganzhou City

Fig. 13 The distribution of LSIs with RF (SVM) model in Ruijin County in study areas with different scales: a–f Slope units; g–l Grid units with 30 m resolution

6.3.4 Distribution of LSIs under different mapping unit scales

Study areas with different scales influence the distribution rule of the LSIs: for grid units, the mean value of the LSIs tends to decrease as the study area scale decreases, whereas slope units show the opposite trend (Table 5). Take grid units with 60 m resolution as an example (Fig. 12). In study areas with various scales, the mean values of the LSIs in the RF model are 0.342 in Ganzhou City, 0.264 in Ganzhou East, and 0.226 in Ruijin County. The standard deviation, however, exhibits the opposite trend to the mean. Thus, when the study area scale is reduced for grid units, the mean value of the LSIs falls while the standard deviation rises. The trend for slope units is the opposite, and they are more accurate, have a higher mean value, and are more effective than grid units (Fig. 12).

7 Discussion

In general, the statistical results obtained at a larger study area scale should be more consistent with the actual situation because more landslide samples are available, and the LSP results can better reflect the general rule (Araújo et al. 2022), whereas a small study area scale can better reflect the specific law within a local range. LSP results under various scales using the SVM model are set as a second case to test whether the finding that a small study area scale and slope units as mapping units give better LSP accuracy and smaller uncertainty can be generalized. The accuracy of the resulting model can be inconsistent at a larger study area scale because more detailed data are needed (Kulsoom et al. 2023). Therefore, higher modeling accuracy and effectiveness can be expected at a large study area scale when detailed landslide samples and diverse environmental factors are available (Gupta and Shukla 2023). For more accurate environmental factors and better LSP accuracy, the study area must be scaled appropriately at this stage (Pellicani et al. 2017). The importance of environmental factors and the accuracy of the LSP are discussed in this study from the perspective of study areas at various scales.

7.1 LSP results under different scales using SVM model

For SVM modeling, SPSS Modeler 18.0 software is used. The popular radial basis function is chosen as the kernel function of the SVM (Huang and Zhao 2018). The optimal regularization parameter (C) and kernel parameter (γ) are determined by tuning, and default values are applied to the other parameters. The trained SVM is used to obtain the LSIs in the study area, and spatial overlay analysis in ArcGIS is used to determine the distribution of the LSIs in study areas with different scales. The natural break point method then categorizes the LSIs into five classes: very low, low, moderate, high, and very high susceptibility zones.
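The role of the kernel parameter γ can be seen from the standard form of the radial basis kernel, K(x, z) = exp(−γ‖x − z‖²). The sketch below evaluates it for two feature vectors; the function name and the sample vectors are hypothetical, and the γ values are arbitrary illustrations rather than the tuned values of the study.

```python
import math


def rbf_kernel(x, z, gamma):
    """Radial basis kernel exp(-gamma * ||x - z||^2) for two feature vectors."""
    if len(x) != len(z):
        raise ValueError("vectors must have the same length")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


# Two mapping units described by two standardized factors (toy values).
unit_a, unit_b = [0.0, 0.0], [1.0, 1.0]
for gamma in (0.1, 0.5, 2.0):
    print(gamma, round(rbf_kernel(unit_a, unit_b, gamma), 4))
```

A larger γ makes the similarity decay faster with distance, so the decision boundary follows the training samples more closely, while C controls how strongly misclassified training samples are penalized.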

7.1.1 Landslide susceptibility results under different study area scales

The LSMs of Ganzhou City, Ganzhou East, and Ruijin County obtained with SVM models built on grid units with a 60 m resolution are taken as an illustration. The landslide susceptibility results at the various study area scales show that the areas of the low- and very low-susceptibility zones shrink as the study area scale decreases. As the scale decreases from Ganzhou City to Ruijin County, the LSP accuracy of grid (60 m) units based on the SVM model tends to increase (Fig. 14). Grid (60 m) units in Ruijin County extracted by masking from Ganzhou City have a lower LSP accuracy than those directly predicted in Ruijin County and those extracted by masking from Ganzhou East. Table 5 illustrates the variations in the LSI distribution rule predicted by the RF and SVM models at various study area scales.

Fig. 14 AUC values of grid (60 m) units in different study area scales based on RF and SVM model: a Ganzhou City; b Ganzhou East and Ganzhou East extracted from Ganzhou City; c Ruijin County, Ganzhou East and Ganzhou City based on RF model; d Ruijin County, Ganzhou East and Ganzhou City based on SVM model

7.1.2 Landslide susceptibility results under different mapping unit scales

The LSMs of slope units and of grid units with 30 m and 60 m resolutions for Ruijin extracted from Ganzhou City, Ruijin extracted from Ganzhou East, and Ruijin County under the SVM model are taken as examples. The landslide susceptibility results of the SVM model at various mapping unit scales show that, in the same study area, the very low to moderate susceptibility zones are smaller when based on slope units than when based on grid (30 m/60 m) units. In the same study area, the LSP accuracy based on the SVM model gradually improves from grid units (60 m) through grid units (30 m) to slope units (Fig. 15). The distribution rules of the LSIs based on the RF and SVM models under different mapping unit scales are shown in Fig. 13.

Fig. 15 AUC values of different mapping units in Ruijin County: a, c, e RF model; b, d, f SVM model

7.2 Importance ranking of landslide related environmental factors

The correlation analysis tool in SPSS 24.0 is used to calculate the correlation coefficients of the 21 environmental factors. According to the findings, these environmental factors are only weakly correlated, as evidenced by correlation coefficients of less than 0.5 at a significance level of less than 0.05 (Huang and Zhao 2018). Each environmental factor is also subjected to a collinearity diagnosis. Multicollinearity among the environmental factors is addressed by removing highly correlated variables and by keeping the variance inflation factor (VIF) below a threshold value (Zeng et al. 2023). To obtain the VIF, each predictor variable is regressed on the other predictor variables and the squared multiple correlation coefficient is measured (Achu et al. 2023). The variance inflation factor and tolerance are used to determine the degree of collinearity between these environmental factors, with serious collinearity being indicated by a variance inflation factor greater than 5 and/or a tolerance lower than 0.1. The findings demonstrate that there is no multicollinearity among these environmental factors, with the maximum value of the variance inflation factor and the minimum value of the tolerance being 4.35 and 0.29, respectively. Therefore, all 21 environmental factors can be used in this study for LSP modelling. Meanwhile, some studies indicate that error levels in environmental factors have a great influence on the LSP results. For example, Huang et al. (2023a) found that the greater the proportion of random errors in environmental factors, the greater the uncertainty of the LSP results. The original continuous environmental factors are therefore processed with a low-pass filter to eliminate random errors.
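For reference, the two collinearity diagnostics follow the standard definitions below, where $R_j^2$ is the coefficient of determination obtained when factor $j$ is regressed on all other factors. The notation is introduced here for illustration and is not taken from the source.

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}, \qquad
\mathrm{TOL}_j = 1 - R_j^2 = \frac{1}{\mathrm{VIF}_j}
```

A factor that is well explained by the others has $R_j^2$ close to 1, and therefore a large VIF and a small tolerance.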

7.2.1 Importance of environmental factors in different study area scales

The importance prediction of the SVM and the mean decrease in accuracy of the RF model are used to rank the importance of environmental factors at various study area scales. Figure 16a illustrates the mean decrease in accuracy in the RF model based on grid units with a 60 m resolution. In Ganzhou City, topography relief has the highest importance value of 0.12, followed by slope with a value of 0.08 and TWI with a value of 0.07; in Ganzhou East, topography relief has the highest importance value of 0.16, followed by surface cutting depth with a value of 0.16. Figure 16b illustrates the importance of environmental factors in each study area for the SVM based on grid units with a 60 m resolution. In Ganzhou City, slope has the highest importance with a value of 0.18, followed by the normalized difference built-up index with a value of 0.14, and road density and drainage density with a value of 0.10; in Ganzhou East, slope has the highest importance with a value of 0.13, followed by surface cutting depth with a value of 0.11 and drainage density with a value of 0.10; in Ruijin County, the terrain wetness index has the highest importance with a value of 0.25, followed by the elevation variation coefficient with a value of 0.21, average annual rainfall with a value of 0.16, road density with a value of 0.09, faults with a value of 0.07, and lithology with a value of 0.06.
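The idea behind the mean decrease in accuracy can be sketched with a simplified, model-agnostic permutation test: one factor is shuffled across samples, and the resulting drop in accuracy is recorded. This is an illustration under stated assumptions rather than the RF package's procedure, which permutes each factor on the out-of-bag samples of every tree; the function names, the toy classifier, and the toy data are hypothetical.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of samples whose predicted class equals the label."""
    hits = sum(1 for row, y in zip(rows, labels) if predict(row) == y)
    return hits / len(labels)


def permutation_importance(predict, rows, labels, column, n_repeats=10, seed=0):
    """Mean accuracy drop after shuffling one factor column across samples."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    drops = []
    for _ in range(n_repeats):
        shuffled = [row[column] for row in rows]
        rng.shuffle(shuffled)
        permuted = [
            row[:column] + [value] + row[column + 1:]
            for row, value in zip(rows, shuffled)
        ]
        drops.append(baseline - accuracy(predict, permuted, labels))
    return sum(drops) / n_repeats


def toy_predict(row):
    """Toy classifier that depends only on the first factor."""
    return 1 if row[0] > 0.5 else 0


rows = [[0.9, 0.1], [0.8, 0.7], [0.2, 0.3], [0.1, 0.9], [0.7, 0.5], [0.3, 0.2]]
labels = [toy_predict(row) for row in rows]
importance = [permutation_importance(toy_predict, rows, labels, c) for c in range(2)]
print(importance)
```

Shuffling a factor that the model ignores leaves the accuracy unchanged, so its importance is zero, whereas shuffling a factor the model relies on can only lower the accuracy.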

Fig. 16 Environmental factors importance under different study area scales and 60 m grid units: a RF; b SVM

7.2.2 Importance of environmental factors in different mapping-units scales

The importance of environmental factors in the 30 m resolution grid units and in the slope units tends to be consistent to some extent, while it differs significantly from that in the 60 m resolution grid units. Taking the RF model as an illustration, Figure 17a shows that, at various study area scales based on grid units with a 30 m resolution, plan curvature has the highest importance with a mean value of 0.30, followed by drainage density with a mean value of 0.15 and TWI with a mean value of 0.08. Meanwhile, plan curvature, drainage density, and TWI have respective mean values of 0.38, 0.20, and 0.08 at various study area scales. Figure 17b depicts that there is no difference in the importance of environmental factors at different study area scales based on the grid units with 60 m resolutions. In Ganzhou City, topography relief, slope, and TWI have importance values of 0.12, 0.08, and 0.07, respectively, while surface cutting depth, slope, and elevation variation coefficient have importance values of 0.10, 0.08, and 0.07, respectively.

Fig. 17 Environmental factors importance under different study area scales and different mapping unit scales based on RF: a Grid units (30 m); b Slope units

7.3 Discussions about influence of different scales on LSP modelling

7.3.1 Influence of study area scales on importance of environmental factors

The weights of environmental factors differ markedly among study areas with various scales, even when the same data-driven model and the same environmental factors are used in LSP modeling (Huang et al. 2021b). When the study area scale is very large, environmental factors tend to be averaged over local areas, or local characteristics are ignored. Data mining or statistical rules are prone to distortion as the study area scale increases. In a smaller study area, however, the results match the local landslide characteristics more closely. The results also show that the greater the diversity of environmental factors, the higher the LSP accuracy of the data-driven model. The weight of environmental factors with slow spatial variation, such as lithology and land cover, becomes minimal for the LSP in Ruijin County (Hurlimann et al. 2022). The importance of environmental factors differs significantly, and the weights of the terrain wetness index and elevation variation coefficient increase, which can lead to better LSP results.

However, in a large-scale study area like Ganzhou City, the weights of the terrain, road density, drainage density, and normalized difference built-up index increase while those of the DEM variation and surface roughness decrease, and the importance of environmental factors tends to be homogeneous. The study area scale directly affects the importance ranking of landslide environmental factors in LSP modeling (Salciarini et al. 2016). As the study area scale gradually increases, the importance of environmental factors exhibits the phenomenon of "averageness", because the characteristics of landslide evolution across a larger study area must be taken into account (Grandjean et al. 2018). As the study area shrinks, the differences between landslides and non-landslides in the training samples become more pronounced, and more environmental factors, or combinations of them, play a significant role in the evolution of landslides, which can produce better LSP results faster (Guo et al. 2022).

The study area scale directly affects the importance of landslide environmental factors, so the factors that best represent susceptibility across Ganzhou City as a whole shift when the analysis is restricted to Ganzhou East and Ruijin County. This variation in the importance of environmental factors helps to better reveal the spatial variation in LSP (Palau et al. 2022).

7.3.2 Influence of mapping unit scales on importance of environmental factors

A mapping unit, at the level of the analysis, is a geographical region that captures environmental factors (Liu 2022). LSP modeling requires the initial selection of a suitable mapping unit that can express information about environmental factors (Kedron and Holler 2022). Different mapping unit scales are represented by grid units with spatial resolutions of 30 m and 60 m, as well as by slope units extracted using the multi-scale segmentation method from study areas with various scales. The results show that mapping unit scales significantly affect how environmental factors are expressed. The weights of plan curvature, drainage density, and TWI based on slope units and on grid units with a 30 m spatial resolution increase as the study area scale changes. The importance of environmental factors under grid units with a 30 m spatial resolution and under slope units, based on the RF and SVM models, tends to be somewhat consistent. The importance of environmental factors in various study areas also differs significantly between grid units with 30 m resolution and those with 60 m resolution. Based on grid units with a resolution of 60 m, environmental factors are equally important and follow a clear rule at various study area scales. Because the grid resolution directly affects the importance of environmental factors at various study area scales, well-delineated slope units can compensate for this effect. The study of the mapping unit scale seeks to strike a balance between the maximum possible accuracy of LSP and the maximum possible efficiency of environmental factor expression. It also aims to determine the best way to use the effect of mapping units in order to maximize the effectiveness of LSP.

The importance of environmental factors generally differs significantly among study areas with various scales and/or models. Plan curvature, slope, and DEM contribute significantly to the RF and SVM models for Ruijin County, and plan curvature, terrain wetness index, and drainage density contribute significantly to the models for Ganzhou East and Ganzhou City. The importance of environmental factors varies greatly between the different mapping units, but it is generally consistent between the slope units and the 30 m resolution grid units. Additionally, the differences in the weights of the various environmental factors gradually decline from Ruijin County to Ganzhou East and Ganzhou City, indicating a general trend toward an "average" weighting of environmental factors as the study area grows.

7.3.3 Influence of study areas scales on LSP accuracy

A challenging issue that significantly hinders the application of machine learning models to large-scale areas is the incompleteness of most landslide inventories, particularly the lack of detailed and accurate landslide inventory data, which form a vital link between each landslide and its environmental factors (Sun et al. 2023; Xiao and Zhang 2023). Hence, the study area scale has to be reduced in exchange for the prediction accuracy of the local region (Kulsoom et al. 2023; Moeen Hamid Bukhari et al. 2023). The Ganzhou City landslide inventory data and 21 landslide environmental factors are used in modeling to produce the LSP results. The landslide and environmental factor data for Ganzhou East and Ruijin County are also extracted from those of Ganzhou City to obtain the corresponding LSP results. Under the same mapping unit scale and environmental factors, with three different study area scales, a smaller study area results in a higher LSP accuracy, and a lower mean value of the LSIs indicates a better LSP performance (Deng et al. 2022; Kirschbaum et al. 2011). The grid units with 60 m resolution in the RF (SVM) model have AUC accuracies of 0.848 (0.822), 0.856 (0.839), and 0.884 (0.873) in Ganzhou City, Ganzhou East, and Ruijin County, respectively.

There is a sizable difference in LSP accuracy when small-scale areas are extracted from larger-scale areas using the same data-driven model and the same environmental factors. Ruijin County is taken as an illustration, comparing its direct prediction with the results extracted by mask from Ganzhou City and Ganzhou East. The corresponding mean values of LSIs in the RF (SVM) model are 0.368 (0.361) for extraction from Ganzhou City, 0.325 (0.333) for extraction from Ganzhou East, and 0.279 (0.280) for direct prediction, with an increasing standard deviation trend. The AUC values of grid units with a resolution of 30 m in the RF (SVM) model are 0.881 (0.848) in Ruijin extracted from Ganzhou City, 0.880 (0.854) in Ruijin extracted from Ganzhou East, and 0.918 (0.882) in Ruijin predicted directly. The LSIs in Ruijin County under the three different study area scales differ significantly, and the LSIs with direct prediction in Ruijin County are more trustworthy. Additionally, the accuracy, LSI distribution rule, and significant differences among study area scales observed for Ganzhou East are consistent with those for Ruijin County. This demonstrates that as the scale of the study area decreases, the mean value of the LSIs decreases, the standard deviation tends to increase, and the LSP accuracy of grid units increases. The LSP accuracy of slope units in study areas with various scales is superior to that of grid units and is little affected by scale reduction (Alvioli et al. 2022; Huang et al. 2021a).
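The gap between direct prediction and mask extraction can be made explicit with simple arithmetic on the AUC values reported above for Ruijin County (30 m grid units). The snippet below is only a worked illustration. It assumes that the third AUC pair, 0.918 (0.882), refers to direct prediction in Ruijin County, and the dictionary keys are hypothetical labels.

```python
# AUC values for Ruijin County, 30 m grid units, as reported in the text.
# Assumption: the pair 0.918 (0.882) corresponds to direct prediction.
ruijin_auc = {
    "RF": {"from_ganzhou_city": 0.881, "from_ganzhou_east": 0.880, "direct": 0.918},
    "SVM": {"from_ganzhou_city": 0.848, "from_ganzhou_east": 0.854, "direct": 0.882},
}

# AUC gain of direct prediction over each mask-extracted result.
auc_gain = {}
for model, values in ruijin_auc.items():
    auc_gain[model] = {
        source: round(values["direct"] - auc, 3)
        for source, auc in values.items()
        if source != "direct"
    }

for model, gains in auc_gain.items():
    for source, gain in gains.items():
        print(f"{model}: direct prediction vs {source}: +{gain:.3f} AUC")
```

Under this reading, direct prediction gains roughly 0.03 to 0.04 AUC over either mask-extracted result for both models.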

In order to increase the number of samples and lower the level of uncertainty in LSP, the study area was enlarged from Ruijin County to Ganzhou City. The findings show that, although expanding the study area increases the number of landslide samples, the incomplete sample information violates the rule of spatial heterogeneity in geography (Yu et al. 2022). As a result, the prediction performance is better in the smaller study area.

7.3.4 Influence of mapping unit scales on LSP accuracy

The accuracy and efficiency of slope units divided from high-resolution grid data are less affected by the scale of the study area. The trend for grid units is the opposite: their LSP accuracy increases by about 5% as the scale of the study area decreases (Liu et al. 2023). In the RF (SVM) model, grid units with a resolution of 60 m have AUC values of 0.848 (0.822), 0.856 (0.839), and 0.884 (0.873) in Ganzhou City, Ganzhou East, and Ruijin County, respectively, while grid units with a resolution of 30 m have AUC values of 0.915 (0.875), 0.888 (0.861), and 0.918 (0.882), respectively. Additionally, with an increasing standard deviation, the mean LSI values for grid units with a resolution of 30 m (60 m) in the RF model are 0.342 (0.346) in Ganzhou City, 0.264 (0.304) in Ganzhou East, and 0.226 (0.243) in Ruijin County. The AUC values of slope units are higher than those of grid units with resolutions of 30 m and 60 m, with values of 0.947 (0.932) in Ganzhou City, 0.928 (0.889) in Ganzhou East, and 0.936 (0.934) in Ruijin County. The LSP accuracy of slope units in study areas with various scales is superior to that of grid units and is little affected by scale reduction (Alvioli et al. 2022; Huang et al. 2021a).
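As a worked check on the scale sensitivity described above, the relative change in AUC from Ganzhou City to Ruijin County can be computed for each mapping unit from the reported values. The snippet is a sketch for illustration only; the helper `relative_change_pct` and the dictionary keys are hypothetical names.

```python
# Reported AUC values as (Ganzhou City, Ruijin County) pairs per mapping unit.
auc_city_vs_ruijin = {
    "grid_60m": {"RF": (0.848, 0.884), "SVM": (0.822, 0.873)},
    "grid_30m": {"RF": (0.915, 0.918), "SVM": (0.875, 0.882)},
    "slope_unit": {"RF": (0.947, 0.936), "SVM": (0.932, 0.934)},
}


def relative_change_pct(auc_large, auc_small):
    """Percentage change in AUC when moving from the larger study area
    to the smaller one."""
    return (auc_small - auc_large) / auc_large * 100.0


auc_change = {
    unit: {model: relative_change_pct(*pair) for model, pair in models.items()}
    for unit, models in auc_city_vs_ruijin.items()
}

for unit, models in auc_change.items():
    for model, pct in models.items():
        print(f"{unit:10s} {model:3s} {pct:+.2f}%")
```

With these numbers, the 60 m grid units gain about 4.2% (RF) and 6.2% (SVM), which averages to roughly 5%, whereas the slope units change by about 1% or less in either direction.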

7.4 Future study prospects

In summary, reducing the study area scale from Ganzhou City to Ruijin County improves prediction accuracy. To better reveal the spatial variability of geographical phenomena, the next step is to study the township level: taking into account regional variations between towns and counties, adapting the environmental factors originally used to reflect global relationships so that they reflect each township area, and comparing the scale effect at the township level (Gariano et al. 2017; Palau et al. 2022; Yu et al. 2022). To produce a regional LSM that faithfully reflects actual conditions, it is important to select an appropriate study area scale that takes into account both the spatial correlation and the spatial heterogeneity of the geographic environment. Applying the combination of multi-scale segmentation and physically-based numerical modeling to suitable study area scales for disaster risk management will require additional work. The resolutions of 30 m and 60 m are representative of the grid units in various study area scales, and the uncertainty rules of LSP under grid units with 30 m and 60 m resolutions can be well compared with those under the slope unit. Additionally, it is important to investigate the uncertainty rule of LSP modeling at a wider range of spatial resolutions, including mapping units with 10 m, 15 m, 30 m, 60 m, 90 m and 120 m resolutions.

8 Conclusions

In the present study, the effects of mapping unit scales and study area scales on the uncertainty rules of LSP are investigated. The study areas are Ganzhou City, the eastern portion of Ganzhou City (Ganzhou East), and Ruijin County in Ganzhou East. Different mapping unit scales are represented by grid units with spatial resolutions of 30 m and 60 m, as well as slope units extracted using the multi-scale segmentation method. The effects of jointly considering study areas of various scales on the uncertainty of LSP modeling are discussed. The main conclusions are as follows:

  (1) LSP accuracy under grid units increases significantly as the study area scale decreases from Ganzhou City to Ruijin County, whereas the accuracy of LSP under slope units is less affected by study area scale. However, the standard deviation of LSIs shows the opposite trend, and the mean values of LSIs in Ruijin County extracted by mask from Ganzhou City are higher than those extracted from Ganzhou East and those predicted directly. Considering that a larger study area can provide more abundant landslide samples for model training and testing, more studies on different study area scales are needed.

  (2) In the study areas with various scales, LSP modelling under slope units is more accurate, stable, and efficient than that under grid units with 30 m and 60 m resolutions. Slope units outperform grid units in terms of efficiency and mean value.

  (3) The environmental factors display an averaging trend at the larger study area scale, and their importance changes significantly as the study area scale decreases. The importance of environmental factors varies greatly between the different mapping units, but it is generally consistent between the slope units and the grid units with 30 m resolution.