1 Introduction

Ensuring the sustainable development of residential communities during rapid urbanization is a crucial challenge for developing countries, including Vietnam. Mountainous terrain makes up 75% of the country's territory, and the frequency of sediment-related natural hazards has increased in recent years (Lan Huong et al. 2022). Consequently, it is necessary to identify hazard zones by producing integrated maps based on the current situation and available materials. These maps include data on the spatial distribution of hazard zones and their temporal evolution, which are essential for evaluating potential risks, long-term territorial planning, and land use planning.

Compared to other sediment-related disasters, landslides are prevalent in Vietnam, particularly in the northern mountainous regions, and negatively impact the sustainable development of local communities (Thanh Thi Pham et al. 2020). Therefore, numerous studies on landslides have been conducted in these regions (Bien et al. 2022; Hung et al. 2017). According to previous studies, most landslides were triggered by precipitation, and the increase in landslide frequency has been associated with human activities (e.g., road construction) (Do et al. 2022; Tien Bui et al. 2017). When developing models for predicting landslides, determining the relationship between conditioning factors and the distribution of landslides is crucial (Liao et al. 2022). The evaluation results may validate the correctness of the selection of analysis factors and play a significant role in the performance of the prediction models.

This paper presents the results of a fractal-based evaluation of the spatial relationship between conditioning factors and the distribution of landslides in Tinh Tuc town, Cao Bang province, Vietnam. The spatial relationship was evaluated using three raw data sets: the number of landslides, relative landslide density, and landslide density. By producing landslide susceptibility zonation maps, this study determined that all three data sets are suitable for assessing the relationship between landslide occurrences and conditioning factors. In addition, the prediction model based on relative landslide density and landslide density performed slightly better in analyzing the relationship between landslides and conditioning factors.

2 Study Area

Cao Bang province (6690 km²; 22° 21′ N to 23° 07′ N and 105° 16′ E to 106° 50′ E), which is located in the northeastern mountainous region of Vietnam (Fig. 1), has experienced numerous natural disasters in recent years, especially landslides, due to the influence of various natural, environmental, and social conditions. Nguyen Binh is a mountainous district in the southern portion of Cao Bang province that has been identified as one of the areas with a very high likelihood of landslides (VIGMR 2017).

Fig. 1
A map highlights the location of the Cao Bang province in the northern part of Vietnam. The map of Cao Bang province highlights the study area. The elevation map of the study area traces the landslide locations. The elevation ranges from 434 to 1876 meters.

Location of study area

Tinh Tuc town is 18 km from the center of Nguyen Binh district and is situated on National Highway 34, which connects Cao Bang province and Ha Giang province (Fig. 1). The study area covers a total of 66.76 km², including Tinh Tuc town and its surroundings. The elevation of the study area ranges from 434 m in the valley to 1876 m in the mountainous terrain, where Phia Oac peak reaches 1932 m (Kien et al. 2021). The studied territory is characterized by geological formations and complexes composed of igneous and sedimentary rocks dating from the Paleozoic to the Quaternary. Weathering has produced two main types of weathering crust in the study area: ferosialite (86%) and sialferite (14%). In 2020, a failure with a volume of 2500 m³ was recorded on Highway 34 at Km 192 + 500, in the 3–4 m thick weathering crust formed from the rocks of the Song Hien formation (Fig. 2a). Another landslide occurred in 2016 at Km 200 + 300 on Highway 34, where the 3–4 m thick weathering crust formed from the Cao Bang complex rocks produced a sliding mass with a volume of 50,000 m³ (Fig. 2b). Short, intense rainfall and prolonged rainfall events have been identified as the triggers of landslides in the study area.

Fig. 2
Two photographs of landslide areas beside concrete highways. The landslide has eroded soil sliding on the slope from the highways.

Landslides on National Highway 34 at Km 192 + 500 (a) and at Km 200 + 300 (b). Photo by T. K. Nguyen

3 Evaluation of the Spatial Relationship Between Conditioning Factors and the Distribution of Landslides

3.1 Methods

Mandelbrot (1967), who initially introduced the fractal theory in his study of the British coastline, determined that the degree of complication of a geographic curve may be characterized by a “dimension” D. Since then, the fractal method has been improved and extensively utilized in numerous fields of research, including medicine (Tanabe et al. 2020), muscle activity patterns during locomotion (Santuz and Akay 2020), materials science (Paun et al. 2022), the characterization of absorption capacity, porosity, and surface area (Elejalde-Cadena and Moreno 2021), natural hazards (Cello and Malamud 2006), etc.

In landslide susceptibility studies, the fractal method is employed to determine the weights of the conditioning factors, thereby indicating the role of each conditioning factor in the occurrence of landslides in the studied territory. The method may be characterized as a power law described by Eq. (1):

$$ p(r)\propto C\,{r}^{-D} $$
(1)

where r is the feature measured scale, p is the measured value under the corresponding scale r, C is a constant value, and D is the fractal dimension (Hu et al. 2020). Fractal analyses provide a raw data set (P) based on several variables, such as the number of landslides, relative landslide density, and landslide density. The cumulative sum (S) values are then determined using the formula (2):

$$ \left\{P\right\}=\left\{{P}_1,{P}_2,\dots, {P}_n\right\} $$
$$ \left\{{S}_1\right\}=\left\{{P}_1,{P}_1+{P}_2,\dots, {P}_1+{P}_2+\dots +{P}_n\right\} $$
$$ \left\{{S}_2\right\}=\left\{{S}_{11},{S}_{11}+{S}_{12},\dots, {S}_{11}+{S}_{12}+\dots +{S}_{1n}\right\} $$
$$ \dots $$
(2)
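The repeated accumulation in formula (2) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `cumulative_sums` and the subclass counts are hypothetical.

```python
from itertools import accumulate


def cumulative_sums(p, order=1):
    """Return the cumulative-sum sequence of the given order.

    order=1 gives {S_1} (running sums of P), order=2 gives {S_2}
    (running sums of {S_1}), and so on, as in formula (2).
    """
    s = list(p)
    for _ in range(order):
        s = list(accumulate(s))
    return s


# Hypothetical landslide counts in four distance subclasses (not study data).
counts = [5, 3, 2, 1]
s1 = cumulative_sums(counts, order=1)  # [5, 8, 10, 11]
s2 = cumulative_sums(counts, order=2)  # [5, 13, 23, 34]
```

The same function applies unchanged when P holds relative landslide density or landslide density instead of counts.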

The fractal analysis then plots ln(S) versus ln(r) and fits a straight line to each plot to determine the D value of each conditioning factor. The weight of each landslide conditioning factor (Wi) is then calculated using formula (3):

$$ {W}_i={D}_i/{\sum}_{i=1}^n{D}_i $$
(3)
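A minimal sketch of the fitting and weighting steps follows. It assumes that r is the rank of each subclass (1, 2, ..., n) and that D is read off as the slope of the least-squares line through the (ln r, ln S) points; both are assumptions made for illustration, as are the function names and the sample D values.

```python
import math


def fit_log_log(r, s):
    """Least-squares fit of ln(s) = intercept + slope * ln(r).

    Returns (slope, intercept, r_squared). The slope is interpreted
    here as the fractal dimension D (an assumption of this sketch).
    """
    x = [math.log(v) for v in r]
    y = [math.log(v) for v in s]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


def factor_weights(d_values):
    """Normalize fractal dimensions to weights that sum to 1 (formula 3)."""
    total = sum(d_values.values())
    return {name: d / total for name, d in d_values.items()}


# Hypothetical D values for illustration only (not the values in Table 2).
weights = factor_weights(
    {"roads": 1.0, "faults": 1.5, "drainage": 2.0, "boundaries": 0.5}
)
```

Because the weights are a simple normalization, only the ratios between the D values matter, not their absolute magnitudes.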

In this study, the frequency ratio (FR) method was integrated with the fractal method to produce landslide susceptibility zonation (LSZ) maps and thereby examine the efficacy of spatial relationship evaluation using different raw data sets. The FR method evaluates the relationship between the distribution of landslides and the subclasses of conditioning factors (Wang et al. 2020) based on the formula (4):

$$ {FR}_i=\%{LS}_i/\%{A}_i $$
(4)

where FRi is the FR value of subclass i, %LSi is the percentage of landslides in subclass i, and %Ai is the area percentage of subclass i. The calculated FR value is then normalized to the weight of the factor class (NFR) according to the formula (5):

$$ {NFR}_i={FR}_i/{\sum}_{i=1}^n{FR}_i $$
(5)
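Formulas (4) and (5) can be illustrated with a short worked example. The sketch below uses hypothetical landslide counts and subclass areas, and the function names are chosen for illustration only.

```python
def frequency_ratios(landslides, areas):
    """FR_i = (share of landslides in subclass i) / (share of area in subclass i)."""
    total_landslides = sum(landslides)
    total_area = sum(areas)
    return [
        (ls / total_landslides) / (a / total_area)
        for ls, a in zip(landslides, areas)
    ]


def normalized_frequency_ratios(fr):
    """NFR_i = FR_i / sum of FR over all subclasses of the factor."""
    total = sum(fr)
    return [value / total for value in fr]


# Hypothetical subclasses of one factor: landslide counts and areas in km².
landslides = [6, 3, 1]
areas = [2.0, 3.0, 5.0]

fr = frequency_ratios(landslides, areas)   # approximately [3.0, 1.0, 0.2]
nfr = normalized_frequency_ratios(fr)      # approximately [0.714, 0.238, 0.048]
```

An FR above 1 means the subclass holds a larger share of landslides than of area; normalizing makes the subclass values of different factors comparable before weighting.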

Finally, the landslide susceptibility index (LSI) is calculated according to the formula (6):

$$ LSI={\sum}_{i=1}^n{NFR}_i\times {W}_i $$
(6)
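As a hedged illustration of formula (6), the sketch below computes the LSI of a single map cell, assuming the sum runs over the conditioning factors and that each factor contributes the NFR of the subclass the cell falls into. The factor names, NFR values, and weights are hypothetical.

```python
def landslide_susceptibility_index(nfr_by_factor, weights):
    """LSI of one cell: sum over factors of NFR (of the cell's subclass) times W."""
    return sum(nfr_by_factor[name] * weights[name] for name in weights)


# Hypothetical values for one cell (not taken from Tables 1 and 2).
cell_nfr = {"roads": 0.4, "faults": 0.1, "drainage": 0.3, "boundaries": 0.2}
factor_w = {"roads": 0.2, "faults": 0.3, "drainage": 0.1, "boundaries": 0.4}

lsi = landslide_susceptibility_index(cell_nfr, factor_w)  # approximately 0.22
```

Applying the same function to every cell of the raster yields the LSI map that is later classified into susceptibility zones.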

3.2 Spatial Relationship Between Conditioning Factors and Landslide Distribution

To analyze the spatial relationship, this study selected four landslide conditioning factors: distance to roads, distance to faults, distance to drainage, and distance to geological boundaries. OpenStreetMap data downloaded from Geofabrik were used to produce a map of the distance to roads in the study area (Fig. 3a). Along with a 12.5-m-resolution digital elevation model (DEM), data on geological units and fault systems were provided by the Institute of Geological Sciences, Vietnam Academy of Science and Technology. Geologically, the study area has six geological units: the Bac Son formation, Dong Dang formation, Song Hien formation, Quaternary deposits, Cao Bang complex, and Pia Oac complex. These data were then used to prepare the additional factor maps: distance to faults (Fig. 3b), distance to drainage (Fig. 3c), and distance to geological boundaries (Fig. 3d). Each factor was divided into subclasses to quantify the relationship between conditioning factors and landslide distribution. Distance to roads, distance to faults, and distance to geological boundaries were subdivided into 11 subclasses, while distance to drainage was subdivided into seven subclasses, as shown in Table 1.

Fig. 3
Four color-coded maps, a to d, plot the areas with respect to the distance to roads, faults, drainage, and geological boundaries, respectively. The landslides are represented by dots throughout the map.

Maps of distance to roads (a), distance to faults (b), distance to drainage (c), and distance to geological boundaries (d)

Table 1 Spatial distribution analysis of landslides using FR method

Analysis of the spatial relationship between conditioning factors and past landslides revealed that most landslides occurred within 100 m of roads and 200 m of drainage systems. This result suggests that the construction of the road system has altered the natural slope, resulting in an imbalance that increases the probability of landslides. Since precipitation is the main trigger of landslides in the study area, landslides frequently occur along drainage systems, where soil and rock are highly saturated because of rainwater infiltration. By contrast, landslides are distributed relatively evenly across the subclasses of the remaining two factors (distance to faults and distance to geological boundaries).

Figures 4, 5, and 6 present the outcomes of the fractal-based analysis. The Ln(S)-Ln(r) plots were linearly fitted with a coefficient of determination of R² > 0.995, as indicated in the figures. Consistent with a previous study (Zuo and Carranza 2017), these graphs demonstrate a non-linear spatial relationship between the number of landslides or landslide density and distance to conditioning factors. Using the data set based on the number of landslides (case study 1), the fractal analysis showed that distance to drainage is the most influential factor in the formation of landslides (D = 2.0088). However, using the data sets based on relative landslide density (case study 2) and landslide density (case study 3), the analysis indicated that distance to geological boundaries plays the most significant role in the landslide process, while the remaining factors (distance to faults, distance to drainage, and distance to roads) have a less significant effect. Based on the formula (3), the D values are then used to calculate the weights of the factors (W). The results of the calculations are presented in Table 2.

Fig. 4
4 line graphs, a to d, plot L n of S versus L n of r, for S 2 of r, S 2 of r, S 3 of r, and S 2 of r, respectively. The line extends between the following estimated values. A. From (0, 3.8) to (1.3, 5). B. From (0, 2.5) to (2.4, 6). C. From (0, 3) to (1.4, 6). D. From (0, 2.3) to (2.4, 6).

Fractal analysis using the number of landslides. Graphs of factors: distance to roads (a), distance to faults (b), distance to drainage (c), and distance to geological boundaries (d)

Fig. 5
4 line graphs, a to d, plot L n of S versus L n of r, for S 2 of r. The line extends between the following estimated values. A. From (0, 1.3) to (1.3, 2.6). B. From (0, 1.3) to (2.4, 5). C. From (0, 0.3) to (1.3, 2.3). D. From (0, 0.8) to (2.4, 4).

Fractal analysis using relative landslide density. Graphs of factors: distance to roads (a), distance to faults (b), distance to drainage (c), and distance to geological boundaries (d)

Fig. 6
4 line graphs, a to d, plot L n of S versus L n of r, for S 2 of r. The line extends between the following estimated values. A. From (0, 3.8) to (1.3, 5). B. From (0, 2.5) to (2.4, 6). C. From (0, 3) to (1.4, 6). D. From (0, 2.3) to (2.4, 6).

Fractal analysis using landslide density. Graphs of factors: distance to roads (a), distance to faults (b), distance to drainage (c), and distance to geological boundaries (d)

Table 2 Weights of the factors calculated from fractal analysis

Using a hybrid fractal-frequency ratio (FFR) model, LSZ maps were produced to compare various data sets for spatial relationship analysis. This model utilizes the results of calculating factor class weights using the frequency ratio method and factor weights using the fractal method. The FFR1 model employs the analysis results of case study 1, while the FFR2 model reflects case studies 2 and 3. A receiver operating characteristic (ROC) analysis was conducted to assess the performance of these models.

3.3 Model Validation

In this study, the well-known ROC method was employed to assess the accuracy of the prediction maps produced by the two models. A ROC curve is a curve in the unit square formed by plotting sensitivity (true positive rate, TPR) against 1 − specificity (false positive rate, FPR) at different thresholds (Yu et al. 2021). It is widely accepted that a ROC curve lying higher and further to the left in the ROC space indicates better prediction performance (Metz 1978). Based on the landslide inventory map, which included 47 landslides, ROC analyses were conducted to determine the AUC-ROC (area under the ROC curve) value.
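The construction of a ROC curve and its AUC can be sketched as follows. This is a generic illustration under stated assumptions, not the procedure used to produce the reported values: each sample is a map cell with a susceptibility score and a label (1 for a landslide cell, 0 otherwise), and the scores and labels below are hypothetical.

```python
def roc_curve(scores, labels):
    """Return (fpr, tpr) lists obtained by lowering the threshold over the scores."""
    positives = sum(labels)
    negatives = len(labels) - positives
    pairs = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume every sample sharing this score so ties give a single point.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / negatives)
        tpr.append(tp / positives)
    return fpr, tpr


def area_under_curve(fpr, tpr):
    """Trapezoidal area under the ROC curve."""
    return sum(
        (fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2
        for k in range(1, len(fpr))
    )


# Hypothetical susceptibility scores and landslide labels for six cells.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]

fpr, tpr = roc_curve(scores, labels)
auc = area_under_curve(fpr, tpr)  # 8/9, approximately 0.889
```

The AUC equals the probability that a randomly chosen landslide cell receives a higher score than a randomly chosen non-landslide cell, which is why values near 1 indicate good separation.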

3.4 Results of Landslide Susceptibility Using Fractal-Frequency Method (FFR)

The LSI values for the study area were calculated using Eq. (6) to produce landslide susceptibility index maps. The LSI values range from 0.006 to 0.363 (FFR1) and 0.006 to 0.365 (FFR2). Based on the landslide susceptibility index (LSI) maps, we prepared the landslide susceptibility zonation (LSZ) maps in ArcGIS using the Jenks natural breaks classification method. The study area was divided into five landslide susceptibility zones: very low, low, moderate, high, and very high (Figs. 7 and 8). The LSZ maps produced by the two models have similarities when determining the spatial distribution of susceptibility zones. Most high and very high susceptibility zones were identified at low to moderate elevations and are dominated by road and drainage networks. It has also been shown that the fault system and geological boundaries in the study area affect the spatial distribution of these zones.
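For readers unfamiliar with natural breaks classification, the sketch below shows the underlying idea: choose class boundaries that minimize the total within-class sum of squared deviations. It is a small dynamic-programming illustration on hypothetical values; the ArcGIS implementation may differ in its details, and the function name `natural_breaks` is chosen here for illustration.

```python
def natural_breaks(values, k):
    """Return the upper bound of each of k classes minimizing within-class variance."""
    data = sorted(values)
    n = len(data)
    # Prefix sums give the squared deviation of any slice in constant time.
    prefix, prefix_sq = [0.0], [0.0]
    for v in data:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)

    def squared_deviation(i, j):
        """Sum of squared deviations from the mean for data[i:j], with j > i."""
        total = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[c][j]: best total deviation when the first j values form c classes.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    split = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                candidate = cost[c - 1][i] + squared_deviation(i, j)
                if candidate < cost[c][j]:
                    cost[c][j] = candidate
                    split[c][j] = i

    # Walk back through the stored split points to recover the class bounds.
    bounds = []
    j = n
    for c in range(k, 0, -1):
        bounds.append(data[j - 1])
        j = split[c][j]
    return bounds[::-1]


# Hypothetical index values forming three obvious groups.
breaks = natural_breaks([1, 2, 3, 10, 11, 12, 30, 31, 32], k=3)  # [3, 12, 32]
```

This exhaustive search is quadratic in the number of values per class, so for a full raster it would normally be run on a sample or histogram of the LSI values.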

Fig. 7
A color-coded L S Z map plots the areas based on very low, low, moderate, high, and very high categories. The landslides are represented by dots throughout the map.

LSZ map using FFR1 model

Fig. 8
A color-coded L S Z map plots the areas based on very low, low, moderate, high, and very high categories. The landslides are represented by dots throughout the map.

LSZ map using FFR2 model

The proportion of each landslide susceptibility zone predicted by the two models is shown in Fig. 9. Both models assessed all landslide susceptibility zones similarly. The very high susceptibility zone covers 24.34% (FFR1) and 25.52% (FFR2) of the study area, while the very low susceptibility zone covers 17.91% and 17.66%, respectively.

Fig. 9
A grouped bar chart of the percentage of susceptibility zones versus susceptibility zones. The values are plotted as zone, F F R 1 and F F R 2. Very low, 17.91 and 17.66. Low, 21.37 and 21.03. Moderate, 21.19 and 20.1. High, 15.18 and 15.69. Very High, 24.34 and 25.52.

Distribution of susceptibility zones for each model

Using the 47 past landslides in the study area, ROC curves depicting the performance of the two models were constructed (Fig. 10). The AUC values indicate very good performance in predicting the spatial distribution of landslides (AUC > 80%) (Swets 1988). The success rate curves show that both models capture a relationship between the landslide conditioning factors and the distribution of historical landslides in the study area. The AUC values for the FFR1 and FFR2 models are almost identical (89.59% and 89.64%, respectively; Fig. 10), with the FFR2 model performing marginally better.

Fig. 10
A line graph of sensitivity versus 1 specificity plots curves for A U C of F F R 1 = 89.59% and A U C of F F R 2 = 89.64%. Curve 1 starts at (2, 15), increases, and ends at (85, 100). Curve 2 starts at (2, 15), increases, and ends at (100, 100). Values are estimated.

ROC curves with AUC values of two models

According to the study results, all three raw data sets may be utilized to analyze the relationship between landslides and conditioning factors. However, the model using landslide density-based data provides slightly better prediction performance. This result may be explained by the fact that landslide density accounts for the area of the factor subclasses and therefore expresses a stronger spatial relationship between the distribution of landslides and those subclasses. The difference might become clearer if the study were conducted in a region with a larger number of landslides, with conditioning factors selected according to data availability, the features of the landslides, and previous studies in the area.

4 Conclusions

In landslide hazard and susceptibility studies, the efficacy of assessing the relationship between past landslides and selected factors is crucial to the accuracy of the produced maps. Therefore, this study was conducted to determine the effect of several factors on the distribution of landslides in Tinh Tuc town, Cao Bang province, Vietnam. Three raw data sets were generated using the number of landslides, relative landslide density, and landslide density. Using the frequency ratio method and the fractal method, the roles of subclasses and factors, as expressed by NFR and W values, were evaluated. Two landslide susceptibility maps were then produced based on the selected data sets. The similarity of the susceptibility zones predicted by the two models and the outcomes of the ROC analysis showed that the data sets are suitable for producing highly accurate landslide susceptibility maps. We suggest applying the fractal method in landslide susceptibility studies because it makes it possible to quantify the influence of conditioning factors, thereby improving the accuracy of landslide prediction.