Direct sampling
DS is part of the multiple-point geostatistics (MPS) family of techniques (Mariethoz et al. 2010; Mariethoz and Caers 2014), which simulate a random variable at unknown locations by generating data patterns similar to the ones observed in a given training image (TI) (Strebelle 2002; Caers and Zhang 2004; de Vries et al. 2008; Vannametee et al. 2014). A TI can be a real dataset or a conceptual image of the expected spatial heterogeneity based on prior information (Meerschman et al. 2013). In their pioneering study, Vannametee et al. (2014) showed the applicability of MPS to map 8 landform classes in the French Alps, using the SNESIM algorithm (Strebelle 2002), one of the earliest MPS algorithms. Compared with early MPS algorithms, DS can handle both continuous and categorical variables at the same time, which allows the use of different types of predictor variables.
The DS algorithm generates a random variable on a simulation grid (SG), representing the study zone, by resampling the TI under pattern-matching constraints and calculating the distance \(D(\vec{d}(x), \vec{d}(y))\), i.e. the measure of dissimilarity between two data events (for more details see Oriani et al. 2021):
$$D\left(\vec{d}(x), \vec{d}(y)\right) = \frac{1}{N}\sum_{n=1}^{N} \mathbb{I}_{d_{n}(x) \ne d_{n}(y)}$$
(1)
where \(d_{n}(\cdot)\) is the nth datum composing the conditioning pattern, \(N\) is the number of data in the pattern, and \(\mathbb{I}\) is the indicator function (equal to 1 when the two data differ and 0 otherwise).
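For a categorical variable, Eq. (1) reduces to the fraction of mismatching data between the two events. A minimal sketch in R (the data events and class labels below are hypothetical) illustrates the computation:

```r
# Dissimilarity of Eq. (1) for categorical data events: the fraction of
# mismatching data within the conditioning pattern.
ds_distance <- function(dx, dy) {
  stopifnot(length(dx) == length(dy))
  mean(dx != dy)  # (1/N) * sum of indicator(d_n(x) != d_n(y))
}

# Two hypothetical data events of N = 5 pixels
ds_distance(c("glacial", "gravitative", "glacial", "fluvial",     "glacial"),
            c("glacial", "glacial",     "glacial", "gravitative", "glacial"))  # 0.4
```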
In the present case, a categorical variable denoting the geomorphological classes is the target variable, manually defined for the TI by geomorphological expertise. A series of morphometric, physical, and remotely sensed variables, defined for both the TI and the SG, is provided as predictors of the geomorphological classes. The algorithm identifies correspondences between patterns of these variables, and then sequentially imports into the SG the target-variable values (i.e. the geomorphological classes) associated with the most similar patterns found in the TI.
Since DS is a geostatistical simulation algorithm, it does not produce a unique classification, but a (potentially infinite) number of equiprobable scenarios of classes, called realizations. The most probable class at each location can be estimated by computing the mode of the realizations (i.e. the most frequent class across all realizations). In addition, the variability between realizations can be analyzed to estimate the classification uncertainty.
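A minimal sketch of this post-processing in R, assuming a hypothetical matrix reals of simulated labels (rows = pixels, columns = realizations) and illustrative class names:

```r
# Most frequent class (mode) per pixel and per-class occurrence probabilities
# across realizations.
class_mode <- function(x) names(which.max(table(x)))

classes <- c("glacial", "periglacial", "gravitative")   # illustrative labels
reals <- matrix(sample(classes, 5 * 100, replace = TRUE),
                nrow = 5, ncol = 100)                    # 5 pixels, 100 realizations

mode_map <- apply(reals, 1, class_mode)                  # most probable class per pixel
prob_map <- t(apply(reals, 1, function(x)
                    table(factor(x, levels = classes)) / length(x)))
```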
The following DS parameters have to be defined: (1) the maximum fraction of the TI to be scanned, F [0, 1]; (2) a neighborhood defining the number of neighboring pixels considered around each target pixel; (3) the distance threshold T [0, 1], below which a data event found in the TI is accepted and the scanning stops; (4) the number of realizations; (5) the weight given to the conditioning variables, W [0, 1]. In our case, we decided to completely scan the TI (F = 1) to have access to all the patterns in the training image, with a neighborhood defined as the 9 closest pixels for each predictor, except for the geomorphological (target) variable, for which no spatial neighbors are considered. In the simulation, patterns are compared with a rotation-invariant distance to increase the matching possibilities (Mariethoz and Kelly 2011). The threshold T was set to 0.01 in agreement with Meerschman et al. (2013); 100 realizations were generated and all the variables were given the same weight (W = 1).
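For reference, a hypothetical summary of this configuration as an R list (actual DS implementations, e.g. DeeSse or G2S, expose these settings through their own parameter files or interfaces):

```r
# Hypothetical grouping of the DS settings used in this study; names are
# illustrative and do not correspond to a specific DS implementation.
ds_params <- list(
  scan_fraction      = 1.0,   # F: the whole TI is scanned
  n_neighbors        = 9,     # closest pixels per predictor (none for the target)
  distance_threshold = 0.01,  # T: acceptance threshold on the pattern distance
  n_realizations     = 100,
  variable_weight    = 1.0,   # W: identical weight for all variables
  rotation_invariant = TRUE   # rotation-invariant pattern comparison
)
```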
Random forest
RF is an ensemble-learning algorithm for classification and regression based on decision trees (Breiman 2001). As is common for machine-learning approaches, RF is capable of learning from and making predictions on data, modeling the hidden relationships between a set of input and output variables. Decision trees are supervised classifiers providing decisions at multiple levels and are composed of a root node and child nodes. At each node, decisions are taken based on the training predictor variables. The number of generated trees (ntree) and the number of variables randomly sampled as candidates at each split (mtry) are the only parameters that need to be specified by the user. The algorithm then generates ntree bootstrap samples of the training dataset (random sampling with replacement, each leaving out on average about one-third of the observations). For each sample, a decision tree is grown and, at each split, the algorithm randomly selects mtry variables and computes the Gini index to select the best splitting variable. This step is iterated until each terminal node contains a single data point or fewer than a pre-fixed number of data points. The prediction for a new data point is finally computed by averaging the outputs of all decision trees for regression, or by majority voting for classification, the latter being the case in the present study. The parameters of the model were optimized by evaluating the prediction error on the observations not used to grow each tree (the "out-of-bag" observations, OOB). Values were set to 500 for ntree and 4 for mtry, following a trial-and-error process. Finally, the relative importance of each variable was assessed through the mean decrease in accuracy, computed by permuting the values of that variable in the out-of-bag observations and measuring the resulting loss of prediction accuracy, averaged across all the trees in the forest.
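A minimal sketch of this set-up in R with the randomForest package, assuming a hypothetical data frame ti holding the TI pixels, with the target factor geomorph and the 12 predictors (column names are illustrative):

```r
library(randomForest)

set.seed(42)  # for reproducibility of the bootstrap samples
rf_fit <- randomForest(geomorph ~ ., data = ti,
                       ntree = 500, mtry = 4,   # values chosen by trial and error
                       importance = TRUE)

rf_fit$err.rate[500, "OOB"]   # out-of-bag error after all 500 trees
importance(rf_fit, type = 1)  # mean decrease in accuracy per predictor
```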
RF was run twice: first with the same input dataset (extracted from the TI) as the one used for DS, and second with a balanced dataset. This strategy was adopted because the geomorphological classes are not equally represented in terms of number of pixels per class. We used the SMOTE (Synthetic Minority Over-sampling Technique) function (Chawla et al. 2002), which balances the dataset by artificially generating new examples of the minority classes and by under-sampling the examples of the majority class. The level of over-sampling of the minority classes (perc.over) and of under-sampling of the majority classes (perc.under) need to be set by the user, as well as the number of nearest neighbors (k) used to generate the new examples of the minority class. In our case, based on a trial-and-error process, we set perc.over = 900, perc.under = 900 and k = 5. In both runs, unbalanced and balanced, RF was trained on the TI and predictions were made on the SG. This choice of training and testing datasets (corresponding to the TI and SG, respectively) allowed comparing RF and DS under identical conditions.
Analyses were performed using the free software R (R Core Team 2019). Specifically, the package randomForest was employed for the classification procedure and the package DMwR to balance the input dataset (with the SMOTE function).
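A sketch of the balancing step and of the balanced run, under the same assumptions as above (hypothetical data frames ti and sg for the TI and SG pixels):

```r
library(DMwR)          # provides the SMOTE() function
library(randomForest)

# Balance the training dataset; parameter values as reported above
ti_bal <- SMOTE(geomorph ~ ., data = ti,
                perc.over = 900, perc.under = 900, k = 5)

# Train on the balanced TI and predict the classes on the SG
rf_bal  <- randomForest(geomorph ~ ., data = ti_bal,
                        ntree = 500, mtry = 4, importance = TRUE)
pred_sg <- predict(rf_bal, newdata = sg)
```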
Study area and experimental design
The study area corresponds to a rectangular domain of 70 km² in the Arolla valley, located in the southwestern Swiss Alps (46° 01' N, 7° 28' E) (Fig. 1). We selected this area because a classical geomorphological map is already available and can be used for validation (Lambiel et al. 2016). This map was produced using the geomorphological legend established by the University of Lausanne (Schoeneich 1993) and employed in several case studies (e.g. Ondicol 2009). It highlights the process categories, the morphogenesis of the landforms and their activity rate. The selected rectangular domain was divided into two equal areas: one used for training (TI) and the other for simulating/testing (SG) the two data-driven algorithms (DS and RF).
The Arolla valley is located in the upper part of the Hérens valley, a south-north catchment on the orographic left side of the Rhone River, ranging from 470 to 4357 m a.s.l. Geologically, the valley consists of oceanic sediments, orthogneisses, metagabbros and breccias (Steck et al. 2001). According to the Köppen-Geiger climate classification (Peel et al. 2007), the climate is classified as ET (tundra climate), with a mean annual precipitation of 736 mm recorded at the Evolène-Villa weather station (1825 m a.s.l.) for the 1981–2010 reference period. The 0 °C isotherm lies at around 2600 m a.s.l.
The Arolla valley is characterized by the presence of several glaciers retreating since the end of the Little Ice Age (nineteenth century), large moraines, widespread periglacial landforms (e.g. active and relict rock glaciers, solifluction lobes), talus slopes and associated debris-flow landforms (gullies, fans; Lambiel 2021).
The dataset is composed of 13 variables (Table 1): the geomorphological classes, representing the target variable, and 12 predictor variables, including topographic and remote-sensing indicators. The geomorphological classes are informed (i.e. known) in the TI. Conversely, in the SG the target variable is uninformed and is simulated by the classification algorithms (Fig. 2).
Table 1 Variables in the dataset. The orthomosaics and the original DEM were provided by the Swiss Office of Topography. All the variables were processed in a GIS environment (ArcMap 10.7) and resampled onto a regular grid with a spatial resolution of 20 m. Flow accumulation and roughness were computed using TopoToolbox implemented in Matlab (Schwanghart and Kuhn 2010; Schwanghart and Scherler 2014). The aspect was transformed from degrees into its sine (aspect_sin) and cosine (aspect_cos) to account for the circular nature of this variable and represent all cardinal directions.
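A minimal sketch of the aspect transformation in R, assuming a hypothetical data frame grid with an aspect column in degrees:

```r
# Decompose aspect (degrees) into sine and cosine components, which removes the
# artificial discontinuity between 0° and 360°
grid$aspect_sin <- sin(grid$aspect * pi / 180)
grid$aspect_cos <- cos(grid$aspect * pi / 180)
```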
Geomorphological classification
The original geomorphological map was organized into 11 main classes, grouping more than 100 types of landforms. In the study area, the karstic, lacustrine and organic classes were highly underrepresented and too scarce to be detected by a data-driven classification algorithm. Thus, pixels of these three classes were aggregated with the neighboring pixels belonging to the eight main process-based classes. The anthropic class was excluded from the analysis. The final classification, based on the geomorphological interpretation of the original map performed by the authors, is shown in Table 2.
Table 2 Geomorphological classification
Model validation
The predictions made on the SG and resulting from the implemented models were compared with the original geomorphological map (i.e. the observed class) through a confusion matrix (Table 3). This allowed evaluating the performance for each class and computing the overall accuracy and Kappa value (Cohen 1960).
Accuracy is the first evaluation statistic, defined as the ratio of the number of correct predictions to the total number of predictions, where TP, TN, FP and FN denote true positives, true negatives, false positives and false negatives, respectively:
$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN}$$
(2)
Cohen’s Kappa is a measure of agreement normalized at the baseline of random chance on the dataset:
$$\kappa = \frac{p_{o} - p_{e}}{1 - p_{e}}$$
(3)
where \(p_{o}\) is the observed accuracy and \(p_{e}\) is the probability of chance agreement under the independence assumption. Kappa values range between −1 and +1: negative values indicate agreement worse than chance, while positive values indicate an agreement evaluated as slight (0.01–0.20), fair to moderate (0.21–0.60), substantial (0.61–0.80) or almost perfect (0.81–1.00) (Viera and Garrett 2005). Cohen's Kappa is generally considered more informative than accuracy, as it accounts for agreement occurring by chance.
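A sketch of these overall statistics in R, assuming hypothetical vectors obs and pred with the observed and predicted classes of the SG pixels:

```r
# Confusion matrix (rows = observed, columns = predicted)
cm <- table(Observed = obs, Predicted = pred)

accuracy <- sum(diag(cm)) / sum(cm)                    # Eq. (2)

p_o   <- accuracy                                      # observed agreement
p_e   <- sum(rowSums(cm) * colSums(cm)) / sum(cm)^2    # chance agreement
kappa <- (p_o - p_e) / (1 - p_e)                       # Eq. (3)
```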
The following evaluation statistics for each class have also been calculated:
$$Sensitivity = \frac{TP}{TP + FN}$$
(4)
$$Precision = \frac{TP}{TP + FP}$$
(5)
Sensitivity is the proportion of pixels of a given geomorphological class that are correctly assigned to that class, i.e. it assesses the ability of the model to predict the presence of a class where that class is actually present. Precision is the proportion of pixels predicted as a given class that actually belong to that class.
For a multiclass problem, the confusion matrix allows evaluating whether each class is correctly predicted and assessing the degree of misclassification. This is accomplished by computing the fraction of pixels of class \(i\) labelled as class \(j\). Therefore, the matrix diagonal shows the fraction of pixels correctly predicted for each class (corresponding to the sensitivity), while values outside the diagonal represent the fractions of misclassified pixels.
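The per-class statistics follow directly from the same confusion matrix cm introduced in the sketch above:

```r
sensitivity <- diag(cm) / rowSums(cm)   # Eq. (4): TP / (TP + FN), per class
precision   <- diag(cm) / colSums(cm)   # Eq. (5): TP / (TP + FP), per class

# Fraction of pixels of class i labelled as class j; the diagonal equals the
# per-class sensitivity
cm_frac <- sweep(cm, 1, rowSums(cm), "/")
```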
In the results section, the DS results are displayed both for a single realization and as the mode of the 100 realizations, to highlight how computing the mode improves the results, also from a visual point of view. Furthermore, the probability of each class, computed from the 100 realizations, is used to quantify the precision of the simulation. For RF, the results obtained with the unbalanced (i.e. original) and the balanced dataset are shown both as categorical values (by taking the maximum vote) and as probabilities (by normalizing the votes received by the most voted class over the total number of trees).
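With the randomForest package, both outputs can be obtained from the fitted model (here the hypothetical rf_bal and SG data frame sg from the sketches above):

```r
pred_class <- predict(rf_bal, newdata = sg, type = "response")  # maximum vote
pred_prob  <- predict(rf_bal, newdata = sg, type = "prob")      # vote fractions per class
```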
Experimental variograms (Cressie 2015) were computed for the results of each method to evaluate the degree of spatial dependence of the geomorphological classes. The connectivity index (Hovadik and Larue 2007) was also calculated to estimate the degree of connection of the pixels within each geomorphological class. Connectivity values range from zero for totally fragmented units, entirely composed of non-adjacent pixels, to one for totally connected units.
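A hedged sketch of these spatial analyses in R, using the gstat and raster packages; it assumes a hypothetical SpatialPixelsDataFrame sg_sp with a numeric indicator column is_glacial (1 if the pixel belongs to the class, 0 otherwise) and a corresponding RasterLayer r_glacial, and it computes connectivity as the proportion of pixel pairs of the class belonging to the same connected component (one possible formulation of the index):

```r
library(gstat)
library(raster)   # clump() also requires the igraph package

# Experimental indicator variogram for one geomorphological class
vg <- variogram(is_glacial ~ 1, data = sg_sp)
plot(vg)

# Connectivity: probability that two randomly chosen pixels of the class
# fall in the same connected component (8-neighbor connectivity)
clumps <- clump(r_glacial == 1, directions = 8)
sizes  <- table(values(clumps))                 # pixels per connected component
n_tot  <- sum(sizes)
connectivity <- sum(sizes * (sizes - 1)) / (n_tot * (n_tot - 1))
```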