Introduction

Landslide inventories are commonly compiled to investigate the geomorphic evolution of steeplands or to analyze earth surface dynamics (Soeters and Van Westen 1996; Hovius et al. 1997; Muenchow et al. 2012). Spatial information on past slope movements is also used to calibrate and validate empirically based landslide susceptibility models (Brenning 2005; Van Westen et al. 2008; Guzzetti et al. 2012; Petschko et al. 2016). The derived landslide susceptibility maps regularly serve as a basis for decision-making in hazard prevention and spatial planning (Fell et al. 2008; Greiving et al. 2012; Guillard and Zezere 2012; Corominas et al. 2013; Petschko et al. 2014b) because they depict locations where landslides are more or less likely to occur in the future (Brabb 1984; Guzzetti et al. 2006b). Landslide susceptibility maps do not provide information on ‘when’ a landslide will occur or ‘how large’ or ‘intensive’ a future landslide will be (Guzzetti et al. 2006a).

In principle, statistical landslide susceptibility models are generated by relating spatial information on past landslide activities (i.e. landslide presence/absence) to static geoenvironmental factors (e.g. topography, lithology) using statistical or machine-learning techniques. The generated empirical relation, commonly expressed as a relative susceptibility score, is then applied to each spatial unit of an area (e.g. grid cell, slope unit) (Rossi et al. 2010; Reichenbach et al. 2014; Goetz et al. 2015). The validity and generalizability of such spatial predictions are commonly assessed by interpreting inventory-based predictive performance estimates (Brenning 2005; Guzzetti et al. 2006b; Frattini et al. 2010). Since metrics such as the Area under the Receiver Operating Characteristic Curve (AUROC) or the area under the prediction rate curve are calculated for one or multiple independent landslide test samples (i.e. samples not used to train the model), they are frequently considered to summarize the capability of a predictive model to identify landslide-prone areas (Chung and Fabbri 2003; Remondo et al. 2003; Beguería 2006; Guzzetti et al. 2006b). Such inventory-based metrics are also taken into account to evaluate different classification algorithms (Goetz et al. 2015; Steger et al. 2016a) or the utility of specific predictor combinations (Iovine et al. 2014; Conoscenti et al. 2016), the spatial transferability of modelling results (Petschko et al. 2014b; Lombardo et al. 2014), the influence of sample sizes (Petschko et al. 2014b; Heckmann et al. 2014; Hussin et al. 2016), the effect of sampling strategies (Regmi et al. 2014; Conoscenti et al. 2016; Hussin et al. 2016) or the impact of data set qualities (Galli et al. 2008; Fressard et al. 2014).

A number of studies outline that a reliable landslide inventory is a vital component to achieve high-quality statistical landslide susceptibility models, not least because most analysis steps are dependent on a correct representation of past landslide occurrences (Guzzetti et al. 2006b; Cascini 2008; Fell et al. 2008; Harp et al. 2011; Petschko et al. 2014b; Steger et al. 2016a, b). Of particular concern are the positional accuracy and completeness of landslide information (Malamud et al. 2004; Chacón et al. 2006; Galli et al. 2008; Guzzetti et al. 2012; Xu et al. 2015; Petschko et al. 2016; Santangelo et al. 2015). The positional accuracy of an inventory depends on, for example, the type and quality of the available mapping basis, time availability and the specific characteristics of landslides and the study site (Ardizzone et al. 2002; Harp et al. 2011; Guzzetti et al. 2012; Petschko et al. 2016; Santangelo et al. 2015). Even though modern technologies such as differential Global Positioning Systems (GPS) and Light Detection and Ranging (LiDAR) may facilitate a positionally precise localization of visible landslide features, complete landslide inventories may still be challenging to achieve. In agricultural land or near transportation infrastructure, for instance, terrain features indicative of landslide activity may more frequently and more quickly be blurred or removed by human activities (e.g. remediation work, planation). This may favour an overrepresentation of landslides within forest areas and an underrepresentation in agricultural land (Bell et al. 2012; Petschko et al. 2016; Conoscenti et al. 2016). In contrast, inventories mapped by visually analyzing multi-temporal aerial photographs may be substantially incomplete within forests because treetops may ‘hide’ a considerable portion of geomorphic features (Brardinoni et al. 2003; Jacobs et al. 2016). Landslide inventories compiled from public reports may overrepresent landslides in closer proximity to infrastructure and underrepresent slope movements in remote and forest areas (Guzzetti et al. 1999; Steger et al. 2016a).

Several studies compared statistical landslide susceptibility models produced from heterogeneous inventories (Ardizzone et al. 2002; Galli et al. 2008; Zêzere et al. 2009; Fressard et al. 2014; Steger et al. 2015, 2016a). However, a differentiated evaluation of the propagation of potential inventory-based errors into landslide susceptibility models was hampered due to the practical inseparability of positional accuracy and inventory completeness as well as the lack of truly accurate reference inventories. While a previous study examined the effects of positional accuracy in detail (cf. Steger et al. 2016b), the present follow-up study devotes particular attention to the second quality-defining criterion, namely the completeness of a landslide inventory.

This research focuses on the impact of a spatially heterogeneous completeness of landslide information on statistical landslide susceptibility models by artificially introducing two different mapping biases into (i) an available landslide inventory and (ii) synthetically generated data sets. The main objective was to examine the influence of systematically incomplete landslide inventories on modelled relationships and validation results. An additional goal was to propose suitable modelling strategies that can mitigate the effects of incompleteness. In this context, we built upon earlier findings (Steger et al. 2016a) and explored for the first time the potential of mixed-effects models to tackle the problem of confounded relationships as a result of inventory-based biases.

Study area

The study area (20 km × 5 km) belongs to the districts of Amstetten and Waidhofen/Ybbs in the western part of the federal state of Lower Austria (Fig. 1d). The prevalent undulating landscape of the Flysch Zone (81 km²; mean slope, 12.3°), with its intensively weathered alternating sediment sequences, is highly susceptible to landslides of the slide-type movement. The less steep northern portion of the study area is partly covered by clastic sediments of the Molasse Zone (3 km²; mean slope, 5.1°) while the valley floors are mainly covered by Quaternary sediments (16 km²; mean slope, 5°) (Fig. 1a).

Fig. 1

Location (d) and overview of the study area. Spatial distribution of slope angles (a), lithology (a), slope orientation (b), land cover units (c) and mean annual precipitation (c). The (unmodified) landslide inventory (n = 591) is given in c. The shaded relief image (excerpt in blue) shows the geomorphic footprint of characteristic shallow landslides of the area. Corresponding squares relate to the modelling resolution of 10 m × 10 m and depict the landslide scarp mapping location

The hilly parts located in higher elevations are intensively used for cattle farming (pastures, 50 km²) while arable land (20 km²) is dominant in the lowlands. Forest areas (26 km²) are predominantly located in the hilly parts of the Flysch Zone (Fig. 1c). Settlements and main roads are primarily located within the valley floors while single farms and smaller roads are also scattered over the steeper hilly areas. In total, built-up areas account for 4 km² (Eder et al. 2011).

The study site experiences oceanic climate influences from the West and continental influences from the East. Mean annual precipitation amounts generally increase with elevation and range from 900 mm in the North (elevation, 350 m a.s.l.) to 1100 mm in the South (790 m a.s.l.) (Fig. 1c) (Skoda and Lorenz 2007). The prevalent shallow and small landslides of the area are usually triggered by rainfall and/or snow-melting events. In particular, locally concentrated convective precipitation events in the summer or long-lasting rainfall events between autumn and spring are known to promote shallow landslides of the slide-type movement according to the classification of Cruden and Varnes (1996) and Dikau et al. (1996). Critical landslide-triggering conditions are regularly achieved whenever intensive snow melting coincides with severe precipitation (Schwenk 1992).

The study area is described in more detail in Steger et al. (2016b) while additional information on the prevalent landslide processes and the litho-morphological characteristics of the area and its surroundings can be found in Schwenk (1992), Wessely et al. (2006) and Petschko et al. (2016).

‘Real data’ and ‘synthetic data’

Landslide data (Fig. 1c), topography (i.e. slope, northness, eastness) and lithological information (Fig. 1a) as well as the synthetically generated basic data sets (Fig. 2) were adopted from previous analyses (cf. Steger et al. 2016b). Within this study, the expression real data relates to real world data from Lower Austria while the expression synthetic data refers to artificially generated data sets.

Fig. 2

Spatial representation of the synthetic data sets. Spatial distribution of slope angles (a), lithology (a), slope orientation (b), land cover units (c) and mean annual precipitation (d). The simulated distribution of landslide locations (n = 2000) is given in d. The municipalities depicted in d are used to simulate an inventory-based mapping bias related to administrative units. Conditional frequencies in e show that municipalities affected by a simulated municipality-related bias exhibit lower precipitation rates. Areas affected by the simulated forest-related inventory bias (forests) are predominantly located on steeper slopes (f)

The available ‘real’ point-based landslide inventory (n = 591) of the study area represents landslide scarps of mainly smaller and shallow landslides of the slide-type movement (cf. excerpt of Fig. 1c) and was mapped by Petschko et al. (2016) for statistical landslide susceptibility modelling, mainly on the basis of high resolution shaded relief images of an airborne laser-scanning digital terrain model (ALS-DTM). Comprehensive information on this landslide data set and its mapping procedure can be found in Petschko et al. (2016).

A number of landslide susceptibility studies advocate representing each landslide with a single point in order to reduce spatial autocorrelation of observations and to avoid weighting for landslide size (e.g. Van Den Eeckhaut et al. 2006; Atkinson and Massari 2011; Petschko et al. 2014b; Steger et al. 2016b). Several field trips not only verified the high positional accuracy of the present inventory but also emphasized that this data set is likely to be biased. This relates in particular to the observation that the extent of anthropogenic interventions (e.g. planation, remediation works) on geomorphic features may vary considerably between different land cover types (Bell et al. 2012). Thus, the inventory is expected to miss an unknown number of landslides located on pastureland or arable land and in close proximity to infrastructure. In contrast, landslides are believed to be overrepresented under forests as a consequence of well-preserved and therefore visually easily detectable geomorphic features (Bell et al. 2012; Petschko et al. 2014a, 2016).

Topographic predictors (Fig. 1a, b; real data) were based on derivatives of a resampled (10 m × 10 m) ALS-DTM whereas lithological information was extracted from a digital geological map (GK200, scale 1:200,000). The reference landslide susceptibility model for the real data was generated using the predictor set ‘B’, which consists of four variables, namely slope, northness, eastness and lithology. In the following, this set was further enhanced to include the predictors land cover and mean annual precipitation rates to evaluate the effect of including or excluding predictors that are spatially related to a specific inventory-based incompleteness (Fig. 1c; cf. “Methods” section). Land cover was extracted from classifications conducted by Eder et al. (2011) while mean annual precipitation rates were derived from modelling results of Skoda and Lorenz (2007). A municipality layer was used within the applied mixed-effects models to account for a simulated municipality-related inventory bias (cf. “Generalized linear models and generalized linear mixed models” section). Corresponding boundaries were obtained from a digital topographic map (ÖK50, scale 1:50,000).

The present synthetic data (Fig. 2) corresponds to data sets described in detail in Steger et al. (2016b). This artificially generated data has already proven valuable for examining in depth the effect of inventory-based positional errors on statistical modelling results, as it allowed a ‘true’ and unbiased relation to be defined between an assumed perfectly accurate and complete inventory (n = 2000; Fig. 2d) and those five environmental factors that were defined to determine landslide susceptibility. Within this study, synthetic data was specifically utilized to further verify the results obtained by the real data set under controllable conditions.

The predictor set ‘B + L’ consists of those predictors (i.e. slope, northness, eastness, lithology and land cover) and thus relates to the respective reference landslide susceptibility model of the synthetic data set. Slope angles (Fig. 2a) and both aspect layers (Fig. 2b) were derived from a strongly generalized ALS-DTM while the three equally sized lithological units (Fig. 2a) were spread over the West (unit A), the East (unit B) and across the entire area (unit C). The distribution of the three land cover units was specified to be conditioned on slope angle (Fig. 2c, f).

Within this study, we additionally defined a synthetic mean annual precipitation layer (Fig. 2d) which spatially relates to a simulated municipality-related incompleteness of the inventory (Fig. 2d, e). The 16 corresponding square-shaped municipalities each cover 6.25 km² (Fig. 2d).

Methods

The methodological framework of this study consisted of an artificial introduction of two different types of mapping biases (i.e. forest bias; municipality bias) into the available real landslide inventory and a synthetically generated landslide data set. Logistic regression was applied to model landslide susceptibility separately with each of those differently biased inventories by separately using two different predictor sets (Fig. 3). The first set consisted of predictors which were not specifically related to a simulated inventory incompleteness while the second set contained one environmental factor spatially related to a simulated bias (i.e. land cover for the forest-related bias; precipitation for the municipality-related bias). Furthermore, we tested whether the application of mixed-effects logistic regression models enabled less confounded predictions when modelling with systematically incomplete inventories (cf. “Generalized linear models and generalized linear mixed models” section). Finally, all models were thoroughly evaluated (cf. “Model evaluation” section). All statistical analyses were performed using the statistical software R and its packages ‘stats’, ‘lme4’ and ‘sperrorest’ (Brenning 2012a; Bates et al. 2014; R Core Team 2014).

Fig. 3

Methodological framework of this study. After an artificial introduction of two different mapping biases into a landslide inventory (dark grey boxes), landslide susceptibility models were generated separately by considering different classification techniques (red texts) and predictor combinations (blue texts). The resulting 34 models were evaluated with multiple techniques (model evaluation). Note that abbreviations (e.g. ‘GLM: B + L’) are further used within upcoming figures

Artificial introduction of inventory-based biases

The first inventory mimics a systematic forest-related bias which may arise when landslides are mapped by solely interpreting aerial photographs (Brardinoni et al. 2003) or when landslides in forest areas are underreported (Steger et al. 2016a). A forest-related incompleteness was artificially introduced in this study by randomly deleting 20 and 80% of landslides within areas classified as forests (Fig. 4a).

Fig. 4

Unmodified landslide inventory (red dots in a and c) and simulated inventory-based incompleteness within forested areas (black and yellow dots in a) and within specific municipalities (black and yellow dots in c) for the real data set. The conditional frequency plots (b, d) show observed spatial interrelations between variables affected by a bias and environmental factors: Forests are more likely located on steeper slopes (b) and municipalities affected by a bias (grey area) generally show lower precipitation rates (d)

A spatially varying bias may also arise from merging different landslide data sets from adjacent areas (Van den Eeckhaut et al. 2012) especially if different mapping procedures were used, if data sets were compiled for different purposes or by individuals with different levels of expertise, or if underreporting varies between jurisdictions (Ardizzone et al. 2002). In order to mimic this possible source of bias, a second bias was introduced in this study by gradually removing 20 and 80% of landslides within specific municipalities to simulate a bias related to the administrative units (Fig. 4c).
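
The two simulated biases share the same mechanics: a fixed share of the landslides inside an affected zone is removed. The following is a minimal sketch of that idea, not the procedure actually used in the study. The function name `thin_inventory`, the record layout and the toy inventory are hypothetical, and the deletion is assumed to be purely random for both bias types.

```python
import random

def thin_inventory(landslides, in_biased_zone, fraction, seed=0):
    """Randomly drop `fraction` of the landslides lying inside the biased zone.

    landslides     -- list of landslide records (any objects)
    in_biased_zone -- function returning True for records inside the affected
                      zone (e.g. forest, or a flagged municipality)
    fraction       -- share of affected landslides to delete (e.g. 0.2 or 0.8)
    """
    rng = random.Random(seed)
    inside = [i for i, ls in enumerate(landslides) if in_biased_zone(ls)]
    n_drop = round(fraction * len(inside))
    dropped = set(rng.sample(inside, n_drop))
    # Landslides outside the zone are never touched
    return [ls for i, ls in enumerate(landslides) if i not in dropped]

# Hypothetical toy inventory: 10 landslides in forest, 10 on pasture
inventory = [{"id": i, "land_cover": "forest" if i < 10 else "pasture"}
             for i in range(20)]

def is_forest(ls):
    return ls["land_cover"] == "forest"

biased_20 = thin_inventory(inventory, is_forest, 0.2)
biased_80 = thin_inventory(inventory, is_forest, 0.8)
```

A municipality-related bias can be sketched the same way by passing a predicate that checks whether a record falls into one of the affected administrative units.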

Generalized linear models and generalized linear mixed models

Binary logistic regression is commonly applied to predict landslide susceptibility at a regional scale. This classifier is based on a generalized linear model (GLM) with a logistic link function that enables modelling the (fixed) effect (cf. fixed part of Eq. (1)) of each predictor or predictor class on the response (landslide presence/absence) (Atkinson et al. 1998; Brenning 2005; Van Den Eeckhaut et al. 2010; Regmi et al. 2014; Felicísimo et al. 2013; Budimir et al. 2015). Binary logistic regression is further referred to as GLM.

A confounder is a variable which is associated with both the response variable (i.e. landslide inventory) and another variable (e.g. slope) (Brenning 2012b; Szklo and Nieto 2014). In the context of systematically incomplete inventories, confounding may become particularly problematic whenever an inventory-based incompleteness is directly related to a specific variable (e.g. land cover class, municipality) that acts as a confounder for other predictors (e.g. slope, lithological units). For instance, in the likely case that forested areas are more frequently located on steeper terrain (Rickli et al. 2002; Steger et al. 2016a; cf. Fig. 4b), a forest-related incompleteness of the inventory may lead to the tendency that landslides are underrepresented not only in wooded areas but also on steeper slopes. Thus, a potential exclusion of the bias-describing predictor land cover may lead to systematically confounded modelling results (e.g. in the form of a biased slope coefficient) (Brenning 2012b; Steger et al. 2016a). However, earlier findings also provided evidence that an inclusion of such a bias-describing predictor may as well be related to misleading modelling results because of a subsequent direct bias propagation via the included variable (Steger et al. 2016a).
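
The confounding mechanism can be illustrated with a small worked example. All counts below are hypothetical and chosen only so that landslide frequency depends on slope but not on land cover, while forest is concentrated on steep terrain. Deleted landslides are assumed to be treated as non-landslide cells.

```python
def odds_ratio(ls_a, n_a, ls_b, n_b):
    """Odds ratio of group a versus group b from landslide counts and cell totals."""
    return (ls_a / (n_a - ls_a)) / (ls_b / (n_b - ls_b))

# Hypothetical strata: (slope class, land cover) -> (cells, landslide cells)
strata = {
    ("steep", "forest"): (800, 160),
    ("steep", "open"): (200, 40),
    ("flat", "forest"): (200, 10),
    ("flat", "open"): (800, 40),
}

def crude_slope_or(strata, forest_kept=1.0):
    """Steep-versus-flat OR that ignores land cover.

    forest_kept is the share of forest landslides still present in the inventory.
    """
    totals = {"steep": [0, 0.0], "flat": [0, 0.0]}
    for (slope, cover), (cells, slides) in strata.items():
        kept = slides * forest_kept if cover == "forest" else slides
        totals[slope][0] += cells
        totals[slope][1] += kept
    return odds_ratio(totals["steep"][1], totals["steep"][0],
                      totals["flat"][1], totals["flat"][0])

complete = crude_slope_or(strata)          # complete inventory
biased = crude_slope_or(strata, 0.2)       # 80% of forest landslides missing
open_only = odds_ratio(40, 200, 40, 800)   # stratum untouched by the bias
```

In this toy setting, the crude slope OR falls from 4.75 to roughly 1.77 once 80% of the forest landslides are missing, although the slope effect within open land is unchanged. This is the kind of attenuation that omitting the bias-describing variable can produce.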

We aimed to tackle the problems of direct bias propagation and confounded relationships by proposing a novel statistical approach (i.e. mixed-effects models) to improve landslide susceptibility models generated with systematically incomplete inventories. To our knowledge, all statistical landslide susceptibility models generated up to now belong to the group of fixed-effects models, where the main interest lies in the specific influence of each predictor and predictor level on the response (Bolker et al. 2009). Mixed-effects models additionally allow us to include random effects and have proven to be useful to analyze nested (e.g. hierarchically grouped) or correlated (spatial or temporal) data in the fields of medicine, economics, social science and ecology (Zuur et al. 2009). An inclusion of random terms may also be valuable when a specific categorically scaled predictor is considered to be a ‘nuisance’ parameter which is not of direct interest, but should be accounted for when estimating the fixed-effects coefficients (Bolker et al. 2009). Within this study, mixed-effects logistic regression models were applied to estimate a binary outcome (i.e. landslide presence/absence). More specifically, we fitted a generalized linear mixed model (GLMM) while specifying a bias-describing variable as random intercept and the other predictors as fixed effects:

$$ \mathrm{logit}\left( P\left({Y}_{ij}=1\right)\right)=\underset{\mathrm{fixed\ effects}}{\underbrace{\beta_0+{\beta}_1{X}_{1 i}+\dots +{\beta}_p{X}_{p i}}}+\underset{\mathrm{random\ intercept}}{\underbrace{\gamma_j}} \tag{1} $$

where β₀ relates to the intercept, β₁, …, βₚ to the regression coefficients of associated predictors X₁, …, Xₚ and γⱼ to the random intercept. γⱼ is assumed to have a prior distribution which is normally distributed with mean zero and variance σ² (Bolker et al. 2009; Zuur et al. 2009; Bates et al. 2014). The idea behind this procedure was to separate (i.e. isolate) the variation which relates to inventory-based bias from the effects which are assumed to be less influenced by those biases. Thus, the random intercept term was specifically included to represent (i.e. account for) a bias which is directly related to a categorically scaled variable when estimating the coefficients of the fixed-effects predictor variables. The variable land cover was included as a random intercept to represent variations originating from a forest-related bias while a municipality layer was introduced to account for the bias directly related to the respective administrative boundaries. Finally, only the fixed effects were used to predict landslide susceptibility. Thus, all presented GLMM predictions as well as related predictive performance estimates were based on a random intercept of 0 (i.e. fixed effects alone), averaging out the random effects.
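
Predicting with a random intercept of 0 amounts to applying the inverse logit to the fixed part of Eq. (1) only. The sketch below illustrates this under stated assumptions: the function name, the coefficient values and the random intercepts are hypothetical and are not estimates from this study.

```python
import math

def predict_susceptibility(x, beta0, betas, gamma_j=0.0):
    """Inverse logit of Eq. (1): fixed part plus an optional random intercept."""
    eta = beta0 + sum(b * xi for b, xi in zip(betas, x)) + gamma_j
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical coefficients (illustration only)
beta0 = -3.0
betas = [0.2, 0.5]                          # slope (per degree), northness
gamma = {"forest": -1.2, "pasture": 0.4}    # random intercepts by land cover

cell = [15.0, 0.3]                          # slope of 15 degrees, northness of 0.3

# Prediction that carries the (possibly bias-driven) forest intercept
with_random = predict_susceptibility(cell, beta0, betas, gamma["forest"])

# Prediction based on the fixed effects alone (random intercept set to 0)
fixed_only = predict_susceptibility(cell, beta0, betas)
```

With these made-up numbers, the fixed-effects-only prediction for the forest cell is higher than the one including the negative forest intercept, which is the intended effect when that intercept mainly reflects missing landslides.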

Model evaluation

The odds ratio (OR) represents a measure of association and is regularly used to compare modelled relations within logistic regression models (Hosmer and Lemeshow 2000). The modelled relationships between the response variable and the predictors were evaluated by estimating ORs for ‘meaningful’ increments of each predictor (Brenning 2012b; Brenning et al. 2015; Steger et al. 2016b). For instance, ORs estimated for the predictor slope displayed differences in the chance that a 10° steeper slope is affected by future landsliding (e.g. OR = 1, equal chances; OR > 1, steeper slopes are more likely to be affected). This estimate accounts for potential confounding effects of other predictors included in the model. Analogously, ORs obtained for single land cover classes depict how much more likely (or unlikely) a specific land cover type (e.g. forest) is to be affected by landsliding compared to a reference class (e.g. pastures). This consideration already accounts for the effect that forests are more frequently located on steeper slopes (Steger et al. 2016a).
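
For a logistic regression coefficient β, the OR for an increment Δ of the predictor is exp(β·Δ). The sketch below shows this conversion. The Wald-type interval (estimate ± 1.96 standard errors on the logit scale) is an assumption for illustration, since the text does not state how the reported confidence intervals were derived, and the standard error used is made up.

```python
import math

def odds_ratio(beta, increment=1.0, se=None, z=1.96):
    """OR for a predictor increment, with an optional Wald-type interval."""
    point = math.exp(beta * increment)
    if se is None:
        return point, None
    lower = math.exp((beta - z * se) * increment)
    upper = math.exp((beta + z * se) * increment)
    return point, (lower, upper)

# Per-degree coefficient that corresponds to an OR of 7.3 per 10 degrees of slope
beta_slope = math.log(7.3) / 10

or_10deg, _ = odds_ratio(beta_slope, increment=10)

# Same coefficient with a hypothetical standard error
or_ci, (ci_low, ci_high) = odds_ratio(beta_slope, increment=10, se=0.015)
```

Reporting the OR per 10° rather than per 1° changes only the increment, not the fitted coefficient.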

Comparisons of modelled relationships and landslide susceptibility maps with their references (i.e. models assumed to be less affected by a bias) provided indications on how the respective inventory-based errors were reflected by the final results. A substantial deviation of ORs and predicted susceptibility patterns would be interpreted as evidence that the inventory mapping error was propagated into the final modelling results. Quantile classification of the final maps was conducted to ease a visual comparison of susceptibility patterns (Hussin et al. 2016).
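
Quantile classification assigns map cells to classes that each hold roughly the same number of cells, which makes maps from different models visually comparable. A minimal sketch follows. The number of classes (five) and the toy scores are assumptions, as the text does not specify them.

```python
import bisect
import statistics

def quantile_classes(scores, n_classes=5):
    """Assign each score a class 1..n_classes using quantile breaks of `scores`."""
    # statistics.quantiles returns n_classes - 1 cut points
    breaks = statistics.quantiles(scores, n=n_classes)
    return [bisect.bisect_right(breaks, s) + 1 for s in scores]

# Hypothetical susceptibility scores for 100 grid cells
scores = [i / 100 for i in range(1, 101)]
classes = quantile_classes(scores)
```

Because the breaks are derived from each map's own score distribution, class boundaries differ between models. The comparison is therefore one of relative spatial patterns rather than of absolute susceptibility values.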

The predictive performance of all models was assessed by estimating the AUROC (0.5 = random model; 1 = perfect discrimination between landslides and non-landslides) by means of a repeated non-spatial (cross-validation; CV) and spatial (spatial cross-validation; SCV) partitioning of training and test samples (Brenning 2012a; Petschko et al. 2014b; Goetz et al. 2015). CV and SCV were each based on a 10-fold partitioning repeated 50 times for each model.
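
The two ingredients of this validation scheme can be sketched as follows: an AUROC computed as the probability that a landslide observation receives a higher score than a non-landslide observation, and a repeated random 10-fold partitioning. This is an illustrative sketch with hypothetical function names, not the ‘sperrorest’ implementation. The spatial variant, which would form folds from spatially contiguous groups of observations, is not shown.

```python
import random

def auroc(scores, labels):
    """AUROC as P(score of a landslide > score of a non-landslide); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def repeated_kfold(n_obs, k=10, repeats=50, seed=0):
    """Yield (repetition, fold, test_indices) for repeated random k-fold partitioning."""
    rng = random.Random(seed)
    for rep in range(repeats):
        idx = list(range(n_obs))
        rng.shuffle(idx)
        for fold in range(k):
            # Every k-th shuffled index forms one test fold
            yield rep, fold, idx[fold::k]

perfect = auroc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
uninformative = auroc([0.5, 0.5, 0.5, 0.5], [1, 1, 0, 0])
n_partitions = sum(1 for _ in repeated_kfold(n_obs=100))
```

In a full workflow, the model would be refitted on the observations outside each test fold and the AUROC computed on the held-out fold, giving 500 estimates per model and partitioning type.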

The transferability index, which is also based on a repeated estimation of predictive performances, was adopted from Petschko et al. (2014b) and provided information on the non-spatial (CV) and spatial (SCV) transferability of the modelling results. This metric mainly reflects the interquartile range of computed AUROCs while additionally accounting for a variability of sample sizes in the respective test data set. A low transferability index points out that the obtained predictive performances of a model were relatively similar for different partitions. Thus, obtained low index values indicate robust modelled relationships and a subsequent high non-spatial (CV) and spatial (SCV) transferability of modelling results (Petschko et al. 2014b).

The predicted susceptibility score was further compared with the unmodified response variable using the AUROC. This measure provided quantitative information on how well the models were able to predict landslide observations related to less biased (real data) and unbiased (synthetic data) response variables. Note that this measure corresponded to the goodness of model fit whenever the respective data sets related to the unmodified inventories (0% of simulated incompleteness).

Results

The impact of the land cover-related inventory bias

ORs obtained for the GLMs generated without the land cover variable, but with highly biased inventories, provided evidence of confounded relationships between landslide inventories and those predictors that were correlated with these bias-describing variables (e.g. forests are more likely located on steeper slopes; Fig. 4b). For instance, a substantial (80%) forest-related underrepresentation of landslides led to a weaker modelled dependence of landslide occurrence on slope angle (‘GLM: B’ in Fig. 5a, e). ORs (reported with 95% confidence intervals in square brackets) of the study area’s reference model (‘R’ in Fig. 5a) showed that the chances of a 10° steeper slope to be affected by landsliding were 7.3 [5.5/9.8] times higher compared to their 10° flatter counterparts while the respective model generated with the highly biased inventory (‘GLM: B’, 80% in Fig. 5a) yielded a considerably lower OR of 5 [3.7/7]. ORs obtained from the synthetic data sets showed that the modelled associations between slope angles and landslide occurrence differed between the reference model (‘R’ in Fig. 5e) and the models generated without land cover (‘GLM: B’). This tendency was further intensified with an increasing portion of inventory-based incompleteness on forested areas (‘GLM: B’ in Fig. 5e). The respective ORs of the predictor slope showed a decrease from 5.4 [4.8/6.1] to 4 [3.5/4.5].

Fig. 5

Modelled relationships expressed as odds ratios (OR) for models generated with differently incomplete inventories on forested areas (0, 20, 80%) for the real data set (a, b, c, d) and the synthetic data (e, f, g, h). GLMs generated with a highly biased inventory (80%) and without land cover as a predictor (‘GLM: B’) provide evidence of confounded slope coefficients (a, e). GLMs generated with land cover as a predictor (‘GLM: B + L’) indicate a direct bias propagation into the final model via the predictor land cover (d, h). GLMMs avoided this bias and simultaneously reduced confounding effects

ORs of models generated with land cover as a predictor (‘GLM: B + L’) and differently biased inventories (0, 20, 80%) provided quantitative evidence that land cover partly accounted for a variability originating from the simulated forest-related bias, when estimating the fixed-effects coefficients of the other predictors. Thus, ORs estimated for the predictor slope were more similar across all simulated biases (cf. dashed line with black dots in Fig. 5a, e). For example, GLMs fitted with the highly biased inventory (80%) showed an OR deviation of the predictor slope from the respective reference models of 0.9 (real data) and 0.5 (synthetic data) whenever the model was generated with land cover as a predictor and 2.3 (real data) and 1.4 (synthetic data) in the case the model was produced without land cover. However, modelled relationships obtained for the predictor land cover (Fig. 5d, h) provided further quantitative evidence that a simulated inventory-based bias may be directly propagated into the final models when a specific predictor (in this case land cover) relates to a systematic incompleteness of the inventory (Steger et al. 2016a). Consequently, GLMs predicted constantly decreasing and finally very low chances of forests to be affected by future slope movements (i.e. OR drop to 0.1 in Fig. 5d, h), ultimately due to the high number of missing landslides in forested areas. The resulting landslide susceptibility maps (Fig. 6f, o) directly reflected this bias at forest locations (Fig. 6s, t) by showing considerably lower susceptibility values in comparison to their references (Fig. 6a, m).

Fig. 6

Excerpts of landslide susceptibility maps generated with differently biased inventories on forested areas (0, 20, 80%) by applying different classifier-predictor combinations for the real data set (a, b, c, d, e, f, g, h, i) and the synthetic data (j, k, l, m, n, o, p, q, r). The maps marked as ‘Reference’ are considered to be only slightly influenced (a) or unaffected (m) by an inventory-based bias. All other maps are interpreted relative to these reference maps. Note the substantially differing appearance of susceptibilities at forested areas (s, t) for all maps generated with a highly biased inventory and land cover as a predictor (f, o) and similarly appearing GLMM-based maps

In general, ORs obtained for the GLMMs provided evidence that these models accounted for inventory bias while additionally ensuring that the bias was not directly propagated into the final models via the land cover predictor. The influence of forest-related variation was accounted for by the random intercept for land cover. Thus, GLMMs and GLMs generated with land cover as a predictor (‘GLM: B + L’) showed similar and more stable modelled associations, especially for the predictor slope, than GLMs without land cover (Fig. 5). The previously mentioned direct bias propagation via the predictor land cover was successfully avoided, because the respective predictions were based on the fixed effects alone. Therefore, whenever the respective susceptibility maps were generated with the highly biased inventory (‘80F’ in Fig. 6), spatial patterns of GLMM-based maps (Fig. 6i, r) appeared relatively similar to the reference maps (Fig. 6a, m).

Validation results consistently showed AUROC values of >0.8 (Fig. 7). A comparison of predictive performances provided evidence that GLMs generated without land cover (Fig. 7a, d) and GLMMs (Fig. 7c, f) performed worst when the respective models were generated with a strongly biased inventory (80%). A contrasting trend and the highest predictive performances were observed for the models that were previously identified as being highly biased (‘80%’ in Fig. 7b, e). However, further comparisons revealed that these models performed poorest in predicting the original landslide locations (grey line in Fig. 7). This discrepancy was interpreted as quantitative evidence for overoptimistic predictive performance estimates for GLMs generated with land cover as a predictor (‘GLM: B + L’).

Fig. 7

Validation results and transferability indexes obtained for models generated with differently complete inventories in forested areas for the real data set (a, b, c) and the synthetic data (d, e, f). Boxplots refer to predictive performances obtained by cross-validation (CV) and spatial cross-validation (SCV). The grey line shows a comparison of model predictions (AUROC) with the data set that relates to the unmodified inventories (= model fit for all 0%-models). Diamonds relate to the second y-axis and show non-spatial (CV) and spatial (SCV) transferabilities of the modelling results

Generally, higher transferability indexes were obtained for the models generated for the real data set. The observed low variation in predictive performances (i.e. low transferability index in Fig. 7) obtained for the models generated with synthetic data indicated a high non-spatial (CV) and spatial (SCV) transferability of the modelling results. Lowest spatial transferabilities (highest index values for SCV in Fig. 7) were assigned to the real data models based on highly incomplete inventories.

The impact of the municipality-related inventory bias

The main insights obtained by simulating a municipality-related inventory bias were, with exceptions, similar to the trends detected by mimicking a forest-related incompleteness. GLMs based on the most incomplete inventory (80%) and without a predictor that was spatially related to a simulated bias (in this case precipitation) generally showed the most distinct deviations of modelled relationships from the reference models (compare the crosses with ‘R’ in Fig. 8). This in turn provided evidence for the presence of confounded relationships. The resulting biases can also be traced by visually comparing the respective landslide susceptibility maps (Fig. 9c, l) with their references (Fig. 9a, j).

Fig. 8

Modelled relationships expressed as odds ratios (OR) for models generated with differently complete inventories (0, 20, 80%) in specific municipalities (cf. Figs. 2d and 4c) for the real data set (a, b, c, d) and the synthetic data (e, f, g, h, i). GLMs generated with a highly biased inventory (80%) and without precipitation (‘P’) as a predictor indicated high deviations in modelled relationships from the reference models (‘R’). ORs of the predictor precipitation (d, i) provide evidence that an actually non-existing relation (OR near 1 at 0%) was turned into an apparent positive association. GLMMs avoided this bias while simultaneously reducing confounding effects

Fig. 9

Excerpts of landslide susceptibility maps generated with differently biased inventories (0, 20, 80%) on specific municipalities for an area with high precipitation rates (real data, a, b, c, d, e, f, g, h, i) and low precipitation rates (synthetic data, j, k, l, m, n, o, p, q, r) by applying different classifier-predictor combinations. The maps marked as ‘Reference’ are considered to be only slightly influenced (a) or unaffected (j) by an inventory-based bias. All other maps are interpreted relative to these maps. Note the differences between the References and the respective maps generated with a highly biased inventory and precipitation (f, o). GLMM-based maps (i, r) appeared most similar to the reference maps (a, j) when only highly biased inventories were available (‘80MU’)

Generally smaller distortions (i.e. changes in ORs) were observed for the predictor slope when precipitation, which is spatially related to the bias (Figs. 2e and 4d), was included as a predictor (compare black dots with ‘R’ in Fig. 8a, e). However, a direct bias propagation within those models was exposed when interpreting ORs for a precipitation increase of 50 mm (Fig. 8d, i). For instance, the respective ORs obtained for models generated with the unmodified inventory provided evidence of a non-existent or only slightly negative association between the respective landslides and the mean annual precipitation rates (Fig. 8d, 0.92; Fig. 8i, 0.97). However, an apparent strong and positive relationship emerged in models fitted to an 80% incomplete inventory (Fig. 8d, 1.86; Fig. 8i, 1.47). This spurious relationship was clearly visible in the resulting susceptibility maps (compare Fig. 9f, o to Fig. 9a, j).
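The ORs reported for a precipitation increase of 50 mm can be converted to other increments with a few lines of code. The sketch below assumes that precipitation enters the logistic models as a linear term, so that the OR for an increment d equals exp(coefficient × d); the helper names are hypothetical, and only the four OR values are taken from the text above.

```python
import math

def coef_per_unit(odds_ratio, increment):
    """Logistic regression coefficient per unit (here per mm) implied by an OR
    that was reported for a given increment."""
    return math.log(odds_ratio) / increment

def odds_ratio(coef, increment):
    """OR for an arbitrary increment, given a coefficient per unit."""
    return math.exp(coef * increment)

# ORs for a 50 mm increase as reported in the text (Fig. 8d, i).
reported_or_50mm = {
    "real data, 0%": 0.92,
    "real data, 80%": 1.86,
    "synthetic data, 0%": 0.97,
    "synthetic data, 80%": 1.47,
}

for label, or_50 in reported_or_50mm.items():
    beta = coef_per_unit(or_50, 50.0)
    print(f"{label}: coefficient per mm = {beta:.5f}, "
          f"OR per 100 mm = {odds_ratio(beta, 100.0):.2f}")
```

Under this assumption, the change from an OR below 1 to an OR well above 1 corresponds to a sign change of the underlying coefficient, which is the direct bias propagation described above.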

GLMMs were again able to account for some variation related to a simulated inventory-based incompleteness while additionally avoiding a direct bias propagation (i.e. since a bias-describing predictor was not used to predict susceptibility). Therefore, susceptibility maps obtained from GLMMs with the highly biased inventory (Fig. 9i, r) appeared most similar to their references (Fig. 9a, j).

However, none of the models were able to avoid a substantial decrease in estimated OR within the lithological unit Molasse (Fig. 8b). This relatively small area (3% of total area) was observed to be entirely covered by the municipalities affected by a bias (compare Fig. 1a with Fig. 4c) while additionally being represented by a very small sample size (n = 21). The OR decrease was accompanied by a decrease in the ratio between landslide presences and absences. We observed a considerable decrease in this ratio from an initial 17% (3 landslides) to 6% (1 landslide) and 0% (no landslide) with increasing incompleteness of the inventory. Consequently, all landslide susceptibility maps generated with the highly biased inventory erroneously indicated that the Molasse Zone is stable, regardless of the other environmental conditions.
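The reported ratios can be reproduced with simple arithmetic. The sketch assumes that the sample of n = 21 consists of the 3 original landslide presences plus 18 absences, and that the number of absences stays constant while landslides are removed; this reading is an assumption, not a statement from the sampling description.

```python
N_TOTAL = 21                 # Molasse sample size reported in the text
N_LANDSLIDES_ORIGINAL = 3    # landslides in the unmodified inventory

# Assumption: the remaining observations are absences and are never removed.
n_absences = N_TOTAL - N_LANDSLIDES_ORIGINAL

def presence_absence_ratio(n_presences, n_abs):
    """Ratio between landslide presences and absences."""
    return n_presences / n_abs

ratios = {k: presence_absence_ratio(k, n_absences) for k in (3, 1, 0)}
for n_presences, ratio in ratios.items():
    print(f"{n_presences} landslide(s): {100 * ratio:.0f}%")
```

With zero remaining presences, a class-specific coefficient cannot be estimated reliably, because the data of that class then contain no information on landslide occurrence.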

Model validation (Fig. 10) consistently produced AUROC values higher than 0.8, which would normally be considered to reflect a good (Fressard et al. 2014) or excellent (Conoscenti et al. 2016) predictive performance of a landslide susceptibility model. Comparing these estimates, we observed that the apparent predictive performance obtained by CV increased with an increasing bias of the inventory. Models that included a bias-describing predictor showed the highest CV-based AUROCs (80% in Fig. 10b, e), but the lowest ability to predict the original landslide positions (grey line in Fig. 10b, e). In this regard, GLMMs generated with the highly biased inventory performed slightly better than the other models generated with an equally biased data set.

Fig. 10

Validation results and transferability indexes for models generated with differently biased inventories (0, 20, 80%) on specific municipalities for the real data set (a, b, c) and the synthetic data (d, e, f). Boxplots refer to predictive performances obtained by cross-validation (CV) and spatial cross-validation (SCV). The transferability index (second y-axis) exposed a lower internal spatial transferability of modelling results (i.e. high SCV-based index) as a result of the simulated spatial inventory-based bias. The grey line depicts the comparison of model predictions with the original landslide position (= model fit for all 0%-models). When only a highly biased inventory was available (cf. all 80%-models), GLMMs (c, f) performed best in predicting the unmodified inventory while the apparently best performing GLMs generated with precipitation (cf. CV-based predictive performance in b and e) performed worst

SCV revealed remarkably higher variations of AUROC values when the underlying models were generated with the highly biased inventory (80%). The lower spatial transferability of those models is reflected by high transferability indexes and can be examined visually by comparing the respective boxplot sizes (Fig. 10). Further indications of spatially inconsistent modelling results were obtained by comparing CV-based AUROCs with SCV-based AUROCs of identical models. Predictive performances of the synthetic models, which in other cases were similar (compare CV and SCV in Fig. 7d–f), discernibly dropped when spatially estimated for the highly biased data set (‘80%’ in Fig. 10d–f). AUROCs of identical models generated with real data were also remarkably lower when assessed in a spatial context (Fig. 10a–c, but also Fig. 7a–c).

Discussion

Our study confirmed and provided further quantitative evidence of the critical importance of landslide inventory completeness for the quality and validity of statistical landslide susceptibility assessments (Ardizzone et al. 2002; Galli et al. 2008; Harp et al. 2011; Fressard et al. 2014; Steger et al. 2016a, b). However, our results also emphasized that linkages between the completeness of a landslide inventory and modelled relationships, validation results and the spatial appearance of the final maps are multi-faceted.

In particular, the inventory’s degree of completeness is only one of the several aspects that determine how and to what extent an inventory bias may propagate into the results. Apart from the number of the respective observations within specific predictor classes (cf. “Minor inventory biases locally affect modelling results” section) and the extent of spatial agreement between predictors and areas affected by an inventory bias (cf. “The influence of bias-describing predictors” section), the selected modelling approaches (cf. “Confounding factors and the usefulness of mixed-effects models” section) were also observed to control the influence of systematically incomplete inventories on landslide susceptibility models. Whether or not a modeller is able to detect such discrepancies may in turn depend on the choice of the cross-validation technique used to evaluate predictions (cf. “The influence of bias-describing predictors” and “The need for differentiated model evaluations” section). Based on our findings, we finally propose a four-step procedure to deal with systematically incomplete inventories in the context of statistical landslide susceptibility modelling (cf. “Practical recommendations” section).

Minor inventory biases locally affect modelling results

It was remarkable that landslide susceptibility models generated with a 20% systematically incomplete inventory did not substantially differ from the respective reference models. We observed similar modelled relationships, comparable predictive performances and, with local exceptions, visually similar maps, especially in situations where a bias-describing predictor was not included. Due to model generalization and model uncertainty, systematic incompleteness of this order of magnitude therefore appeared to be too small to be statistically detectable in landslide susceptibility modelling. Since strongly generalizing classifiers are expected to offer some degree of protection against a direct inventory-based error propagation, we advise avoiding highly flexible classification techniques (e.g. machine learning) when only incomplete or inaccurate inventories are available (Steger et al. 2016b).

However, the observed substantial OR changes of the relatively small Molasse Zone (Fig. 8b) indicated that a minor inventory incompleteness may locally strongly influence spatial predictions, particularly when a class of a categorically scaled predictor exhibits a small number of observations. The observed high level of uncertainty around the respective coefficient of the reference model (95% confidence interval of the OR, 0.2 to 2.6) additionally indicated its high sensitivity to minor changes in the sample. From this observation, we infer that an inclusion of more detailed thematic information (e.g. lithology, land cover, soil types) does not necessarily lead to improved landslide susceptibility assessments, since a small number of observations within specific classes may result in very uncertain estimates (Heckmann et al. 2014) and a consequent high sensitivity to minor inventory-based biases. When validating the models, it should always be noted that overall, strongly summarizing measures of diagnostic accuracy, like the AUROC (Swets 1988), are neither designed for nor able to detect such local distortions.
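The width of the confidence interval reported above for the Molasse coefficient can be translated into an approximate standard error. The sketch assumes that the interval is a Wald interval, i.e. symmetric on the log-odds scale with z ≈ 1.96; the way the interval was actually computed is not stated here, so the resulting numbers are indicative only, and the helper names are hypothetical.

```python
import math

Z_95 = 1.959963984540054  # standard normal quantile for a two-sided 95% interval

def wald_or_interval(coef, se, z=Z_95):
    """95% Wald confidence interval of an odds ratio from a coefficient and its SE."""
    return math.exp(coef - z * se), math.exp(coef + z * se)

def implied_coef_and_se(or_low, or_high, z=Z_95):
    """Back-calculate coefficient and SE from OR interval limits (Wald assumption)."""
    log_low, log_high = math.log(or_low), math.log(or_high)
    return (log_low + log_high) / 2.0, (log_high - log_low) / (2.0 * z)

coef, se = implied_coef_and_se(0.2, 2.6)   # interval limits as reported in the text
low, high = wald_or_interval(coef, se)
print(f"implied OR = {math.exp(coef):.2f}, implied SE = {se:.2f}, "
      f"interval = ({low:.1f}, {high:.1f})")
```

Under the stated assumption, the interval spans an OR of 1, so the data of this small class are compatible with both a decreasing and an increasing effect on susceptibility.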

In statistical landslide susceptibility modelling, the problem of small sample sizes is regularly counteracted by artificially increasing the number of observations (Poli and Sterlacchini 2007; Fressard et al. 2014). However, sampling multiple points per landslide observation may often not be suitable due to an increasing spatial autocorrelation of cases (Van Den Eeckhaut et al. 2006; Atkinson and Massari 2011), a subsequent overoptimistic and misleading confidence of model parameters (i.e. confidence of coefficients) and a potentially undesired weighting for size (i.e. not providing equal treatment of small and large landslides). Furthermore, feasibility might become an issue when computationally demanding state-of-the-art algorithms (e.g. CV-based model parameterization of machine learning techniques, k-fold spatial cross-validation, permutation-based variable importance assessment) are applied (Brenning 2012a).

The influence of bias-describing predictors

The results provided quantitative evidence supporting the suspicion that distorted relationships and misleading predictive performances may follow whenever a specific predictor systematically relates to a substantial bias of an inventory (Steger et al. 2016a). Specifically, substantially incomplete inventories in forest areas led to spurious modelled relationships (decreasing ORs of forests in Fig. 5d, h) and susceptibility maps (low susceptibility in forests in Fig. 6f, o) as soon as land cover was introduced as a predictor.

The findings additionally indicated that a spatially strongly varying completeness of an inventory may, coincidentally, be spatially related to certain environmental conditions (e.g. municipalities affected by a bias experience lower precipitation rates; cf. Fig. 4d). An inclusion of such a bias-describing predictor led to a direct bias propagation into the modelling results, which was reflected by the trend that an initially geomorphically implausible negative association between landslide occurrence and precipitation developed into an apparent distinct and influential positive modelled relationship (ORs in Fig. 8d, i). Ultimately, this led to exaggerated landslide susceptibilities at locations with high precipitation (e.g. Fig. 9f) and an underestimation of landslide susceptibility in areas represented by relatively low precipitation (e.g. Fig. 9o). Thus, this study further underlines difficulties which may arise when determining landslide driving factors with statistical models (Vorpahl et al. 2012; Brenning et al. 2015) by providing quantitative indications that inventory errors may influence the weighting of predictors and consequently also the appearance of the final landslide susceptibility maps (Ardizzone et al. 2002; Fressard et al. 2014). We therefore agree that landslide susceptibility models, particularly those generated with biased input data, may not be useful to derive a causal association between environmental conditions and landslide occurrence (Donati and Turrini 2002; Felicísimo et al. 2013), especially since we detected that these biased models might perform better from a purely quantitative perspective.

It was remarkable that an inclusion of previously discussed spurious relations (via a bias-describing predictor) led to increased predictive performances (CV in Fig. 7b, e and Fig. 10b, e). The observation that the highest AUROCs may also be obtained for the geomorphically most implausible maps is in line with previous results (Steger et al. 2016a). We suggest that distorted performance estimates for highly biased models can be expected when two conditions are met. Firstly, the respective training and test sets are similarly affected by a bias (e.g. both inventory subsets are incomplete in forests). This tendency may be common when applying conventional partitioning techniques (i.e. holdout validation, CV) due to the systematic nature of many biases. Secondly, the respective inventory incompleteness favours an enhanced ability of the bias-describing predictor to distinguish landslide observations from non-landslide observations.
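The two conditions can be illustrated with a small simulation. It is a deliberately simplified sketch and not the modelling design of this study: all parameter values are hypothetical, true landslide occurrence depends on slope only, 80% of the landslides in forests are removed from the recorded inventory, and the "model" is simply the recorded landslide frequency per land cover class estimated on a training half.

```python
import math
import random

def auroc(scores, labels):
    """Share of (landslide, non-landslide) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def simulate(n=4000, missing_in_forest=0.8, seed=1):
    """Return (forest, true_landslide, recorded_landslide) per cell (hypothetical data)."""
    rng = random.Random(seed)
    cells = []
    for _ in range(n):
        forest = rng.random() < 0.5
        slope = rng.uniform(0.0, 40.0)
        # True occurrence depends on slope only, never on land cover.
        p_true = 1.0 / (1.0 + math.exp(4.0 - 0.12 * slope))
        true_slide = 1 if rng.random() < p_true else 0
        # Systematic incompleteness: most forest landslides go unrecorded.
        missed = forest and rng.random() < missing_in_forest
        recorded = 1 if (true_slide and not missed) else 0
        cells.append((forest, true_slide, recorded))
    return cells

cells = simulate()
half = len(cells) // 2
train, test = cells[:half], cells[half:]   # both halves share the same bias

def recorded_rate(rows, forest_flag):
    values = [rec for forest, _, rec in rows if forest == forest_flag]
    return sum(values) / len(values)

# "Model": recorded landslide frequency per land cover class in the training half.
score_by_class = {flag: recorded_rate(train, flag) for flag in (True, False)}
test_scores = [score_by_class[forest] for forest, _, _ in test]

auroc_vs_biased = auroc(test_scores, [rec for _, _, rec in test])
auroc_vs_original = auroc(test_scores, [true for _, true, _ in test])
print(f"biased test set: {auroc_vs_biased:.2f}, original inventory: {auroc_vs_original:.2f}")
```

In this sketch, land cover carries no information on true landslide occurrence, yet it separates recorded landslides from non-landslides in the equally biased test half. Evaluated against the unmodified labels, the same scores perform close to chance.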

A quantitative comparison of model predictions with the location of the less biased (real data) and unbiased (synthetic data) inventories provided further evidence that the obtained prediction performances can indeed be referred to as overoptimistic, in particular whenever a bias-describing predictor is included in a model. This is another reason why we believe that modellers should not aim to solely improve statistical performance measures like the AUROC by iteratively opting for modifications that increase the respective prediction skills. We finally argue that a process-related interpretation of plausible-appearing modelled relationships may be misleading due to possible confounding with inventory errors (Malamud et al. 2004; Guzzetti et al. 2012).

Confounding factors and the usefulness of mixed-effects models

The topography of an area is known to be related to its lithology (Huggett 2007) while topographic variables (e.g. slope, exposition) co-determine land use and thus land cover (Rickli et al. 2002). This example illustrates that environmental factors, which control landslide occurrence, are inevitably interrelated, and thus, some sort of confounding may be regularly present within many statistical landslide susceptibility models (Brenning 2012b). Confounding is the reason why we believe that simply ignoring (i.e. excluding) bias-describing predictors may not be straightforward. We showed that a model fitted on highly biased data, but without a predictor accounting for the variability originating from such biases, adjusted modelled relationships of predictors that correlated with this incompleteness. For example, a substantial forest-related incompleteness of the inventory led to a decreased sensitivity of the models to slope angle (Fig. 5a) since forests were observed (real data) and simulated (synthetic data) to be more likely located on steeper slopes. In analogy, a considerable municipality-related bias led to decreased modelled susceptibilities within those lithological units (i.e. Molasse and unit B) located inside municipalities affected by an incompleteness of an inventory.

The models generated with a bias-describing predictor (i.e. land cover for the forest-related bias, precipitation for the municipality-related bias) showed the ability to reduce the effects of such confounding, but simultaneously produced highly distorted predictions. The main disadvantages of both previously mentioned procedures were avoided by applying GLMMs with a bias-describing variable introduced as a random intercept. GLMMs proved useful to separate some effects (i.e. variability) related to the bias from the effects assumed to be primarily related to landslide susceptibility. The resulting predictions were less confounded (i.e. similar to models generated with a bias-describing predictor) and not directly affected by the inventory-based incompleteness (i.e. similar to models generated without a bias-describing predictor). Thus, the final susceptibility maps generated with a highly biased inventory appeared, with local exceptions (cf. “Minor inventory biases locally affect modelling results” section), remarkably similar to their references (e.g. compare Fig. 9a, j with 9i, r).

We argue that whenever there is a suspicion of an inventory bias (e.g. Brardinoni et al. 2003; Van Den Eeckhaut et al. 2012; Petschko et al. 2016) and this incompleteness can systematically be described by a categorically scaled variable (e.g. land cover, administrative units), mixed-effects models (Bolker et al. 2009; Zuur et al. 2009) may be an appropriate choice. Subsequent predictions should be based on the models’ fixed-effects part alone. However, the separation of bias-related effects from landslide susceptibility-related effects may fail in the case of very high spatial agreements between a random intercept and other predictors (cf. example from the Molasse in “Minor inventory biases locally affect modelling results” section).

In the context of statistical landslide susceptibility modelling, mixed-effects models might additionally bear an up to now unexplored ability to tackle the problem of large and heterogeneous study areas (Petschko et al. 2014b). For instance, an inclusion of random regression coefficients would allow accounting for the effect that the relationship between a predictor (e.g. slope) and landslide occurrence differs between spatial units (e.g. lithologies). The application of generalized additive mixed models (GAMMs) may furthermore prove useful to additionally consider moderately non-linear associations (Zuur et al. 2009; Goetz et al. 2011).

The need for differentiated model evaluations

This study highlighted that an apparently high predictive performance of a model does not constitute proof that a realistic and geomorphically interpretable statistical landslide susceptibility model was generated. Under no circumstances, not even if compared to an unmodified inventory, did any model provide evidence of a poor prediction skill. AUROCs were constantly greater than 0.8. In this respect, we want to point out that these performance estimates are known to be highly dependent on the study design and may considerably change when including easily classifiable areas (e.g. floodplains) (Lobo et al. 2008; Brenning 2012b). Thus, they do not have an absolute meaning and are not comparable between different study areas. This is also why we think that frequently cited general guidelines for AUROC interpretations (e.g. in Hosmer and Lemeshow 2000) should not be used as a universal yardstick. Some presented models and maps of this study (e.g. Fig. 6f or Fig. 9f) are of little practical use and clearly do not provide an excellent representation of landslide susceptibility despite their AUROCs >0.85. In this context, we agree that an assessment of the prediction skill of a landslide susceptibility model is a crucial step, but not always sufficient (Lobo et al. 2008; Fressard et al. 2014; Steger et al. 2016a).
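The dependence of the AUROC on the study design can be shown with a toy calculation. The scores and labels below are hypothetical; the sketch only demonstrates that adding trivially classifiable non-landslide cells raises the AUROC although the predictions on the hillslopes remain unchanged.

```python
def auroc(scores, labels):
    """Share of (landslide, non-landslide) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical hillslope cells: susceptibility scores and landslide labels.
hill_scores = [0.2, 0.4, 0.5, 0.6, 0.7, 0.8]
hill_labels = [0, 1, 0, 1, 0, 1]

# Hypothetical floodplain cells: trivially low scores, no landslides.
flood_scores = [0.01] * 6
flood_labels = [0] * 6

auroc_hillslopes = auroc(hill_scores, hill_labels)
auroc_with_floodplain = auroc(hill_scores + flood_scores, hill_labels + flood_labels)
print(f"{auroc_hillslopes:.3f} {auroc_with_floodplain:.3f}")  # prints "0.667 0.889"
```

The increase from 6/9 to 24/27 correctly ranked pairs results solely from the enlarged study area, not from a better discrimination on the slopes of interest.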

The findings further highlighted the need for differentiated evaluations (i.e. quantitative and expert-based) to gain insights into limitations and the reliability of landslide susceptibility analyses (Guzzetti et al. 2006b; Bell 2007; Demoulin and Chung 2007; Petschko et al. 2014b; Fressard et al. 2014; Steger et al. 2016a). In this sense, we agree that a preliminary in-depth inspection of input data should precede each landslide susceptibility analysis (Galli et al. 2008; Bell et al. 2012; Felicísimo et al. 2013; Santangelo et al. 2015). We consider field based cross-checks of inventory data and an in-depth exploratory data analysis as an essential supplement (Bell et al. 2012; Guillard and Zezere 2012; Petschko et al. 2016; Steger et al. 2016a). An estimation of modelled relationships and related uncertainties (i.e. confidences) might further provide evidence of model qualities and a potential bias propagation (Guzzetti et al. 2006b; Rossi et al. 2010; Petschko et al. 2014b; Steger et al. 2016a).

An evaluation of model transferabilities by means of repeated non-spatially assessed (CV) and spatially assessed (SCV) prediction skills may provide evidence of more or less consistent modelling results (e.g. Fig. 7d–f versus Fig. 10a–c). We observed that indications of spatially less transferable modelling results can be detected by an evaluation of the transferability index (Petschko et al. 2014b) and also by simply interpreting differences between AUROCs obtained by CV and SCV. Similar predictive performances (i.e. between CV and SCV; e.g. Fig. 7d–f) reveal that the respective models predicted all test sets equally well, independently of a spatial or non-spatial evaluation. In contrast, lower SCV results (e.g. Fig. 10a–c) provide evidence that the respective prediction models performed worse in a spatial context. This might be a standard case for real-world generated data sets (cf. Brenning 2005; Petschko et al. 2014b; Steger et al. 2016a), but especially for models generated with an inventory whose completeness differs substantially among larger geographical units (e.g. administrative units). We argue that an assessment of the spatial transferability of modelling results can be useful not only to assess the ability of a model to predict landslides for a neighbouring area (Lombardo et al. 2014) but also to obtain indications of a potential spatially varying consistency of models within one area.
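The difference between the two partitioning schemes can be sketched as follows. This is a simplified illustration: spatial folds are formed here as strips of equal sample count along one coordinate, which is an assumption and not necessarily the partitioning used in this study, and the summary reports only the median and interquartile range of fold-wise AUROCs rather than the transferability index itself. All function names and coordinates are hypothetical.

```python
import random
import statistics

def random_folds(n, k, seed=0):
    """Non-spatial CV: shuffle the sample indices and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def spatial_folds(x_coords, k):
    """Spatial CV (simplified): k spatially contiguous strips along the x axis."""
    order = sorted(range(len(x_coords)), key=lambda i: x_coords[i])
    n = len(order)
    return [order[j * n // k:(j + 1) * n // k] for j in range(k)]

def summarize(aurocs):
    """Median and interquartile range of fold-wise AUROC values."""
    q1, median, q3 = statistics.quantiles(aurocs, n=4)
    return {"median": median, "iqr": q3 - q1}

# Hypothetical sample locations (x coordinate only).
rng = random.Random(42)
xs = [rng.uniform(0.0, 10.0) for _ in range(20)]

cv = random_folds(len(xs), 4)
scv = spatial_folds(xs, 4)

for name, folds in (("CV", cv), ("SCV", scv)):
    extents = [round(max(xs[i] for i in f) - min(xs[i] for i in f), 1) for f in folds]
    print(name, "x-extent of each test fold:", extents)
```

In the spatial scheme, each test fold covers a region that is absent from the corresponding training data, so a model that has adapted to a regionally varying inventory completeness is evaluated on areas with a different degree of completeness.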

However, during this study, it became increasingly evident that it is still the analyst who needs to assign meaning to the obtained numerical results. Thus, we finally argue that process knowledge and an expert-based holistic interpretation of all results remain an essential step towards meaningful statistical landslide susceptibility maps (Demoulin and Chung 2007; Cascini 2008; Zêzere et al. 2009; Fressard et al. 2014; Steger et al. 2016a, 2016b).

Practical recommendations

According to our findings, we recommend the following four-step procedure to tackle the problem of incomplete landslide inventories when assessing landslide susceptibility by means of statistical classification techniques:

Firstly, we suggest accepting inventory errors as unavoidable. Without this realization, apparently well performing but highly distorted models of little explanatory power or practical use are more likely to follow.

Secondly, researchers should strive to gain insights into potential limitations of the present inventory data in order to assess possible implications for susceptibility modelling. This step should take into account details on the landslide data collection itself (e.g. type and resolution of mapping basis, spatial representation of landslides, spatial division of mapping responsibilities, mapping purpose) and also a broader geomorphic context (e.g. known causes of landslides and human impact in an area). These considerations should be supported by a profound literature review (e.g. which limitations are known when mapping from aerial photographs?), an exploratory data analysis (e.g. are there suspiciously high or low landslide densities within certain areas?) and field checks (e.g. are landslides missing in the inventory?).

Thirdly, modellers should consider potential limitations of the landslide inventory when adapting the modelling design. The aim should be to limit the propagation of inventory incompleteness into the final results. Here, we highly recommend avoiding predictors that directly relate to a suspected inventory error (e.g. land cover is likely to be able to directly describe a forest-related incompleteness of an inventory). Instead, the application of mixed-effects models may rather prove useful to reduce the impact of incompleteness. In this context, we suggest using strongly generalizing classifiers (e.g. GLMMs, GAMMs), because those models are likely to be less prone to overfit to errors originating from a landslide inventory (Steger et al. 2016b). The application of less flexible statistical models might additionally have the advantage of higher model transparency (e.g. via the estimation of ORs and confidence intervals) (Brenning 2012b; Goetz et al. 2015). This may further provide evidence of potential limitations and error propagation.

Fourthly, the results should continuously be evaluated by means of multiple quality criteria. Based on our results, we stress that an interpretation of one performance measure alone may lead to misleading conclusions. Modelled relationships may be evaluated by means of odds ratio estimation, while associated confidence intervals may provide insights into related uncertainties and an associated sensitivity to minor changes (and biases) in the data sets. We recommend assessing the predictive performance of a model by means of repeated partitioning techniques. We consider that a joint interpretation of CV and SCV results (i.e. median AUROC, interquartile ranges, transferability index) might be useful to expose inconsistent modelling results (Brenning 2012b). Finally, we want to further underline the importance of a steady geomorphological control over a purely data-driven treatment (Bell 2007; Demoulin and Chung 2007; Steger et al. 2016a), because domain experts need to interpret model results. Since landslide inventories might simultaneously be affected by positional inaccuracies and systematic incompleteness, we point to our complementary study that specifically focused on the propagation of inventory-based positional errors into statistical landslide susceptibility models (Steger et al. 2016b).

Conclusion

The study highlighted that the relation between biased landslide inventories and the results of statistical landslide susceptibility models is complex and dependent on multiple aspects, such as the selection of predictors or the modelling approach. It was shown that high validation results, but distorted relationships and geomorphically implausible landslide susceptibility maps, can be obtained for models generated with systematically incomplete inventories. Most strikingly, we observed that an inclusion of a bias-describing predictor (e.g. land cover for a forest-related bias) favoured misleading predictive performance estimates and a direct bias propagation into subsequent models. However, an exclusion of such predictors led to confounded relationships between the landslide inventories and those predictors (e.g. slope) spatially related to these bias-describing variables. In this context, the application of mixed-effects logistic regression models reduced the influence of such confounders and enabled predictions that were less influenced by inventory bias.

We finally conclude that researchers should not only focus on predictive performances when modelling landslide susceptibility but also consider possible biases inherent in a landslide inventory. An in-depth evaluation of input data and modelling results, as well as an adaptation of model design, might prove valuable to reduce the impact of inventory errors on statistical landslide susceptibility models.