1 Introduction

The definition of landslide hazard originally proposed by Varnes and the IAEG Commission on Landslides and Other Mass-Movements (1984), and later updated by Guzzetti et al. (1999), features three components: a spatial component aimed at defining “where” slope failures can be expected, a temporal component aimed at defining “when”, or how frequently, the slope instability process is expected, and a third component aimed at defining “how large or destructive” a single landslide or a landslide population may be.

Susceptibility is by far the most studied of the three components (Reichenbach et al. 2018), whereas the “size or destructiveness” component is the least investigated, although it has been highlighted as crucial even in international guidelines (Fell et al. 2008).

The temporal component is usually investigated separately from susceptibility, as in rainfall-threshold studies (e.g., Aleotti 2004; Guzzetti et al. 2004; Wu et al. 2015) or Early Warning Systems (e.g., Graziella et al. 2015; Greco et al. 2013; Guzzetti et al. 2019). Because of this, few examples exist where the spatial and temporal components of landslide hazard are combined (e.g., Ghosh et al. 2012; Fan et al. 2021). A typical statistically-based example is Cama et al. (2015), where the authors calibrated a susceptibility model on past landslides and validated it on a subsequent landslide inventory. However, this approach assumes the susceptibility to be stable over time and neglects any dependence between past and future slope failures. A step forward in this direction was proposed by Samia et al. (2017), where the authors accounted for landslide interactions across time by building multi-temporal susceptibility models in which each model features, as a covariate, the presence-absence signal of the previous event. However, this approach retrieves a single parameter for the whole study area to describe the temporal dependence, and does so only at a discrete series of time steps. The same concept was recently brought further by Lombardo et al. (2020), where the authors assessed the space-time landslide intensity (number of landslides per mapping unit) by featuring a temporal dependence at the Slope Unit level via an autoregressive model. However, even in these cases, the residual space-time dependence due to the trigger is not directly incorporated in the analyses.

Only a few examples directly include the trigger effect in susceptibility/hazard models. For instance, Nowicki et al. (2014) and Nowicki Jessee et al. (2018) demonstrate how the US Geological Survey provides a statistically-based near real-time prediction by using a model that initially estimated the effect of ground motion on the available set of global earthquake-induced landslides (EQIL) (Schmitt et al. 2017; Tanyaş et al. 2017). Then, by assuming that this effect is constant across earthquakes, they plug in the ground motion of near real-time earthquakes to produce case-specific prediction maps.

Apart from statistical approaches, there is a large literature aimed at assessing landslide hazard via probabilistic assessments (e.g., Jibson et al. 2000; Romeo 2000; Del Gaudio et al. 2003; Del Gaudio and Wasowski 2004; Rathje and Saygili 2008; Saygili and Rathje 2009). Jibson et al. (2000) developed one of the pioneering probabilistic seismic landslide hazard assessment methods to conduct a regional-scale seismic slope stability analysis. The 1994 Northridge earthquake made this progress possible, as it was the first earthquake for which extensive data on engineering properties of geologic units, ground shaking, and triggered landslides were available (Jibson et al. 1998, 2000). Jibson et al. (2000) used these data sets and combined shear-strength data for each geologic unit in the area with slope derived from a 10-m digital elevation model (DEM) to predict the threshold ground-shaking acceleration required for the initiation of sliding (referred to as the critical or yield acceleration). They then predicted the resulting displacement using an empirical displacement model based on Newmark’s sliding-block method (Newmark 1965), driven by the shaking levels recorded during the earthquake. Finally, they compared the predicted displacements to the EQIL inventory and showed that increasing predicted Newmark displacement does, in fact, correlate with increasing landslide frequency. The study provides a simple mathematical relation between predicted Newmark displacement and the probability of landsliding, and thus a basic quantitative framework for using a physical modeling approach to estimate seismic landslide hazard at regional scale. This approach was later improved by adopting some of the inputs of these physically-based models from scenario-based earthquakes (e.g., Rathje and Saygili 2008; Saygili and Rathje 2009). As with most physically-based methods, this simplified mechanistic approach has the advantage of more closely reflecting the underlying processes, despite the uncertainties introduced by its simplifications (Allstadt et al. 2017). However, the geotechnical and seismic data required to apply this model are not available everywhere and can be difficult to estimate.

A similar framework is also adopted in the context of rainfall-induced landslides. Kirschbaum and Stanley (2018) demonstrate how NASA provides near real-time predictions by using a model that initially estimated the precipitation effect on the available set of global rainfall-induced landslides (Kirschbaum et al. 2010, 2015). The near real-time landslide prediction is then obtained by plugging in meteorological estimates of the incoming storms.

These methods are extremely useful, if not fundamental, to support the initial stages of any disaster, helping authorities prioritize regions or actions as a function of the expected hazard. However, they cannot support long-term planning because of their near real-time nature.

A comprehensive landslide susceptibility or hazard assessment, one that helps plan for expected future disasters, should integrate morphological information as well as the expected scenarios of the given trigger. Examples that fit this description can be found in Ko and Lo (2018) and Melchiorre and Frattini (2012), where the authors integrate possible rainfall scenarios into a baseline prediction model, or in Lari et al. (2014), where the authors simulate rockfalls of different sizes. However, aside from a few specific cases mostly dealing with rainfall-induced landslides (Kirschbaum et al. 2009), scenario-based landslide hazard realizations are not as common as in other geoscience branches. For instance, in engineering or seismology it is common practice to produce scenario-based seismic hazard maps, where the scenario at hand corresponds to the exceedance probability of an extreme ground motion occurring within a certain return time (e.g., Abrahamson and Bommer 2005; Hanks et al. 2005; Lee et al. 2000; Montilla et al. 2003).

In this work, we take inspiration from the scenario-based studies mentioned above for rainfall-induced landslides and from the scenario-based practice in seismology. More specifically, we include scenario-based realizations of ground motion patterns into Slope-Unit-based landslide susceptibility models via statistical simulation. We do this by building a Generalized Additive Model (GAM, Goetz et al. 2015; Lombardo et al. 2018) of a binomial family, using as target variable the presence/absence distribution of EQIL associated with the 1994 Northridge earthquake (Harp and Jibson 1995, 1996). We relate this inventory to geomorphological parameters together with the Peak Ground Acceleration (PGA) of the Northridge seismic trigger (Worden and Wald 2016). From this reference model, we then simulate 1000 statistical realizations of the estimated susceptibility for each of the 217 ground motion scenarios available for the Northridge area (USGS 2017). As a result, we are able to draw on information already developed by the seismological community and feature it in susceptibility models that reflect the estimated susceptibility at given return times of the seismic trigger. Our susceptibility therefore carries a native spatial characteristic together with the coexisting temporal dimension provided by the scenario-based PGAs.

Overall, we present our study by introducing the 1994 Northridge earthquake and the associated landslide inventory (Sect. 2.1), followed by the available earthquake scenarios for the study area (Sect. 2.2). Subsequently, we describe the modeling strategy and the simulation steps (Sect. 3), whereas the results are shown in Sect. 4. Section 5 provides our interpretation of the framework we propose, and Sect. 6 lists our concluding remarks and future perspectives. Supplementary materials are located at this link: https://doi.org/10.17026/dans-xjk-5676. A description of the supplements can be found in Sect. 7.

2 Study area

2.1 Northridge earthquake and co-seismic landslides

To carry out our analyses, we focus on the 1994 Northridge earthquake, which is considered a milestone (Fan et al. 2019) because it led the scientific community to make digital seismic networks widespread across the globe (Wald et al. 2003). Also, the U.S. Geological Survey (USGS) ShakeMap system was developed (Wald et al. 1999) following the Northridge earthquake. This system provides worldwide estimates of ground-motion parameters and thus fundamental information for many EQIL modeling studies, and it is still commonly used across the whole geoscientific community (Allstadt et al. 2017).

Fig. 1: a Normalized co-seismic landslide density generated by the Northridge earthquake; b Ground motion due to the Northridge earthquake expressed in terms of PGA

The landslides triggered by the Northridge earthquake (see Fig. 1a) were mapped by Harp and Jibson (1995, 1996), who indicated that they covered at least 90 percent of the landslide-affected area using high-altitude analog aerial photography. They also reported that they most likely missed no more than about 20 percent of landslides larger than 5 m and no more than 50 percent of those smaller than 5 m in maximum planar dimension. As a result, they mapped 11,111 landslides over an area of about \(10,000\,km^2\). The majority of the landslides were reported as shallow (\(1-5\,m\)), highly disrupted falls and slides in weakly cemented clastic sediment (Harp and Jibson 1995). This inventory is still considered one of the most detailed EQIL datasets in landslide science because of its quality and completeness (Harp et al. 2011; Malamud et al. 2004; Tanyaş et al. 2018; Tanyaş and Lombardo 2020).

2.2 Scenario-based ground motion data

The ShakeMap System of the USGS not only provides estimates of ground motion parameters for earthquakes occurring all over the globe, but it also provides ShakeMaps for the US based on deterministic ruptures from the U.S. National Seismic Hazard Map event catalog produced by Petersen et al. (2008). More specifically, the available scenarios correspond to time-independent maps of peak horizontal ground acceleration for 2% and 10% probability of exceedance in 50 years (Petersen et al. 2008). These scenarios originate from a subset of the possible rupture planes characteristic of all known active faults in a given region. Furthermore, a single scenario represents one realization of a potential earthquake nucleation, assuming a given magnitude, hypocenter and rupture plane, although directivity is not accounted for. As a result, the USGS ShakeMap System provides a comprehensive seismic hazard assessment for specific seismically active landscapes within the United States (USGS 2017).

Fig. 2: a Distribution of scenario-based earthquakes (both epicenters and rupture planes); the cyan polygon shows the study area for which we extracted the corresponding 217 scenarios; b Probability density distribution of the selected and rescaled scenario-based PGAs

For the area affected by the Northridge earthquake, not only is the specific ground motion available (see Fig. 1b), but so is a wide number of possible scenarios. Among the available scenarios, we extract the 217 cases that fall within the area affected by landslides due to the Northridge earthquake. This information is shown geographically in Fig. 2a, and the corresponding 217 PGA distributions are summarized in Fig. 2b, where the PGA values have all been rescaled to emphasize similarities and differences among the possible ground motion realizations.

3 Modeling strategy

3.1 Mapping unit choice and presence/absence assignment criterion

Any landslide susceptibility model requires a reference mapping unit, that is, the geographical object into which the study area is partitioned. Apart from the common grid-cell choice (Reichenbach et al. 2018), Slope Units (SUs) subdivide the terrain between streamlines and ridges under the constraint of within-unit homogeneity of slope and aspect (Alvioli et al. 2016). On the one hand, the strength of using SUs resides in their representation of the morphodynamic response of a slope to landsliding, which is entirely neglected when using grid-cells. On the other hand, SUs require an additional step in which the finely resolved predisposing factors are aggregated, compared to the grid-cell counterpart.

In this work, we adopt SUs as the mapping unit of our choice, the reason being their ability to reliably mimic the geomorphological response to slope instability.

We compute SUs via the software r.slopeunits, made available by Alvioli et al. (2016), adopting:

  • the Shuttle Radar Topography Mission (SRTM) Digital Elevation Model (DEM) at approximately 30 m resolution (Jarvis et al. 2008);

  • a large flow accumulation threshold (\(500,000\,m^2\)) to ensure that all possible scale-dependent SUs can be examined during the calculations;

  • a relatively small minimum SU area (\(50,000\,m^2\)) to reliably partition the space at a small to medium scale;

  • a circular variance of 0.5.

From the resulting polygonal partition of the study area, we then assigned the landslide presence condition to each SU that intersects a mapped landslide polygon of the co-seismic Northridge inventory. The absence case is the complementary situation.
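As a minimal illustration of this assignment step, the intersection can be computed with the sf package in R; the file names below are hypothetical placeholders rather than the actual project files.

```r
# Sketch of the presence/absence labelling (file names are hypothetical)
library(sf)

su         <- st_read("slope_units.gpkg")            # SU partition from r.slopeunits
landslides <- st_read("northridge_inventory.gpkg")   # Harp and Jibson (1995, 1996) polygons

# A SU is labelled "presence" (1) if it intersects at least one landslide polygon
su$presence <- as.integer(lengths(st_intersects(su, landslides)) > 0)
```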

3.2 Covariates

Our covariate set features both morphometric and thematic properties, in addition to the Northridge ground motion expressed in terms of PGA. The initial set was chosen in accordance with the literature, extracting the most common covariates in landslide susceptibility studies (Budimir et al. 2015; Lombardo and Tanyas 2020; Reichenbach et al. 2018).

The morphometric properties correspond to derivatives of the SRTM DEM. Specifically, we computed: (1) Slope (Zevenbergen and Thorne 1987), (2) Planar and (3) Profile Curvature (Heerdegen and Beran 1982), (4) Vector Ruggedness Measure (VRM, Sappington et al. 2007), and (5) Local Relief (LR, Jasiewicz and Stepinski 2013).

In addition, we also compute the Euclidean distance to the nearest river channel from each centroid of a square lattice coinciding with the DEM resolution. We complete our covariate set with the locally estimated Soil Depth map digitized and made available by NRCS (2010), together with the SU extent measured in \(m^2\).

Notably, the native resolution of the covariates mentioned above does not coincide with our mapping unit. For this reason, where the covariate resolution was finer than the SU dimension, we computed the mean and standard deviation of each covariate distribution within the SUs. For Soil Depth and the Northridge PGA, we opted to estimate only the mean value, because the soil map originates from studies at scales ranging from 1:12,000 to 1:63,360 and the resolution provided by the ShakeMap System is approximately 1 km.

Prior to any modeling routine, we rescaled each covariate to zero mean and unit variance (subtracting the mean and dividing by the standard deviation) to ensure that the covariates share the same unitless scale, and hence can be compared once we fit our model.
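A minimal sketch of this covariate preparation, assuming the DEM derivatives are stored as raster layers and the SUs as polygons, is given below; object and file names are illustrative, and the exact tools used in the original workflow may differ.

```r
# Aggregate pixel-based covariates to SU level and standardize them (illustrative sketch)
library(terra)
library(exactextractr)

covars <- rast("dem_derivatives.tif")        # e.g., slope, curvatures, VRM, local relief
su     <- sf::st_read("slope_units.gpkg")

# Mean and standard deviation of each covariate within every SU
su_stats <- exact_extract(covars, su, c("mean", "stdev"))

# Zero-mean, unit-variance rescaling (subtract the mean, divide by the standard deviation)
su_scaled <- as.data.frame(scale(su_stats))
```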

3.3 LASSO-penalized Generalized Linear Model

The most common approach to build landslide susceptibility models is Logistic Regression (Reichenbach et al. 2018), formulated as a Generalized Linear Model (GLM) which assumes that landslide presences/absences are distributed over space according to the Bernoulli probability distribution (Akgün and Türk 2011; Brenning 2005; Das et al. 2010).

The definition of a GLM for a binomial case can be summarised as follows:

$$\begin{aligned} \eta (P)=\beta _0+\beta _1 x_1+\cdots +\beta _mx_m, \end{aligned}$$
(1)

where \(P\) is the probability that Y takes the value 1 (i.e., a landslide is present at a given mapping unit), \(\eta \) is the logit link and the \(\beta \)s are the regression coefficients estimated for covariates that are assumed to behave linearly with respect to slope instability conditions. By making the logit function explicit, the probability of landslide occurrence can be obtained as follows:

$$\begin{aligned} P = \frac{e^{\beta _0+\beta _1 x_1+\cdots +\beta _mx_m}}{1 + e^{\beta _0+\beta _1 x_1+\cdots +\beta _mx_m}}, \end{aligned}$$
(2)

where the exponent is the linear combination of the model components, each component being either the regression constant or the product between a regression coefficient and the corresponding vector of covariate values.
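In R, Eqs. (1)-(2) correspond to a binomial GLM with a logit link. The sketch below is illustrative only: the data frame and column names are hypothetical and do not reflect the covariates actually retained in this study.

```r
# Binomial GLM with logit link (data frame and column names are hypothetical)
fit_glm <- glm(presence ~ slope_mean + slope_sd + pga + soil_depth,
               family = binomial(link = "logit"), data = su_data)

# Eq. (2): predicted probability of landslide occurrence per SU
p_hat <- predict(fit_glm, type = "response")
```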

The choice of covariates can span several morphometric and thematic properties (Budimir et al. 2015), although some of them may not contribute to explaining the landslide distribution. In such cases, maintaining non-informative covariates may not only increase the dimension of the model space but also introduce collinearity issues and reduce the overall predictive capacity (Castro Camilo et al. 2017). As a result, variable selection procedures are often implemented to reduce the dimensionality of the predictor set. The most common variable selection in the geoscientific literature is the stepwise approach (e.g., Mathew et al. 2009; Cama et al. 2017), although, irrespective of being forward or backward, it has been shown to be flawed in the statistical literature (see Harrell Jr 2015). However, more robust variable selection procedures exist; among them, the Least Absolute Shrinkage and Selection Operator (LASSO) has been shown not only to provide better performance in landslide susceptibility models but also to reduce the number of covariates more strictly while simultaneously addressing collinearity issues (see Amato et al. 2019, and explanation therein).

The LASSO selection is controlled by a penalization or shrinkage parameter, usually reported as \(\lambda \). More specifically, as \(\lambda \) increases the model is forced to remove redundant information from the covariate set (see Lombardo and Mai 2018, and explanation therein).

In this work, we adopt a LASSO-penalized GLM purely to screen out non-informative covariates before testing a more complex version of a GLM. We do this by using the glmnet package (Friedman et al. 2009) in R, where, by default, the function implements a purely random 10-fold cross-validation scheme over 100 \(\lambda \) values (see Hastie and Qian 2014, and explanation therein) to explore the hold-out performance variability as the penalization increases.
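The screening step can be reproduced with cv.glmnet along the following lines; the design matrix, the response and the choice of the reported \(\lambda \) are illustrative assumptions, not the exact settings of the original analysis.

```r
# LASSO-penalized GLM screening with 10-fold CV scored by AUC (illustrative sketch)
library(glmnet)

X <- as.matrix(su_data[, covariate_names])   # standardized SU-level covariates (hypothetical names)
y <- su_data$presence                        # 0/1 landslide labels

cv_fit <- cv.glmnet(X, y, family = "binomial", type.measure = "auc", nfolds = 10)

plot(cv_fit)                       # AUC versus log(lambda), cf. Fig. 3
coef(cv_fit, s = "lambda.1se")     # covariates retained at the more parsimonious lambda
```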

3.4 Generalized additive model

A Generalized Additive Model or GAM (Brenning 2008; Lombardo et al. 2018) is an extension of the linear framework available in the simpler GLM case. In addition to featuring linear relations (fixed effects) between covariates and landslide instances, a GAM provides a framework to account for various nonlinear relations (random effects). The latter can correspond to categorical cases, where discrete covariate classes are modeled independently from each other; to ordinal cases, where discrete covariate classes are modeled with inter-class dependence; or to latent effects over space (Lombardo et al. 2019) and time (Lombardo et al. 2020).

A generic formulation for a binomial case can be summarized as follows:

$$\begin{aligned} \eta (P)=\beta _0+\beta _1 x_1+\cdots +\beta _mx_m + f(X_n), \end{aligned}$$
(3)

where \(f\) stands for a nonlinear function of a covariate \(X\) with \(n\) discrete classes. In this work, we chose \(f\) to be a random walk of the first order, or a spline (Bakka et al. 2018; Conoscenti et al. 2016), which ensures that the ordinal structure between adjacent classes of a reclassified continuous covariate is not lost. We adopt a Bayesian version of a GAM by using the R (R Core Team 2013) package R-INLA (see Rue et al. 2009, 2017, and explanation therein). We chose a Bayesian version because any such model (and each of its components) is natively characterized by a distribution of values rather than by the point estimates of the frequentist counterpart. This is particularly relevant in the formulation of the simulation steps described in Sect. 3.5.
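A sketch of such a binomial GAM in R-INLA is given below; the covariate names, the number of classes produced by inla.group and the default priors are assumptions for illustration rather than the exact configuration used in this study.

```r
# Bayesian binomial GAM in R-INLA with rw1 random effects (illustrative sketch)
library(INLA)

# Reclassify the continuous covariates into ordered classes for the rw1 terms
su_data$slope_sd_class   <- inla.group(su_data$slope_sd,   n = 20)
su_data$soil_depth_class <- inla.group(su_data$soil_depth, n = 20)

form <- presence ~ su_area + vrm_mean + vrm_sd + pga +
  f(slope_sd_class,   model = "rw1") +        # ordinal effect of Slope_sigma
  f(soil_depth_class, model = "rw1")          # ordinal effect of Soil Depth

fit_gam <- inla(form, family = "binomial", data = su_data,
                control.predictor = list(compute = TRUE, link = 1),
                control.compute   = list(config = TRUE))  # config = TRUE enables posterior simulation
```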

3.4.1 10-fold cross-validation

The landslide literature reports several cross-validation schemes. Among these, weak examples can be found in articles where a given model is merely validated once (e.g., Arabameri et al. 2020), whereas much more scientifically sound cases implement k-fold cross-validation (CV) schemes (e.g., Steger et al. 2020) to express the variability of model performance as the training and test subsets are iteratively sampled at random. The idea behind a k-fold CV is to statistically represent a large portion of the dataset at hand (both in terms of presence-absence cases and associated covariates), hence providing a description of the mean predictive skill of a given model together with the associated uncertainty.

In this work, we opt for a special case of a k-fold CV. Specifically, we randomly split our dataset into ten partitions, each corresponding to 10% of the total, imposing that a given SU can be sampled only once. As a result, we explore not just a large portion of the dataset but its entirety, by calibrating on 90% and validating on the complementary 10%. Notably, by rearranging the ten 10% subsets, a fully predicted susceptibility map can be obtained, thereby reducing the effect that the same SUs, and their characteristics, would otherwise exert on the calibration stage.
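The sketch below illustrates such a non-overlapping 10-fold CV; for brevity it validates a plain GLM rather than the full R-INLA GAM, and it assumes that the hypothetical su_data frame contains only the 0/1 response and the selected covariates.

```r
# Non-overlapping 10-fold CV: every SU belongs to exactly one validation fold (sketch)
set.seed(1)                                          # hypothetical seed
folds <- sample(rep(1:10, length.out = nrow(su_data)))

cv_auc <- sapply(1:10, function(k) {
  train <- su_data[folds != k, ]                     # 90% for calibration
  test  <- su_data[folds == k, ]                     # complementary 10% for validation
  m <- glm(presence ~ ., family = binomial, data = train)
  p <- predict(m, newdata = test, type = "response")
  as.numeric(pROC::auc(pROC::roc(test$presence, p, quiet = TRUE)))
})
summary(cv_auc)                                      # spread of the ten AUC values, cf. Fig. 5
```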

3.5 Statistical simulations

Statistical simulations are feasible in any statistical context, be it frequentist or Bayesian. However, the latter is particularly suitable to support such operations. Once a model is fitted, each component, from the intercept to the fixed and random effects, is characterized by a full parameter distribution. Therefore, it is possible to randomly generate any number of realizations of each parameter and, by solving Eq. 3, to retrieve the full spectrum of possible susceptibility estimates (Ferkingstad et al. 2015).

In this work, we use a specific type of simulation. We initially fit a model where the presence/absence distribution corresponds to the co-seismic landslides triggered by the Northridge earthquake, and where the covariate set features the selected covariates together with the Northridge PGA. On the basis of the distribution of each model component, we then simulate according to the scheme mentioned above. However, we use a plug-in structure (Stanton and Diggle 2013), where the Northridge PGA values are removed and substituted by each of the 217 possible PGA scenarios for the study area. For each of the 217 PGA scenarios, we produce 1000 simulations and store the mean resulting susceptibility together with the 95% credible interval, expressed as the difference between the 97.5th and the 2.5th percentiles of the 1000 simulations.
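A heavily simplified sketch of this plug-in simulation is given below. It assumes the reference GAM was fitted with config = TRUE, that the PGA enters the model linearly, and that the scenario PGA has been rescaled with the same mean and standard deviation used for the Northridge PGA; the object names and the way the linear predictor is rebuilt are illustrative rather than the exact implementation used here.

```r
# Plug-in posterior simulation for one ground-motion scenario (illustrative sketch)
n_sim   <- 1000
samples <- INLA::inla.posterior.sample(n_sim, fit_gam)

scenario_pga <- su_data$pga_scenario_k   # hypothetical: rescaled scenario PGA, one value per SU

sim_prob <- sapply(samples, function(s) {
  lat  <- s$latent[, 1]
  beta <- lat[grep("^pga:", names(lat))]              # sampled PGA coefficient
  eta0 <- lat[grep("^Predictor:", names(lat))]        # fitted linear predictor per SU
  eta  <- eta0 + beta * (scenario_pga - su_data$pga)  # swap the Northridge PGA contribution
  plogis(eta)                                         # back-transform to probability
})

scen_mean <- rowMeans(sim_prob)                                  # mean simulated susceptibility
scen_ci95 <- apply(sim_prob, 1, quantile, 0.975) -
             apply(sim_prob, 1, quantile, 0.025)                 # 95% credible interval width
```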

Ultimately, we collate each mean simulated susceptibility together with the posterior susceptibility estimated for the Northridge earthquake, and we use this combined information to assess which SUs exhibit a landslide-prone probability across the entirety of the susceptibility realizations, together with the associated uncertainty.

4 Results

4.1 LASSO covariate selection

Figure 3 summarizes the LASSO variable selection, where the AUC is shown to slightly decrease as \(\lambda \) increases, down to a minimum of six covariates (vertical dashed line to the right). Any further reduction below six covariates rapidly weakens the model to an extent where the performance drops significantly, depriving the model of fundamental information to explain the presence/absence landslide distribution.

Fig. 3: LASSO variable selection. The x-axis at the bottom reports the shrinkage parameter \(\lambda \) in logarithmic scale (for conciseness), whereas the second axis at the top indicates the corresponding number of covariates. The y-axis shows the associated performance variability (expressed in AUC) tested in a purely random 10-fold CV. Red circles correspond to the mean AUC at a specific \(\lambda \), whereas the uncertainty around it is expressed as two times the standard deviation across the CV. The vertical dashed line to the left corresponds to the most conservative model. The vertical dashed line to the right corresponds to a model where the AUC has not significantly decreased compared to smaller values of \(\lambda \), while larger \(\lambda \) values would yield a significant drop in AUC

The six selected covariates correspond to the SU Area, VRM\(_\mu \), VRM\(_\sigma \), PGA, Slope\(_\sigma \) and Soil Depth.

4.2 Covariate effects

Here we report the covariate effects from the Bayesian GAM framework we adopted after selecting our parameter space via LASSO. More specifically, out of the six covariates mentioned above, we opted to model Slope\(_\sigma \) and Soil Depth as ordinal properties, with a random walk of the first order used to impose adjacent-class dependency.

Fig. 4: Summary of fixed and random effects of our reference susceptibility model for the Northridge earthquake. Fixed effects are shown in panel a, whereas the random effect for Slope\(_\sigma \) is shown in panel b and that for Soil Depth in panel c

Figure 4a summarizes the posterior distribution of each covariate used linearly in our model. All of them are shown to be significant (each distribution lies above the zero line) but with varying levels of associated uncertainty. The SU Area is clearly the parameter with the narrowest posterior distribution, followed by the PGA. The two components of the VRM signal have comparable uncertainty levels. That aside, the PGA shows the greatest influence on the final susceptibility, since the regression coefficients are all expressed on the same unitless scale thanks to the zero-mean, unit-variance rescaling mentioned in Sect. 3.2. As for the two random effects, both clearly behave in a nonlinear fashion, supporting our choice. For Slope\(_\sigma \), small variations of slope steepness within a SU contribute negatively to the susceptibility. For slope steepness variations (measured as \(1\sigma \)) greater than 5 degrees, the effect becomes positive, indicating landslide-prone conditions. As for Soil Depth, a peculiar pattern emerges: two separate portions of the soil depth distribution contribute to decrease the estimated susceptibility, whereas from approximately 0.4 m to 1.0 m the effect is the opposite. This result aligns well with the description of the landslides provided by Harp and Jibson (1995, 1996), who report that the vast majority of co-seismic landslides were shallow-seated.

4.3 Cross-validation performance

Figure 5 summarizes the overall predictive performance of the 10-fold CV scheme we implemented for the co-Northridge susceptibility. As the validation subset changes, only a limited variability in the resulting performance metrics can be seen. This is graphically depicted by the limited spread of the ROC curves and, consequently, of their AUC estimates. More specifically, the median AUC is approximately 0.82, which corresponds to an excellent performance according to the AUC classification proposed by Hosmer and Lemeshow (2000).

Fig. 5: Summary of the 10-fold CV performance. Panel (a) shows the Receiver Operating Characteristic curves, one for each of the ten CV replicates. Panel (b) summarises the distribution of the associated AUC values

4.4 Benchmark Northridge susceptibility

A number of studies have already focused on the co-seismic landslide susceptibility induced by the 1994 Northridge earthquake (e.g., Godt et al. 2008; Nowicki et al. 2014; Nowicki Jessee et al. 2018; Tanyas et al. 2019). For this reason, and in light of the different aim of this work, we choose to report the posterior mean susceptibility in its numerical form, together with its uncertainty measured as a 95% CI. For conciseness, we do not report the same information in map form.

Fig. 6: Error plot showing the relation between the posterior mean of the co-Northridge susceptibility and its associated uncertainty

Figure 6 shows the error plot, a graph that is crucial in most landslide susceptibility studies (Rossi et al. 2010). An ideal model should produce very low and very high susceptibility estimates associated with low uncertainty. If this condition is met, potential users, who typically operate in these ranges of probability, can trust the results and plan accordingly. Conversely, the only portion of the error plot in which the highest uncertainty is acceptable corresponds to the central portion of the susceptibility distribution, where the determination of likely landslide-prone SUs is intrinsically more difficult (Lombardo et al. 2016; Reichenbach et al. 2018). In this sense, the posterior estimates show a clear bell shape, suggesting that the model behaves as intended.
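For reference, an error plot of this kind can be drawn directly from the posterior summaries; the object names below are hypothetical placeholders for the SU-level posterior mean and the width of its 95% credible interval.

```r
# Error plot: posterior mean susceptibility vs width of its 95% credible interval (sketch)
plot(post_mean, post_ci95_width,
     xlab = "Posterior mean susceptibility",
     ylab = "95% CI width",
     pch = 16, cex = 0.4)
```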

4.5 Scenario-based susceptibility

In this section we report the results of our simulation scheme. We recall that we ran 1000 simulations for each of the 217 ground-motion scenarios extracted for the study area. This information is shared in the supplementary material, at the following link: https://doi.org/10.17026/dans-xjk-5676. At this address we deposited both the SU vector file (.mif) and the scenario-based susceptibility maps, reported as the mean and 95% CI of each 1000-simulation batch. We also listed all the links to download the PGA scenarios from the USGS ShakeMap system. In Fig. 7 we show the probability density functions of each mean scenario-based susceptibility computed from the (\(1000\times 217\)) simulations.

Fig. 7: PDFs of the mean susceptibility computed from the 1000-simulation batch for each of the 217 ground motion scenarios

Similar information is summarized in map form in Fig. 8, where out of all the simulations we extract the global mean and the associated uncertainty. Additionally, we report the worst-case scenario, for which we computed, for each SU in the study area, the maximum probability out of all the mean scenario-based susceptibilities.

Fig. 8: Descriptive statistics of the 217 ground motion scenarios (and 1000 simulations for each case) together with the 1994 Northridge susceptibility. Panel (a) shows the mean susceptibility of all the possible ground motion scenarios combined. Panel (b) reports the uncertainty in the susceptibility estimates for all scenarios. Panel (c) depicts the susceptibility corresponding to the maximum probability values for each SU across all scenarios
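The global summaries of Fig. 8 can be assembled from the stored simulation outputs along these lines; scen_means is a hypothetical matrix with one row per SU and one column per scenario, holding the mean simulated susceptibilities.

```r
# Per-SU summaries across the 217 scenario means (illustrative sketch)
global_mean <- rowMeans(scen_means)          # cf. Fig. 8a: mean over all scenarios
worst_case  <- apply(scen_means, 1, max)     # cf. Fig. 8c: maximum probability per SU
```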

To provide a more intuitive understanding of the simulation scheme we propose, we select five of the largest scenario-based ground motions available in the USGS ShakeMap service that fall within the study area. Their epicenters, rupture planes and metadata are shown in Fig. 9a. In the same figure we plot the mean susceptibility map computed from the 1000 simulations for each of the five scenarios (Fig. 9b-f).

Fig. 9: Panel (a) shows a geographical summary of a selected number of scenario-based epicenters and associated rupture planes. The white polygon corresponds to our study area. For each case shown in panel (a), we report the corresponding mean susceptibility map out of the 1000 simulation runs (panels b to f)

5 Discussions

In this work, we take advantage of a plug-in simulation procedure. We initially calibrate a GAM susceptibility model for the 1994 Northridge earthquake. Then, on the basis of the estimated fixed and random effects (and associated distributions), we substitute the PGA pattern of the Northridge earthquake with each of the 217 available PGA scenarios within the study area. For each PGA scenario, we run 1000 simulations to capture the variability of the susceptibility pattern as the ground motion changes from one predicted scenario to another. The distribution of the mean susceptibility for each simulation batch appears quite different from case to case, as shown by the large oscillations in Fig. 7. This information is further clarified in Fig. 8, where the mean susceptibility of all \(1000 \times 217\) simulations is plotted in map form together with the 95% CI and the maximum level. The global mean (Fig. 8a) shows few extremely susceptible slopes scattered over the landscape. Notably, by computing the mean over all simulations, the susceptibility may be smoothed and end up showing low variability or a limited number of expected unstable SUs. Conversely, by looking at the maximum susceptibility across all simulations (worst-case global scenario in Fig. 8c), three regions within the study area appear more prone to failure than the rest. The steep SUs located along the coastline are part of the worst-case scenario, as is a sequence of SUs approximately aligned from east to west; these are SUs exposed to the main bulk of the scenario-based ground motion patterns. The third cluster of highly susceptible SUs is located in the northern sector of the study area, where the topography is significantly rough and scenario-based ground motion is quite common among the selected cases.

To provide further examples of our simulation procedure, Fig. 9 moves away from the global summary provided above and depicts five cases for which the PGA was among the largest of the 217 scenarios we examined. Here the ground motion effect is particularly evident, for each of the five mean simulated susceptibilities highlights different susceptible SUs. However, although we managed to capture the seismic effect on the co-Northridge landslides and project it onto the scenario-based ShakeMaps, the model may still lack part of the information required to fully describe the physics of earthquake-induced ground deformation. On the one hand, directivity is not included in the ground motion scenarios (Allstadt et al. 2018); therefore, some portions of the landscape aligned with the main direction of the earthquake radiation pattern may not appear as susceptible as they actually are. On the other hand, topographic amplification is also not accounted for in the earthquake scenarios (Allstadt et al. 2018). As a result, the spatial dependence that narrow mountaintops should exhibit between susceptibility and wavefield amplification is also neglected.

However, despite these minor spatial dependency issues, our model is able to efficiently include any PGA effect into any potential susceptibility realization within the study area, under the assumption that the PGA effect we estimated for the Northridge model (see Sect. 4.4) is robust. We believe so because the governing mechanical laws that link slope instability to ground shaking should not change with time. As a result, the only limiting factor for similar procedures should be the quality of the statistical inference. In this sense, we believe the fixed effect for the PGA to be well estimated (see Fig. 4a), both in its amplitude (posterior mean) and significance (posterior 95% credible interval above the zero line).

Throughout the manuscript we have referred to our simulation results in terms of susceptibility. However, we would like to point out that the scenario-based PGA estimates provided by the USGS ShakeMap system have a temporal connotation. As mentioned before, USGS (2017) reports each scenario to be associated with a return time of 50 years. Therefore, our results, whether we examine specific scenarios as in Fig. 9 or consider the combined effects of all scenarios together as in Fig. 8, express probabilities of landslide occurrence (susceptibility) within an expected timespan of 50 years. This definition largely satisfies the notion of landslide hazard (Fell et al. 2008; Guzzetti et al. 1999). As a result, we argue that the method we propose allows one to develop predictive maps which contain similar, if not the same, information as scenario-based landslide hazard maps.

6 Concluding remarks

We conclude by stressing that this application could be replicated in any other landslide hazard assessment. The only requirements are the availability of a co-seismic landslide inventory and the associated ground motion (to build the reference model), as well as the predicted PGA scenarios (to simulate). Furthermore, in regions with a similar morpho-tectonic setting, there is no strict requirement other than the availability of PGA scenarios. In fact, under the assumption that landslides respond to the ground motion in a manner similar to the Northridge case, we could use the very same statistical model generated here to simulate in other geographic contexts. The only difference is that the plug-in procedure should then be extended not only to the PGA scenarios but also to the morphometric properties of the target area.

We also stress that the use of scenario-based ground motion is only one possible application of simulating landslide-prone conditions. One could follow the same strategy and simulate over precipitation, land use changes or, much closer to the framework we propose here, over past ground motion scenarios. The latter could unravel potential landslide-prone conditions in the past and help explore historical periods for which data are usually poor or non-existent. An example that goes in this research direction can be found in the companion paper to the present one (Luo et al. 2021).