Emulator-based global sensitivity analysis for flow-like landslide run-out models

Landslide run-out modeling involves various uncertainties originating from the model input data. It is therefore desirable to assess the model's sensitivity to its inputs. A global sensitivity analysis that is capable of exploring the entire input space and accounting for all interactions often remains infeasible due to the computational challenge posed by the large number of necessary model runs. We address this research gap by integrating Gaussian process emulation into landslide run-out modeling and applying it to the open-source simulation tool r.avaflow. The feasibility and efficiency of our approach is illustrated based on the 2017 Bondo landslide event. We study the sensitivity of aggregated model outputs, such as the apparent friction angle and impact area, as well as spatially resolved maximum flow height and velocity, to the dry-Coulomb friction coefficient, the turbulent friction coefficient, and the release volume. The results for first-order effects are consistent with previous results of common one-at-a-time sensitivity analyses. In addition, our approach allows us to rigorously investigate interactions. Strong interactions are detected on the margins of the flow path, where the expectation and variation of maximum flow height and velocity are small. The interactions generally become weak with increasing variation of maximum flow height and velocity. Moreover, there are stronger interactions between the two friction coefficients than between the release volume and each friction coefficient. In the future, it is promising to extend the approach to other computationally expensive tasks like uncertainty quantification, model calibration, and smart early warning.


Introduction
Flow-like landslides, e.g. rock avalanches and debris flows, pose an ongoing threat to life, property, and environment in mountainous regions around the world. In order to assess their hazard and design mitigation strategies, many research efforts have been devoted to developing computational landslide run-out models which are capable of simulating the dynamics of the flow over complex topographies. The majority of these models employ depth-averaged shallow flow equations derived from mass and momentum balance. Examples are TITAN2D (Pitman et al., 2003), DAN3D (Hungr and McDougall, 2009), RAMMS (Christen et al., 2010), r.avaflow (Mergili et al., 2017), faSavageHutterFOAM (Rauter et al., 2018), etc., see McDougall (2017) for a review.
Such models generally require a variety of input data, including release area and volume (a release polygon given as a shape file or a raster map of release heights), flow resistance parameters (dry-Coulomb friction and turbulent friction parameters for the Voellmy rheology), and topographic data (a digital elevation model). If the input data are accurate, the models can be deterministically run to predict characteristics of the landslide's bulk behavior, such as run-out distance, impact area, and spatio-temporally resolved flow height and velocity. In practice, however, the input data usually involve large uncertainties (Dalbey et al., 2008). For example, release areas and volumes of landslides are challenging to predict due to the complexity of geological pre-conditioning factors and often a lack of subsurface information. They may be approximated by heavy-tailed probability density functions based on the statistical properties of landslide inventories (Quan Luna et al., 2013). The flow resistance parameters are more conceptual than physical (Fischer et al., 2015). They are generally obtained by back-analyzing past events. The choice of flow resistance parameters for landslide run-out forecasting thus naturally involves uncertainties. Topographic data may also be subject to uncertainties due to errors introduced during source data acquisition or data processing (Zhao and Kowalski, 2020). Therefore, it is essential to study the model's sensitivity to uncertain inputs, which could improve our understanding of computational landslide run-out models and provide guidelines for their future usage.
Sensitivity analyses on landslide run-out models are commonly based upon local one-at-a-time approaches, i.e., changing one input variable at a time while keeping the others at their baseline values to explore its isolated effect on model outputs (Hussin et al., 2012; Schraml et al., 2015). While straightforward to implement, these local sensitivity analysis methods cannot assess potential interactions between input variables, and their results may strongly depend on the chosen baseline values (Girard et al., 2016). In contrast, variance-based global sensitivity analyses can fully explore the input space, quantify the contribution of each variable to the output variation, and identify interactions between different variables. The Sobol' method, a typical variance-based method, has been developed and widely used since the 1990s (Sobol', 1993, 2001; Saltelli, 2002; Saltelli et al., 2010). The calculation of Sobol' sensitivity indices usually requires Monte Carlo-based methods, leading to a large number of necessary model evaluations. For computationally demanding models, the calculation may be prohibitively expensive. In that case, it is promising to employ emulation techniques to overcome the computational challenge.
An emulator is a statistical representation of a computationally demanding model referred to as a simulator. While it comes at the price of an additional statistical error, it is typically evaluated several orders of magnitude faster than the simulator. Emulation techniques have been developed since the 1980s (Currin et al., 1991; O'Hagan, 2006). Many researchers have utilized them for the purpose of global sensitivity analyses in different fields (Lee et al., 2011; Rohmer and Foerster, 2011; Lee et al., 2012; Bounceur et al., 2015; Girard et al., 2016; Aleksankina et al., 2019). These studies either focus on emulating the evaluation of a few scalar outputs (Lee et al., 2011; Rohmer and Foerster, 2011; Girard et al., 2016), or build separate emulators for each of the many outputs (Lee et al., 2012; Aleksankina et al., 2019). One exception among them is Bounceur et al. (2015), who combine emulation techniques with principal component analysis, leading to the emulation of a reduced-order model. For a simulator with massive outputs like a landslide run-out model, building separate emulators for each output can be computationally intensive (Gu and Berger, 2016). In recent years, great improvement has been made to enable simultaneous emulation for multi-output models, see for instance Rougier (2008) and Gu and Berger (2016).
The goal of this study is twofold: The first is a methodological goal, namely to combine recent developments in emulation techniques (Gu and Berger, 2016; Gu et al., 2018, 2019), landslide run-out models (Mergili et al., 2017), and global sensitivity analyses (Le Gratiet et al., 2014) to enable global sensitivity analyses of computationally demanding landslide run-out models for the first time. The second goal is application-oriented and aims at employing the methodology to assess the relative importance of different uncertain inputs, specifically the flow resistance parameters and the release volume, and their interactions in landslide run-out models based on the 2017 Bondo landslide event as a test case.
This paper is set out as follows. In section 2 the methodology is described, including the computational landslide run-out model based on the Voellmy rheology, Sobol' sensitivity analysis, Gaussian process (GP) emulation, and an algorithm to take emulator uncertainty into account. Section 3 presents our Python-based implementation. Section 4 describes the case study. Section 5 is devoted to a discussion of our results. In section 6, important conclusions are drawn.

Computational landslide run-out model based on the Voellmy rheology
Depth-averaged shallow flow type process models have gained popularity in practice and in academia, owing to their good compromise between accuracy and computing time (Rauter et al., 2018). A variety of flow resistance laws can be used with the models depending on landslide types and characteristics of the flow material (Naef et al., 2006; Hungr and McDougall, 2009). In the case of flow-like landslides, the Voellmy rheology is one of the most widely used flow resistance laws (Hussin et al., 2012; Frank et al., 2015; Schraml et al., 2015; Bevilacqua et al., 2019). The governing system of the depth-averaged model employing the Voellmy rheology can be expressed in a surface-induced coordinate system as (Christen et al., 2010; Fischer et al., 2012)

\partial_t h + \partial_X (h u_X) + \partial_Y (h u_Y) = 0, (1a)
\partial_t (h u_X) + \partial_X (h u_X^2 + g_Z h^2/2) + \partial_Y (h u_X u_Y) = g_X h - \frac{u_X}{\|\mathbf{u}\|} S_f, (1b)
\partial_t (h u_Y) + \partial_X (h u_X u_Y) + \partial_Y (h u_Y^2 + g_Z h^2/2) = g_Y h - \frac{u_Y}{\|\mathbf{u}\|} S_f, (1c)
S_f = \mu g_Z h + \frac{g \|\mathbf{u}\|^2}{\xi}, (1d)

where X, Y, Z denote coordinates in the down-slope, cross-slope and normal directions; t denotes time; h represents flow height; u_X and u_Y represent components of the depth-averaged surface tangent flow velocity u along the X and Y directions; g_X, g_Y, g_Z are components of the gravitational acceleration; µ and ξ are the dry-Coulomb friction coefficient and turbulent friction coefficient, which describe the flow resistance law known as the Voellmy rheology. The process model is solved forward in time, hence initial conditions h(X, Y, t_0) and u(X, Y, t_0) are needed. Typically u(X, Y, t_0) is zero and h(X, Y, t_0) encodes the release volume and release area. Other essential inputs include the flow resistance parameters and a digital elevation model of the topography. As stated in the introduction, these input data usually involve uncertainties. The uncertainty of topographic data may be reduced by using high-accuracy remote sensing data. The uncertainty of the release volume and release area may be more difficult to constrain due to the complexity of geological pre-conditioning factors and often a lack of subsurface information; their estimation is often based on expert judgement.
The flow resistance parameters are generally obtained by back-analyzing past events. It is still a great challenge to select them for quantitative risk assessment in practice (McDougall, 2017). In this study, we focus on the sensitivity of selected model outputs to the release volume v_0 (denoting the landslide magnitude) and the two flow resistance parameters µ and ξ of the Voellmy rheology.
The process model produces numerous outputs, essentially given by flow height h and flow velocity u at every space-time grid point. For the purpose of hazard assessment and mitigation, maximum values over the event duration are of most interest. In addition, aggregated scalar outputs like the apparent friction angle or impact area are commonly used to indicate the overall landslide impact. In this study, the following model outputs are under investigation.
• Apparent friction angle, whose tangent equals the ratio of the landslide fall height to the run-out distance (De Blasio and Elverhøi, 2008). It generally decreases as the run-out distance increases.
• Impact area, defined as the area of the region where maximum flow height values exceed a threshold value, here 0.1 m.
• Maximum flow height over time at k locations {(X_j, Y_j)}_{j=1,...,k}, denoted as (h^max_{l_1}, ..., h^max_{l_k})^T.
• Maximum flow velocity over time at k locations {(X_j, Y_j)}_{j=1,...,k}, denoted as (\|u\|^max_{l_1}, ..., \|u\|^max_{l_k})^T.
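The two aggregated scalar outputs can be computed from the simulator's raster outputs in a few lines of code. The sketch below is illustrative: the 10 m cell size and the 0.1 m threshold come from the text, while the function names and the assumption of a single fall height/run-out distance pair are ours.

```python
import numpy as np

CELL_SIZE = 10.0   # computational grid size in m (section 4.1)
H_THRESHOLD = 0.1  # flow height threshold in m

def impact_area(h_max):
    """Impact area: total area of the cells whose maximum flow height
    over time exceeds the 0.1 m threshold."""
    return np.count_nonzero(h_max > H_THRESHOLD) * CELL_SIZE**2

def apparent_friction_angle(fall_height, runout_distance):
    """Angle whose tangent is the ratio of landslide fall height to
    run-out distance, returned in degrees."""
    return np.degrees(np.arctan(fall_height / runout_distance))
```

For instance, a 1000 m fall over a 3.2 km run-out corresponds to an apparent friction angle of about 17°, in the range reported for the case study below.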

Sobol' sensitivity analysis
Assume that a simulator is denoted by f(x) with a p-dimensional input x = (x_1, ..., x_p)^T ∈ R^p and a scalar output y ∈ R. For the process model described in section 2.1, x is a three-dimensional vector consisting of the two friction coefficients and the release volume, namely x = (µ, ξ, v_0)^T; y could be an aggregated scalar output like the apparent friction angle or impact area, or an element of a vector output like maximum flow height or velocity at a specific location. Input uncertainties of x induce output uncertainty of y. The essential idea of a Sobol' sensitivity analysis is to decompose the variance of y into contributions caused by each x_i and their interactions. In practice, p first-order indices {S_i}_{i=1,...,p} and p total-effect indices {S_{T_i}}_{i=1,...,p} are usually computed. They are defined as (Saltelli et al., 2010)

S_i = \frac{V_{x_i}(E_{x_{-i}}(y \mid x_i))}{V(y)}, (2a)
S_{T_i} = \frac{E_{x_{-i}}(V_{x_i}(y \mid x_{-i}))}{V(y)} = 1 - \frac{V_{x_{-i}}(E_{x_i}(y \mid x_{-i}))}{V(y)}, (2b)

where V and E represent the variance and expectation operator respectively, and x_{-i} denotes the vector consisting of all input factors except x_i. A first-order index S_i accounts for the contribution of the input factor x_i to the variance of the output, independent of the other input factors x_{-i}; a total-effect index S_{T_i} indicates the total contribution of x_i to the output variation, i.e. the sum of its first-order contribution and all higher-order effects owing to interactions (Saltelli et al., 2008). The difference S_{T_i} − S_i thus indicates any interaction between x_i and x_{-i}. Employing this concept in landslide run-out models will hence allow us to investigate combined effects of the two friction coefficients and the release volume on simulation outputs. Computing the conditional variances in eqs. (2a)-(2b) involves nested integrals (Girard et al., 2016). This is analytically impractical for complex simulators like landslide run-out models. Instead, Monte Carlo-based methods are commonly used to estimate the Sobol' indices.
The uncertainty introduced by Monte Carlo-based integration can be taken into account using a bootstrap strategy (Archer et al., 1997).
In this study, we employ the numerical procedure presented in Saltelli et al. (2010). The computational cost is N · (p +2) evaluations of a simulator, where N is the base sample size. More specifically, the denominator V (y) in eqs. (2a)-(2b) can be estimated using 2 · N simulation runs based on two independent sets of input samples. Each set consists of N input samples for the simulator. Moreover, each pair of numerators in eqs. (2a)-(2b) requires additional N simulation runs corresponding to a new set of N input samples, which is constructed from the two independent sets. It leads to additional p·N simulation runs. For the detailed procedure, please refer to Saltelli et al. (2010).
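As an illustration of this sampling scheme, the estimators of Saltelli et al. (2010) can be sketched in a few lines of NumPy. This is a minimal stand-in with our own function names and uniform inputs on [0,1]^p, not the SALib implementation used later in the paper:

```python
import numpy as np

def sobol_indices(f, p, N, rng):
    """First-order and total-effect Sobol' indices via the Saltelli (2010)
    estimators, costing N*(p+2) evaluations of the model f."""
    A = rng.random((N, p))                  # first independent sample set
    B = rng.random((N, p))                  # second independent sample set
    fA, fB = f(A), f(B)
    V = np.var(np.concatenate([fA, fB]), ddof=1)   # V(y) from the 2N runs
    S, ST = np.empty(p), np.empty(p)
    for i in range(p):
        ABi = A.copy()
        ABi[:, i] = B[:, i]                 # A with its i-th column from B
        fABi = f(ABi)
        S[i] = np.mean(fB * (fABi - fA)) / V          # first-order index
        ST[i] = 0.5 * np.mean((fA - fABi) ** 2) / V   # total-effect index
    return S, ST
```

For the additive toy function y = x_1 + 2 x_2 (with x_3 inert), the analytic indices are S = (0.2, 0.8, 0), which the estimator reproduces up to Monte Carlo noise.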
As pointed out in Saltelli et al. (2010), N should be sufficiently large, e.g. 500 or higher, which is critical in our case as the landslide run-out model itself is computationally intensive. If a single run of the simulator described in section 2.1 costs 32 minutes, which corresponds to the average run time of the 200 simulation runs in section 4.3, the sensitivity analysis for three input variables will cost at least 32 × 500 × (3 + 2) = 80000 minutes, roughly 56 days. Therefore, it is necessary to employ emulation techniques to improve the computational efficiency in order to carry out this type of global sensitivity analysis.

Gaussian process emulation
A simulator, such as the landslide run-out model, represents a deterministic input-output mapping. It is usually computationally impractical to directly use such a simulator for analyses requiring a large number of simulation runs, e.g. the global sensitivity analysis described in the previous section, uncertainty quantification, or model calibration. In such cases, GP emulators have been widely employed owing to their robustness and rich theoretical background (Girard et al., 2016). GP emulation views a simulator as an unknown function from a Bayesian perspective; the prior belief about the simulator's behavior, namely a Gaussian process, is updated based on a modest number of simulation runs, leading to a posterior which can be evaluated much faster than the simulator and can then be used for computationally demanding analyses. The fundamental assumption of GP emulation is that the simulator is a smooth continuous function of its inputs (O'Hagan, 2006). Here, we recap the principal ideas of the GP emulators used in this study; for detailed information please refer to O'Hagan (1994).

Gaussian process emulator for a scalar output
Let f (x) denote a simulator with a p-dimensional input x = (x 1 , . . . , x p ) T ∈ R p and a scalar output y ∈ R. For example, if f (x) is the landslide run-out model, x is the triplet consisting of the release volume and the two friction coefficients, and y is the apparent friction angle or impact area. f (x) is regarded as an unknown function and will be modeled as a Gaussian process.
Specifically,

f(\cdot) \sim \mathcal{GP}\big(m(\cdot), \sigma^2 c(\cdot, \cdot)\big), (3)

where m(·) denotes the mean function, σ² the variance, and c(·,·) the correlation function. The mean function for any input x is given by the regression

m(x) = h(x)^T \theta, (4)

where h(x) = (h_1(x), ..., h_q(x))^T denotes a q-dimensional vector of basis functions, e.g. h(x) = (1, x_1, ..., x_p)^T for a simple linear regression, and θ = (θ_1, θ_2, ..., θ_q)^T is the corresponding q-dimensional vector consisting of q unknown regression parameters. There are a variety of choices for the correlation function, like power exponentials, sphericals, Matérn, etc. The Matérn correlation function with roughness parameter 5/2 is chosen here following Gu et al. (2018). For any pair of inputs x_i and x_j,

c(x_i, x_j) = \prod_{l=1}^{p} \left(1 + \frac{\sqrt{5}\, d_l}{\gamma_l} + \frac{5 d_l^2}{3 \gamma_l^2}\right) \exp\left(-\frac{\sqrt{5}\, d_l}{\gamma_l}\right), (5)

where d_l = |x_{il} − x_{jl}| represents the distance between the two inputs in the l-th dimension, and γ = (γ_1, ..., γ_p)^T is a p-dimensional vector consisting of p unknown range parameters.

Eqs. (3)-(5) represent the prior belief of the simulator's behavior. The fundamental idea now is to update the prior belief following a Bayesian methodology based on evaluations of the simulator at N_sim selected inputs x_D = {x_1, ..., x_{N_sim}}. Owing to the property of the Gaussian process, the outputs corresponding to x_D, denoted as y_D = (f(x_1), ..., f(x_{N_sim}))^T, follow a multivariate normal distribution

y_D \mid \theta, \sigma^2, \gamma \sim \mathcal{MN}(H\theta, \sigma^2 R), (6)

where H = (h(x_1), ..., h(x_{N_sim}))^T is the N_sim × q basis design matrix and R is the N_sim × N_sim correlation matrix with entries R_{ij} = c(x_i, x_j). Again, owing to the property of the Gaussian process, the output y* at any new input x* follows a Gaussian distribution conditioned on y_D, given by

y^* \mid y_D, \theta, \sigma^2, \gamma \sim \mathcal{N}\big(h(x^*)^T\theta + r(x^*)^T R^{-1}(y_D - H\theta),\; \sigma^2 (c(x^*, x^*) - r(x^*)^T R^{-1} r(x^*))\big), (7a)

with r(x^*) = (c(x^*, x_1), ..., c(x^*, x_{N_sim}))^T. (7b)

The parameters θ, σ², and γ in eq. (7a) are the unknowns that need to be updated. Of these, the regression parameters θ and the variance σ² can be integrated out using a conjugate analysis and Bayes' theorem. More specifically, a weak prior for (θ, σ²) is assumed to have the form p(θ, σ²) ∝ (σ²)^{−1}, which is within the same conjugate family as the likelihood, i.e. eq. (6). Combining the weak prior and the likelihood gives the posterior p(θ, σ² | y_D, γ). Then, θ and σ² are successively integrated out from eq. (7a) by applying the Bayesian chain rule to p(θ, σ² | y_D, γ) and eq. (7a). This yields a Student's t-distribution with N_sim − q degrees of freedom, which describes the distribution of y* conditioned on y_D and γ, i.e.

y^* \mid y_D, \gamma \sim \mathcal{T}\big(\hat{m}(x^*),\; \hat{\sigma}^2 c^{**}(x^*, x^*),\; N_{sim} - q\big), (8a)

with

\hat{m}(x^*) = h(x^*)^T \hat{\theta} + r(x^*)^T R^{-1}(y_D - H\hat{\theta}), \quad \hat{\theta} = (H^T R^{-1} H)^{-1} H^T R^{-1} y_D, (8b)
c^{**}(x^*, x^*) = c(x^*, x^*) - r(x^*)^T R^{-1} r(x^*) + \big(h(x^*) - H^T R^{-1} r(x^*)\big)^T (H^T R^{-1} H)^{-1} \big(h(x^*) - H^T R^{-1} r(x^*)\big), (8c)
\hat{\sigma}^2 = \frac{(y_D - H\hat{\theta})^T R^{-1} (y_D - H\hat{\theta})}{N_{sim} - q}. (8d)
From a Bayesian viewpoint, the remaining unknown γ in eq. (8a) should also be integrated out by employing a certain prior for γ. The integral, however, is highly intractable and would require computationally intensive methods like Markov Chain Monte Carlo sampling strategies. Instead, γ is often estimated by solving an optimization problem, e.g. maximizing its marginal likelihood or finding its marginal posterior mode. In this study, we use the marginal posterior mode estimation, recommended by Gu et al. (2018) due to its robustness. Substituting the marginal posterior mode estimate of γ into eqs. (8a)-(8d) finally gives the GP emulator, denoted as f̂(x). It provides a prediction of the simulator output at any new input x* in the form of eq. (8b), as well as an assessment of the prediction uncertainty, like a 95% credible interval (CI(95%)) of the prediction.
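For concreteness, the predictive quantities in eqs. (8b)-(8d) can be sketched as follows, for fixed range parameters γ and a constant mean basis h(x) = 1 (q = 1). This is a simplified illustration with our own function names, not the RobustGaSP implementation; the small jitter term is a numerical convenience we add:

```python
import numpy as np

def matern52(x1, x2, gamma):
    """Matern 5/2 correlation (eq. 5), product over the p input dimensions."""
    d = np.abs(x1[:, None, :] - x2[None, :, :]) / gamma
    return np.prod((1.0 + np.sqrt(5.0) * d + 5.0 * d**2 / 3.0)
                   * np.exp(-np.sqrt(5.0) * d), axis=2)

def gp_predict(xD, yD, xs, gamma):
    """Predictive mean (eq. 8b) and variance estimate (eq. 8d) of the
    emulator for fixed gamma and a constant mean basis h(x) = 1."""
    N = len(xD)
    R = matern52(xD, xD, gamma) + 1e-8 * np.eye(N)   # jitter for stability
    H = np.ones((N, 1))
    Rinv_y = np.linalg.solve(R, yD)
    Rinv_H = np.linalg.solve(R, H)
    theta_hat = np.linalg.solve(H.T @ Rinv_H, H.T @ Rinv_y)    # GLS estimate, eq. (8b)
    resid = yD - H @ theta_hat
    sigma2_hat = resid @ np.linalg.solve(R, resid) / (N - 1)   # eq. (8d), q = 1
    mean = theta_hat[0] + matern52(xs, xD, gamma) @ np.linalg.solve(R, resid)  # eq. (8b)
    return mean, sigma2_hat
```

Because no noise (nugget) term is included, the predictive mean interpolates the training outputs, which is the expected behavior for a deterministic simulator.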

Gaussian process emulator for a vector output
Let f(x) denote a simulator with a p-dimensional input x = (x_1, ..., x_p)^T ∈ R^p and a k-dimensional output y = (y_1, ..., y_k)^T ∈ R^k. For example, f(x) is the landslide run-out model, x is the triplet consisting of the release volume and the two flow resistance parameters, and y is the maximum flow height or velocity over time at k locations. In a straightforward Many Single emulator approach (Gu and Berger, 2016), each component of the simulator, i.e. {y_j = f_j(x)}_{j=1,...,k}, is assumed to follow an independent Gaussian process having the form of eq. (3), with independent parameters {θ_j}_{j=1,...,k}, {σ_j^2}_{j=1,...,k}, and {γ_j}_{j=1,...,k}. For each independent emulator, the range parameters γ_j = (γ_{j1}, ..., γ_{jp})^T need to be estimated by solving an optimization problem as described in section 2.3.1. As a consequence, the training of the emulators may take a lot of time when k is large.
In this study, we instead use an alternative approach, namely the parallel partial GP emulator developed by Gu and Berger (2016), to simultaneously emulate the relation between the p-dimensional input and the k-dimensional output. Similar to the Many Single emulator approach, each element of the simulator is assumed to follow an independent Gaussian process of the form of eq. (3). The main difference is that all of the k Gaussian processes are assumed to share common range parameters γ, which are then estimated from the overall likelihood (Gu and Berger, 2016). The q-dimensional basis functions h(x) = (h_1(x), h_2(x), ..., h_q(x))^T are also assumed to be the same. These modifications greatly reduce the emulator training time. Once the estimate of the common γ is obtained, the parallel partial GP emulator is determined; it is a collection of k Student's t-distributions, denoted here as {f̂_j(x)}_{j=1,...,k}. The exact form of the emulator can be found in Gu and Berger (2016).
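The computational benefit of sharing γ can be seen from a zero-mean sketch: one factorization of the shared correlation matrix R serves all k output columns at once. The function and variable names are ours, and the actual parallel partial emulator additionally handles per-output mean and variance parameters:

```python
import numpy as np

def pp_means(R, r_new, Y):
    """Predictive means for all k outputs at once: the shared correlation
    matrix R is factorized a single time and the factors are reused for
    every output column of Y (zero-mean sketch of the parallel partial idea)."""
    L = np.linalg.cholesky(R + 1e-8 * np.eye(len(R)))    # one factorization
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, Y))  # R^{-1} Y, all k columns at once
    return r_new @ alpha                                 # (n_new, k) predictive means
```

Here Y is the N_sim × k matrix of training outputs and r_new holds the correlations between new and training inputs; with k = 47958 outputs, avoiding k separate factorizations is what makes prediction take fractions of a second.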

Emulator uncertainty in Sobol' sensitivity analysis
The efficiency improvement from using GP emulators comes at a cost: additional emulator uncertainty. We can quantify this type of uncertainty as it can be evaluated from the emulator directly. Yet, we need a way to account for it in the subsequent analysis. Alongside the development of emulation techniques and global sensitivity analysis methods, a number of approaches have been developed in recent years to address this issue in global sensitivity analyses, e.g. Oakley and O'Hagan (2004); Marrel et al. (2009); Janon et al. (2014); Le Gratiet et al. (2014).
For this study, we choose to integrate the method proposed by Le Gratiet et al. (2014), which combines the work of Oakley and O'Hagan (2004) and Janon et al. (2014). It can simultaneously take the Monte Carlo-based sampling uncertainty (section 2.2) and the emulator uncertainty into account when calculating the Sobol' indices. We adapt the method to combine the sampling scheme presented in Saltelli et al. (2010) and the GP emulators developed by Gu and Berger (2016); Gu et al. (2018). In brief, the adapted method draws N_sim training inputs, usually using a Latin hypercube design, builds the emulator, and draws realizations of the emulator to estimate the Sobol' indices {S_i, S_{T_i}}, with i = 1, ..., p. The overall uncertainty (i.e. Monte Carlo-based sampling uncertainty and emulator uncertainty) of an estimated Sobol' index is quantified using its standard deviation or CI(95%).
The adapted method for a simulator with a scalar output f(x) is shown in Algorithm 1. For a simulator with a k-dimensional output f(x), the method is essentially similar, with only minor modifications.
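The overall structure of the adapted procedure, i.e. combining emulator realizations with bootstrapping, can be sketched as below. This is a schematic illustration with our own function names; it abstracts away the fact that in Algorithm 1 the bootstrap must resample the paired Saltelli sample matrices consistently:

```python
import numpy as np

def pooled_estimates(sample_realization, estimate_index, N_r=50, N_b=50, rng=None):
    """Pool N_r emulator realizations x N_b bootstrap replications of a
    Sobol' index estimate; summarize the overall uncertainty (emulator
    plus Monte Carlo sampling) by a mean and a 95% credible interval."""
    if rng is None:
        rng = np.random.default_rng(0)
    estimates = []
    for _ in range(N_r):
        y = sample_realization(rng)        # one draw of emulator outputs
        n = len(y)
        for _ in range(N_b):
            idx = rng.integers(0, n, n)    # bootstrap resample of the MC samples
            estimates.append(estimate_index(y[idx]))
    estimates = np.asarray(estimates)
    ci = np.percentile(estimates, [2.5, 97.5], axis=0)
    return estimates.mean(axis=0), ci
```

The N_r * N_b pooled estimates capture both uncertainty sources; their mean serves as the reported index and the 2.5th/97.5th percentiles form its CI(95%).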

Implementation
The methodology presented in section 2 involves recent progress in different fields (i.e. landslide run-out modeling, global sensitivity analysis, and GP emulation), in which respective software solutions have been developed. In this section, we present our Python-based implementation, which integrates recent open-source software from those fields into a unified framework. It serves as a wrapper to realize Algorithm 1 for computationally demanding landslide run-out models. The principal components of the implementation are as follows.
• Simulator. Mergili et al. (2017) presented the open-source software r.avaflow for the simulation of a variety of mass flows, which relies on GRASS GIS 7. It employs a Voellmy-type model (section 2.1) and a multi-phase mass flow model (Pudasaini and Mergili, 2019). Here, the former is the simulator under investigation. We implemented a Python-based wrapper to automatically prepare a batch job, run simulations, and extract outputs given the selected values of the input variables x_D, without explicitly starting GRASS and r.avaflow.
• Emulator. Gu et al. (2019) presented the R package RobustGaSP (Robust Gaussian Stochastic Process Emulation), in which they implemented the marginal posterior mode estimator for the range parameters γ (see section 2.3.1) and the parallel partial GP emulator (see section 2.3.2). We implemented a Python-based wrapper based on rpy2 (the Python interface to the R language) to utilize RobustGaSP within the unified Python-based framework.
• Emulator-based Sobol' analysis. Herman and Usher (2017) presented the Python package SALib (Sensitivity Analysis Library in Python), in which the numerical procedure for calculating the Sobol' indices of a simulator is implemented. We extended their code to realize Algorithm 1, which enables emulator-based Sobol' analysis for multi-output simulators.

Case study
Case background
Pizzo Cengalo, see figure 1, located in the Swiss Alps, has been subject to rock fall and landslide events for decades due to its geological pre-conditioning factors (Walter et al., 2020). Two recent landslide events in that area are well documented and widely studied. The first event occurred on December 27th, 2011. Around 1.5 Mio m³ of rock detached from the northeastern face of Pizzo Cengalo and evolved into a rock avalanche traveling 2.7 km down the Bondasca valley. The second event occurred on August 23rd, 2017. Approximately 3 Mio m³ of rock were released from the northeastern face of Pizzo Cengalo, leading to a rock avalanche traveling 3.2 km down the Bondasca valley. A part of the rock avalanche turned into an initial debris flow, followed by a series of additional debris flows within 48 hours, which reached the village of Bondo (Walter et al., 2020).
Our case study is based on the topography and release area of the 2017 landslide event. A pre-event digital elevation model (DEM) and a post-event DEM are available, both with 1 m resolution. They are based on airborne laser scans acquired after the 2011 and 2017 events, as well as aerial images acquired by the Swiss topographic service Swisstopo (Walter et al., 2020). The release area and initial mass distribution of the event can be obtained from the height difference map of the two DEMs. As the topographic input, we use a merged DEM based on the pre-event and post-event DEMs. The merged DEM reflects the post-event topography in the release area and the pre-event topography elsewhere. In addition, we use the same release area as the 2017 landslide event, as shown in figure 1. The grid size of the computational mesh for the simulator is set to 10 m.
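Constructing such a merged DEM amounts to a masked selection between the two rasters; a minimal sketch, with the function name ours:

```python
import numpy as np

def merge_dems(pre_event, post_event, release_mask):
    """Merged DEM: post-event elevations inside the release area,
    pre-event elevations everywhere else."""
    return np.where(release_mask, post_event, pre_event)
```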
It should be noted that the intention of the case study is not to back-analyze the 2017 landslide event. Other publications are devoted to that research question (Mergili et al., 2020; Walter et al., 2020). Our focus is to apply the novel emulator-based global sensitivity analysis to the Bondo event in order to assess the model's sensitivity to the flow resistance parameters µ and ξ, as well as the release volume v_0 (see section 2.1). Based on the reference studies, we set the ranges 0.02-0.3 and 100-2200 m/s² for µ and ξ respectively. As regards the release volume v_0, we assume it varies between 1.5 Mio m³ and 4.5 Mio m³, namely ±50% of the 3 Mio m³ release volume of the 2017 landslide event. This is achieved by multiplying the distribution of initial mass of the 2017 landslide event with a value between 0.5 and 1.5. To sum up, the three uncertain inputs result in a three-dimensional input space, where µ, ξ, and v_0 vary independently within 0.02-0.3, 100-2200 m/s², and 1.5-4.5 Mio m³.
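The release-volume variation described above reduces to a simple scaling of the 2017 release height map; the constants below restate the ranges from the text, while the function name and error handling are ours:

```python
import numpy as np

# uncertain input ranges from the text: mu, xi (m/s^2), and the volume scale factor
MU_RANGE = (0.02, 0.3)
XI_RANGE = (100.0, 2200.0)
SCALE_RANGE = (0.5, 1.5)   # +/-50% of the 3 Mio m^3 reference volume

def scaled_release(h0_2017, scale):
    """Release height map for a given volume: the 2017 initial mass
    distribution multiplied by a factor between 0.5 and 1.5."""
    lo, hi = SCALE_RANGE
    if not lo <= scale <= hi:
        raise ValueError("scale factor outside the studied range")
    return scale * h0_2017
```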

Emulator design and validation
To prepare the emulator training data, N_sim = 200 samples are drawn from the three-dimensional input space using the maximin Latin hypercube design, which maximises the minimum distance between design points to achieve optimum space-filling properties (Aleksankina et al., 2019), giving the training inputs x_D = {x_i}_{i=1,...,200}. One run-out simulation takes 32 minutes on average on a laptop with an Intel Core i7-9750H CPU. For each simulation run, we extract the apparent friction angle and impact area, as well as (h^max_{l_1}, ..., h^max_{l_k})^T and (\|u\|^max_{l_1}, ..., \|u\|^max_{l_k})^T at k = 47958 chosen locations. This corresponds to the two aggregated scalar outputs and the two vector outputs in section 2.1. At each of the 47958 locations, at least one of the 200 simulation runs has a maximum flow height value larger than 0.1 m. Correspondingly, two scalar GP emulators (section 2.3.1) and two parallel partial GP emulators (section 2.3.2) are built based on x_D and the respective simulation outputs. Each parallel partial GP emulator takes about 0.05 seconds to predict maximum flow height or velocity at all 47958 locations for a new input configuration.
Before using the emulators for the sensitivity analysis, we validate their performance. The proportion of validation outputs that lie in the emulator-based 95% credible intervals is chosen as the diagnostic, denoted as P_CI(95%). This is commonly used in the literature (e.g. Lee et al., 2011; Spiller et al., 2014; Bounceur et al., 2015; Gu and Berger, 2016). It is defined as

P_{CI(95\%)} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{ f(x_i^*) \in \hat{f}(x_i^*)_{CI(95\%)} \},

where n is the number of input configurations for validation, and f(x_i^*) and f̂(x_i^*)_CI(95%) denote the simulation output and the CI(95%) of the emulator prediction at the input x_i^* respectively. P_CI(95%) would be close to 0.95 for an ideal emulator.
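The diagnostic itself is straightforward to compute; a minimal sketch, with the function name ours:

```python
import numpy as np

def p_ci95(y_sim, lower95, upper95):
    """Proportion of validation simulator outputs that fall inside the
    emulator's 95% credible intervals; close to 0.95 is ideal."""
    inside = (y_sim >= lower95) & (y_sim <= upper95)
    return inside.mean()
```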
The two scalar emulators are validated using the leave-one-out cross validation method as implemented in the RobustGaSP package (meaning n = 200), see figure 3. Both emulators perform well, with emulator predictions close to the simulator outputs and P_CI(95%) close to 0.95. As no cross validation scheme is implemented in the RobustGaSP package for a parallel partial GP emulator, we validate the two parallel partial GP emulators for (h^max_{l_1}, ..., h^max_{l_k})^T and (\|u\|^max_{l_1}, ..., \|u\|^max_{l_k})^T with k = 47958 using 20 additional validation runs based on an independent maximin Latin hypercube design, see figure 2. Figure 4(a) shows the P_CI(95%) values at each location and their distribution in the form of a box plot for the maximum flow height emulator; figure 4(b) shows the same evaluation for the maximum flow velocity emulator. In each panel, the colormap shows the P_CI(95%) value at each location, and the box plot (whiskers at the 2.5th and 97.5th percentiles, dashed line at the mean) summarizes their distribution. The lowest P_CI(95%) value of the maximum flow height/velocity emulator is 0.6/0.65, and 95% of the P_CI(95%) values of both emulators lie within 0.8-1. Both emulators show good performance, with mean P_CI(95%) values over all locations of 0.93 and 0.94 respectively.

Preliminary convergence analysis
The base sample size N, realization sample size N_r, and bootstrap sample size N_b need to be determined before using the validated emulators for the Sobol' sensitivity analysis (see Algorithm 1). Here, we present the results of a convergence analysis based on the validated emulator for the apparent friction angle in order to determine values for these sample sizes. Figure 5 shows how the estimated Sobol' indices and their CI(95%) values change with N increasing from 200 to 10000 with a step size of 200, while keeping N_r = N_b = 50. It can be seen that the estimated Sobol' indices tend to converge when N is larger than 4000, and their CI(95%) lengths hardly decrease for N ≥ 6000. We conducted the same analysis with N_r = N_b = 100 and N_r = N_b = 200. The results are similar to our findings with N_r = N_b = 50, indicating little impact of N_r and N_b. Therefore, we set N = 6000 and N_r = N_b = 50 for the following sensitivity study. This leads to N · (p + 2) = 6000 · (3 + 2) = 30000 samples {x_i^*}_{i=1,...,30000} from the three-dimensional input space to estimate the Sobol' indices. Among them, 2 · N = 12000 samples are used to estimate the overall variance term V(y) in eqs. (2a)-(2b), see section 2.2.

Figure 5: First-order (first row) and total-effect Sobol' indices (second row) based on the GP emulator for the apparent friction angle, with N_r = N_b = 50 and N increasing from 200 to 10000 with a step size of 200. In each panel, the dashed line and solid lines show the change of the estimated Sobol' index and its 95% credible interval respectively (see step 14 in Algorithm 1); the estimated Sobol' index tends to converge for N ≥ 4000 and the length of its 95% credible interval hardly decreases for N ≥ 6000.

Apparent friction angle and impact area
The box plot in figure 6(a) shows the distribution of emulator-predicted apparent friction angle values corresponding to the 12000 samples used to estimate the variance of the apparent friction angle (see section 4.4). Due to input uncertainties, the apparent friction angle can vary over a wide range, around 11.8°-25.7°. The mean is 17.9°. The standard deviation is 3.1°, which corresponds to the square root of V(y) in eqs. (2a)-(2b). The bar plots in figure 6(a) display the estimated first-order and total-effect Sobol' indices, with CI(95%) denoting the Monte Carlo-based sampling uncertainty and emulator uncertainty. Each pair of bar plots corresponds to the first-order and total-effect Sobol' indices of one input variable. It is evident that the apparent friction angle is dominated by the dry-Coulomb friction coefficient µ, whose first-order index is over 0.9, whereas both the turbulent friction coefficient ξ and the release volume v0 show little influence on the apparent friction angle, with both first-order indices being smaller than 0.05. This result is expected, since µ governs the slope angle on which the flow mass begins to deposit (McDougall, 2017), and it is consistent with the results based on one-at-a-time sensitivity analysis methods (e.g., Schraml et al., 2015; Frey et al., 2016). Furthermore, it is noteworthy that the difference between the first-order and total-effect indices is small, indicating weak interactions among the three input variables regarding the apparent friction angle.
Similarly, the box plot in figure 6(b) shows the distribution of emulator-predicted impact area values. Owing to input uncertainties, the impact area can vary between 1.5 and 4.5 Mio m², with a standard deviation of 0.6 Mio m². From the bar plots, it can be seen that the estimated first-order indices of µ, ξ, and v0 are around 0.67, 0.15, and 0.18 respectively. This indicates that µ contributes the most to the variance of the impact area, followed by v0 and ξ. Similar to the results on the apparent friction angle, the small difference between the first-order and total-effect indices implies that the three input variables barely interact with each other concerning the impact area. Compared to the results for the apparent friction angle, the importance of µ for the impact area decreases while that of ξ and v0 increases. A plausible explanation is that the apparent friction angle only depends on the deposit (assuming that the release area remains the same), where µ plays the dominant role, whereas the impact area depends on the entire inundated region, where all three input variables may have an impact.

Maximum flow height and velocity
Before discussing the global sensitivity analysis results on maximum flow height and velocity, we summarize the statistics that are needed to interpret the results. Figures 7(a)-(f) show the mean, standard deviation, and coefficient of variation of maximum flow height and of maximum flow velocity at each location, together with six reference locations A-F marked in figure 1. Location A sits near the release area, where the slope is steep. From location B to location D is the Bondasca valley. Location C corresponds to the mean of the 12000 apparent friction angle values (17.9°), denoting the average run-out distance. From location D to location E is the debris flow retention basin (Walter et al., 2020). Location F is near the west boundary of the DEM.
It can be seen from figures 7(a) and (d) that, in general, the mean of maximum flow height gradually decreases along the flow path, whereas the mean of maximum flow velocity first increases and then decreases, reflecting the acceleration and deceleration process. Along the cross-section direction of the path, both the mean of maximum flow height and that of maximum flow velocity generally decrease from the center to the sides. In addition, the mean values in the upstream area of location B are on average much larger than those in the downstream area. The CI(95%) values of the estimated Sobol' indices are small (comparable to those in figure 6); the CI(95%) is therefore omitted here to avoid redundancy. Moreover, values smaller than 0.1 are not shown in the colormaps to highlight the trends that we will shortly discuss.
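The per-location summary statistics shown in figure 7 (mean, standard deviation, coefficient of variation) reduce to column-wise moments of the matrix of emulator predictions. A short sketch, with synthetic lognormal samples standing in for the emulator-predicted maximum flow heights (the array shapes and distribution are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical stand-in: emulator-predicted maximum flow height for 12000
# input samples at k map locations (rows: samples, columns: locations).
k = 1000
H = rng.lognormal(mean=0.0, sigma=0.5, size=(12000, k))

mean_map = H.mean(axis=0)                 # per-location mean (figure 7(a)-style)
std_map = H.std(axis=0)                   # per-location standard deviation (7(b))
cv_map = np.divide(std_map, mean_map,     # coefficient of variation (7(c)),
                   out=np.zeros(k),       # guarded against zero-mean cells
                   where=mean_map > 0)
```

The coefficient of variation highlights cells where the relative spread is large even though the absolute flow height is small, which is exactly the regime discussed below for the flow-path margins.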
Figures 8(a)-(c) show the first-order contributions of µ, ξ, and v0 to the variation of maximum flow height at each location. The mean values of Ŝ_µ, Ŝ_ξ, and Ŝ_v0 over the 47958 locations are 0.3, 0.17, and 0.27 respectively. A closer look shows that the dry-Coulomb friction coefficient µ dominates in the downstream area beyond location B, whereas its impact in the upstream area of location B is limited; the turbulent friction coefficient ξ is an influential factor in the upstream area of location B, especially in areas around the major flow path, whereas it has negligible impact in the downstream area of location B; the release volume v0 contributes the most in areas surrounding the release zone and has significant impact in areas near the minor flow path as well as areas surrounding location B, whereas it shows little influence in the downstream area, similar to ξ.
Figures 8(d)-(f) present the first-order contributions of µ, ξ, and v0 to the variation of maximum flow velocity at each location. The mean values of Ŝ_µ, Ŝ_ξ, and Ŝ_v0 over all the locations are 0.34, 0.31, and 0.11 respectively. A closer inspection shows that the variation of maximum flow velocity in the downstream area beyond location B is predominantly driven by µ, while its impact in the upstream area is mild; ξ contributes the most to the variation of maximum flow velocity in the upstream area of location B, where the mean values of maximum flow velocity are large (comparing figure 8(e) with figure 7(d)); v0 only has mild impact in areas near the release zone and near the minor flow path.
Comparing figures 8(a)-(c) with figures 8(d)-(f), we find that the first-order contribution of µ to the variation of maximum flow height only slightly differs from its contribution to the variation of maximum flow velocity, with the mean over all locations increasing from 0.3 to 0.34; ξ has more impact on maximum flow velocity than on maximum flow height, with a difference of 0.14 on average; the influence of v0 on maximum flow height is more important than its influence on maximum flow velocity, with a difference of 0.16 on average. The dominant role of µ in the downstream area agrees with the finding in section 5.1 that µ predominantly affects the apparent friction angle. The importance of ξ in the upstream area with large mean values of maximum flow velocity is in accord with expectation, since the turbulent friction term in eq. (1) is proportional to the square of the flow velocity.
Figures 9(a)-(c) show the difference between total-effect and first-order Sobol' indices for maximum flow height at each location, which indicates the interactions between different input variables. Taking Ŝ^T_µ − Ŝ_µ as an example, it accounts for all high-order effects related to µ, including the second-order interaction between µ and ξ, the second-order interaction between µ and v0, as well as the third-order interaction among µ, ξ, and v0. The mean values of Ŝ^T_µ − Ŝ_µ, Ŝ^T_ξ − Ŝ_ξ, and Ŝ^T_v0 − Ŝ_v0 over all locations are 0.22, 0.21, and 0.16 respectively. The areas showing significant difference coincide with the areas with low mean values, low standard deviation values, and high coefficient of variation values (see figures 7(a)-(c)), except the area around the major flow path between location A and location B. The difference between Ŝ^T_v0 and Ŝ_v0 in this area is negligible, meaning that all high-order effects related to v0 in this area are negligible. The difference in this area shown in figures 9(a) and (b) is therefore mainly due to the interaction between µ and ξ. From the scatter plots of the respective difference versus the standard deviation, it is evident that the interactions generally decrease with increasing standard deviation.

Figure 9: Difference between total-effect and first-order Sobol' indices for (h^max_l1, …, h^max_lk)^T (left column) and for (u^max_l1, …, u^max_lk)^T (right column). In each panel, values smaller than 0.1 are not shown in the colormap; the scatter plot shows the difference versus the standard deviation shown in figures 7(b) and (e), where difference values larger than 0.1 are plotted using the same colorbar as that used for the colormap, and difference values smaller than 0.1 are plotted in black; the mean over all locations is noted in the scatter plot.
Figures 9(d)-(f) show the difference between total-effect and first-order Sobol' indices for maximum flow velocity at each location. The mean values of Ŝ^T_µ − Ŝ_µ, Ŝ^T_ξ − Ŝ_ξ, and Ŝ^T_v0 − Ŝ_v0 over all locations are 0.21, 0.2, and 0.15 respectively. Similar to the results on maximum flow height, the areas showing significant difference closely resemble the areas with low mean values, low standard deviation values, and high coefficient of variation values of maximum flow velocity, see figures 7(d)-(f). Again, the area around the major flow path between location A and location B is an exception. It can be clearly seen from the scatter plots of the respective difference versus the standard deviation that the interactions generally decrease with increasing standard deviation.
Comparing figures 9(a)-(c) with figures 9(d)-(f), we find that for both maximum flow height and maximum flow velocity, most of the significant interactions occur on the margins of the flow paths where mean values and standard deviation values are relatively small, whereas values of coefficient of variation are relatively large (see figure 7); the interactions generally decrease with increasing standard deviation; there are stronger interactions between the two friction coefficients µ and ξ than between the release volume v 0 and each friction coefficient.
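The interaction maps of figure 9 amount to element-wise differences between total-effect and first-order indices, with values below 0.1 masked out. A small sketch of this post-processing step, using illustrative index values (the array layout and all numbers here are hypothetical, not results from the study):

```python
import numpy as np

def interaction_map(S1, ST, threshold=0.1):
    """Per-location interaction strength ST - S1 for each input.

    S1, ST: arrays of shape (p, k) holding first-order and total-effect
    Sobol' indices for p inputs at k map locations (hypothetical layout)."""
    diff = np.clip(ST - S1, 0.0, None)     # estimators may dip slightly below 0
    masked = np.where(diff >= threshold, diff, np.nan)  # hide values < 0.1
    return diff.mean(axis=1), masked       # mean over locations + masked map

# Illustrative values for p = 3 inputs (mu, xi, v0) at k = 4 locations.
S1 = np.array([[0.80, 0.30, 0.10, 0.05],
               [0.10, 0.20, 0.30, 0.05],
               [0.05, 0.25, 0.10, 0.02]])
ST = np.array([[0.85, 0.52, 0.40, 0.45],
               [0.12, 0.41, 0.55, 0.40],
               [0.06, 0.30, 0.20, 0.30]])
means, masked = interaction_map(S1, ST)
```

The masking mirrors the figure-9 convention that differences below 0.1 are not displayed, so the colormaps only show locations with appreciable higher-order effects.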

Conclusions
In this study, we have presented a computationally efficient approach which enables variance-based global sensitivity analyses of computationally demanding landslide run-out models. The methodology couples the novel open-source mass flow simulation tool r.avaflow (Mergili et al., 2017), robust Gaussian process emulation for multi-output models (Gu and Berger, 2016; Gu et al., 2018, 2019), and a recent algorithm addressing the emulator uncertainty (Le Gratiet et al., 2014). Based on the 2017 Bondo landslide event, we have employed the approach to study the global sensitivity of selected run-out model outputs to three input variables, namely the release volume and the two friction coefficients. Our main findings are as follows.
• The proposed approach can be successfully used to study the relative importance and interactions of input variables in landslide run-out models, provided that the trained Gaussian process emulators are validated and the base sample size of the Sobol' analysis is properly chosen.

• The first-order effects of each input variable are broadly in line with results of common one-at-a-time sensitivity analyses in the literature. The dry-Coulomb friction coefficient dominates the apparent friction angle, as well as maximum flow height and velocity in the downstream area. The turbulent friction coefficient contributes the most to the variation of maximum flow velocity in the area where maximum flow velocity values are expected to be large. The release volume is found to have significant impact on maximum flow height in the area surrounding the release zone, whereas it shows little impact on maximum flow velocity.

• Interactions between the input variables could be analyzed for the full flow path, which cannot be assessed by commonly used one-at-a-time approaches. Significant interactions between the input variables generally occur on the margins of the flow path, where the mean and standard deviation values of maximum flow height and velocity are small. The interactions generally decrease with increasing variation of maximum flow height and velocity. Furthermore, there are stronger interactions between the two friction coefficients than between the release volume and each friction coefficient.
The proposed methodology can be easily extended for variance-based global sensitivity analysis on landslide run-out models employing other basal rheologies, or potentially on any computationally demanding models, when the assumption of Gaussian process emulation is fulfilled as stated in section 2.3.
In addition, other computationally expensive tasks can also benefit from the significant speed-up owing to emulation techniques. While a run-out simulation takes 32 minutes on average to determine maximum flow height at the 47958 locations for a given parameter setting, evaluating the emulator takes only 0.05 seconds. Hence, whenever an application requires a large number of model evaluations, such as uncertainty quantification and model calibration of landslide run-out models, the computational cost of training the emulator is quickly amortized. In our study, this up-front cost is determined by the 200 training simulation runs, around 107 hours in total. Emulation techniques likewise have great potential whenever a split between off-line computation (e.g. emulator training) and on-line computation (e.g. urgent computing for early warning systems) is feasible.
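The amortization argument can be made concrete with the run times reported above (32 minutes per simulation, 0.05 seconds per emulator evaluation, 200 training runs); the break-even arithmetic is a one-liner:

```python
# Break-even sketch using the timings reported in the text.
SIM_MINUTES = 32.0      # one r.avaflow run (maximum flow height at 47958 cells)
EMU_SECONDS = 0.05      # one emulator evaluation
TRAINING_RUNS = 200     # simulations used to train the emulators

def total_hours(n_eval, use_emulator):
    # Total wall-clock hours for n_eval model evaluations.
    if use_emulator:
        training = TRAINING_RUNS * SIM_MINUTES / 60   # ~107 h offline cost
        return training + n_eval * EMU_SECONDS / 3600
    return n_eval * SIM_MINUTES / 60

# The 30000 evaluations of the Sobol' analysis: 16000 h with direct simulation,
# but only slightly above the ~107 h training cost with the emulator.
print(total_hours(30000, False), total_hours(30000, True))
```

Any application needing more than 200 model evaluations therefore already runs cheaper through the emulator than through direct simulation.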