Sequential Bayesian optimal experimental design for structural reliability analysis

Structural reliability analysis is concerned with estimating the probability of a critical event taking place, described by P(g(X) ≤ 0) for some n-dimensional random variable X and some real-valued function g. In many applications the function g is practically unknown, as function evaluation involves time-consuming numerical simulation or some other form of experiment that is expensive to perform. The problem we address in this paper is how to optimally design experiments, in a Bayesian decision-theoretic fashion, when the goal is to estimate the probability P(g(X) ≤ 0) using a minimal amount of resources. As opposed to existing methods that have been proposed for this purpose, we consider a general structural reliability model given in hierarchical form.
We therefore introduce a general formulation of the experimental design problem, where we distinguish between the uncertainty related to the random variable X and any additional epistemic uncertainty that we want to reduce through experimentation. The effectiveness of a design strategy is evaluated through a measure of residual uncertainty, and efficient approximation of this quantity is crucial if we want to apply algorithms that search for an optimal strategy. The method we propose is based on importance sampling combined with the unscented transform for epistemic uncertainty propagation. We implement this for the myopic (one-step look ahead) alternative, and demonstrate its effectiveness through a series of numerical experiments.


Introduction
In order to ensure sufficient reliability of engineered systems, such as buildings, ships, offshore structures, aircraft or technological products, uncertainties with respect to the system's capabilities and the system's environment must be accounted for. In probabilistic structural reliability analysis, this is achieved through a probabilistic model of the system and its environment. A primary objective with such a model is to estimate the probability that the system will fail (e.g. collapse, sink, crash or explode). A probabilistic structural reliability model is commonly defined through a performance function (also called a limit state function) g(X) depending on some random variable X. Here, g(X) ≤ 0 corresponds to system failure, and g(X) > 0 corresponds to the system functioning. Typically, X contains the parameters describing a particular structure, such as the geometry, dimensions and material properties. These quantities may be random, but can be influenced by the designer of the structure. For example, the designer may choose to use a more expensive, but more durable material in order to improve the structural properties of the system. In addition, X contains the (random) parameters that characterize the system's environment, such as wind speed, wave height etc., and parameters describing how well the model fits reality (model uncertainties). Given X and the function g(·), the probability of failure is defined as the probability P(g(X) ≤ 0). Modern engineering requirements for safe design and operation of such systems are usually given as an upper bound on this probability (Madsen et al., 2006).
Hence, for many practical applications, the failure probability computation is an important task. This is often challenging for complex systems, as a computationally feasible stochastic model of the complete system and its environment is not available. In our modelling framework, this comes in the form of additional epistemic uncertainties, i.e. uncertainties due to limited data or knowledge that in principle can be reduced by gathering more information. These epistemic uncertainties usually come in one of two forms:
1. The function g(·) or the distribution of X depends on parameters whose values we do not know.
2. Evaluating g(x) at some single realization x of X is expensive in terms of money and/or time.
The second form comes from the complex physical nature of failure mechanisms, where experiments are needed to evaluate the function g(x). This includes numerical computer simulations and physical experiments in a laboratory, which are both time consuming and expensive. Hence, due to the limited number of experiments that can be performed in practice, any method for estimating P(g(X) ≤ 0) that relies on a large number of evaluations of g(·) is practically infeasible. This problem is usually solved by replacing the performance function g(·) with a computationally cheap surrogate model or emulator, constructed from a small set of experiments.
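As a toy illustration of the emulator idea (our own stand-in function and numbers, not from the paper): a Gaussian process fitted to a handful of "expensive" runs interpolates the observed evaluations and reports predictive uncertainty elsewhere.

```python
import numpy as np

# Toy emulator sketch: fit a GP to a few evaluations of a cheap stand-in for g.
def g(x):                                      # stand-in for an expensive model
    return 4.0 - x**2

def rbf(a, b, ell=1.0):
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ell) ** 2)

x_train = np.array([-3.0, -1.0, 0.5, 2.0, 3.0])   # small design of experiments
y_train = g(x_train)

x_test = np.linspace(-3, 3, 61)
K = rbf(x_train, x_train) + 1e-8 * np.eye(len(x_train))   # jitter for stability
Ks = rbf(x_test, x_train)

mean = Ks @ np.linalg.solve(K, y_train)                   # posterior mean
cov = rbf(x_test, x_test) - Ks @ np.linalg.solve(K, Ks.T)
std = np.sqrt(np.clip(np.diag(cov), 0.0, None))           # posterior std. dev.

print(mean[0] - y_train[0], std[0])   # ~0 at a training point: the GP interpolates
```

In practice the design points would themselves be chosen by the experimental design strategy; here they are fixed by hand.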
When the surrogate model is a stochastic process (viewed as a distribution over functions), we can quantify the added epistemic uncertainty that comes from this simplification. We will assume that epistemic uncertainty is introduced to a structural reliability model, and that there is a way to reduce this uncertainty by performing experiments. The problem we address in this paper is how to optimally estimate P(g(X) ≤ 0) using as few resources as possible. In particular, we want to find an optimal strategy for the scenario where we can perform experiments sequentially, i.e. where each experiment may depend on the preceding ones. The scenario where g(·) is replaced by a surrogate model created from a finite set of observations {g(x_i)}_{i=1}^{n} has already been studied extensively (Bect et al., 2012; Echard et al., 2011; Bichon et al., 2008; Sun et al., 2017; Jian et al., 2017; Perrin, 2016; Schueremans and Gemert, 2005). The most common approach is to approximate g(·) using a Gaussian process, and make use of the convenient fact that a surrogate model given by the posterior predictive distribution of the Gaussian process has a closed form solution. However, structural reliability models are often hierarchical, and the reason why g(·) is expensive comes from one or more expensive sub-components. An example is shown in Figure 1, where g(x) = g(y_1(x), y_2(x)). Assume here that x ∈ R^m; then the index set of the Gaussian process approximation of g(x) is m-dimensional. Naturally, the number of experiments needed is highly dependent on m. If g(x) is expensive, then this must be because one (or more) of the functions y_1(x), y_2(x) or g(y_1, y_2) is expensive. Very often, the effective domains of these functions have dimensionality much smaller than m, so fitting a Gaussian process to observations of g(x) is not very efficient. There is also some practical inconvenience here, which is that some of the expensive sub-components (for instance load models) may be applicable in
different structural reliability models, so there is a potential for re-use if we create a surrogate model for, say, y_1(x) instead of g(x). In this paper we will work with hierarchical models as the one illustrated in Figure 1, where we assume that some of the intermediate variables are stochastic processes with epistemic (potentially reducible) uncertainty. Note that this also covers the case where we just introduce additional epistemic variables into the model. Actually, in the approximate numerical solution we propose in this paper, these two problems become equivalent. Moreover, as Gaussianity generally is lost in the hierarchical setting, we will only make assumptions on the existence of second order moments of the stochastic processes used as surrogates. We will present a general formulation of the problem of finding an optimal strategy for performing experiments based on Bellman's principle of optimality, and discuss some alternative routes for solving such problems. For the myopic (one step look-ahead) strategy, we propose an efficient numerical procedure, based on finite dimensional approximation of the stochastic processes and uncertainty propagation using the unscented transform.
The structure of the remaining part of the paper is as follows: Through Section 2 and Section 3 we develop the Bayesian optimal experimental design problem for a general structural reliability model. We introduce a framework for separation of aleatory and epistemic uncertainties using conditional expectations, from which we can express any type of experiment associated with a structural reliability problem. For the purpose of estimating a failure probability, we consider three alternative optimization objectives, and in Section 3 we discuss how the experimental design problem may be tackled using dynamic programming and the myopic approximation. Optimization problems of this form will involve evaluation of a measure of residual uncertainty, and in Section 4 we present an approach for approximating this quantity. We implement this in Section 5 to develop an efficient numerical procedure for the myopic scenario, which we illustrate through a series of examples in Section 6. Finally, our concluding remarks are given in Section 7, and some supporting material used throughout the paper is included in the Appendices.

Problem formulation
Given a probabilistic surrogate of a structural reliability model, we are interested in how to optimally improve the model for failure probability estimation, given a fixed experimental budget. More generally, given a structural reliability model with epistemic uncertainty (e.g. as introduced when using a surrogate), and a set of possible experiments that can be performed, we want to select the experiments in an optimal manner. The choice of experiment is called a decision, d ∈ D, where D is a space of feasible decisions. Note that this set may include different kinds of decisions, such as performing computer experiments, lab experiments or physical measurements in the field.
In the following subsections we present a rigorous formulation of the Bayesian optimal experimental design problem for structural reliability analysis. Here we will need a way to express uncertainty about the performance function used in structural reliability models, and a way to model uncertainty about future outcomes of potential experiments. For this purpose we will define a model (ξ, δ), where ξ is a stochastic representation of the performance function g(x) evaluated at some fixed input x, and δ(d) is a predictive model of experimental outcomes given a decision d. In other words, δ models the data generating process of potential experiments.
We will consistently write X for a random variable with values in R^m, and let x be a deterministic realization. ξ and δ are stochastic processes, indexed over inputs x and decisions d respectively. In structural reliability analysis, we are interested in the random variable g(X), and likewise we will consider ξ(X), but now where ξ(x) is also random for any fixed x. As the purpose of performing experiments will be to provide information about ξ, note that ξ and δ are generally not independent.
A detailed description of how (ξ, δ) is constructed is provided in the following subsections.

Structural reliability analysis
Let 𝒳 ⊆ R^m, and let X be a random variable on the probability space (Ω, F, P) with values in 𝒳, and g : 𝒳 → R a measurable function. We call g the performance function or limit state, with the associated failure set F = {x ∈ 𝒳 : g(x) ≤ 0}. In structural reliability analysis, we are interested in estimating the failure probability, which we here denote ᾱ. It is defined as

ᾱ = P(g(X) ≤ 0) = E[1(g(X) ≤ 0)],     (1)

where E[·] denotes the expectation with respect to P and 1(·) is the indicator function.
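Definition (1) can be checked by crude Monte Carlo on a limit state with a known answer. The linear example below, g(x) = β − x with X ~ N(0, 1), is our own choice; it gives ᾱ = Φ(−β) in closed form.

```python
import numpy as np
from math import erf, sqrt

# Crude Monte Carlo estimate of alpha_bar = E[1(g(X) <= 0)] on a toy limit state.
rng = np.random.default_rng(0)

beta = 2.0                                    # g(x) = beta - x, failure iff X >= beta
x = rng.standard_normal(1_000_000)            # samples of X ~ N(0, 1)
alpha_hat = float(np.mean(beta - x <= 0.0))   # average of the indicator 1(g(X) <= 0)

alpha_exact = 0.5 * (1.0 - erf(beta / sqrt(2.0)))   # 1 - Phi(beta) ~ 0.0228
print(alpha_hat, alpha_exact)
```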
In most real-world cases it is difficult to derive an analytical expression for the failure probability. To overcome this, several approximation and simulation methods have been suggested, see e.g. Madsen et al. (2006) or Huang et al. (2017). Two traditional methods are the first- and second-order reliability methods (FORM/SORM), where the failure boundary is approximated at a specific point using a Taylor expansion of first and second order respectively. Different sampling procedures have also been developed, which often make use of intermediate results obtained from FORM/SORM. Other relevant techniques involve the construction of environmental contours and the estimation of buffered failure probabilities as in Dahl and Huseby (2019). In this paper, our focus differs from these methods in the sense that we are mainly interested in how to estimate the failure probability as well as possible, given a limited experimental budget. To do so, we need to separate between different kinds of uncertainty in our model.

Separating epistemic and aleatory uncertainties
Ideally, the uncertainty related to the random variable g(X) in (1) is aleatory, in the sense that it relates to inherent variability of the physical phenomenon that is being modelled, but in reality we must also include epistemic uncertainty due to lack of information or knowledge. For instance, assume that g(x, e) depends on the aleatory variable x and some fixed but unknown parameter e. Assume further that X is the aleatory random variable representing variability in x, E is the epistemic random variable representing our belief about e, and that X and E are independent with laws P_x and P_e. It is then relevant to view the failure probability as a random quantity with epistemic uncertainty, α(E) = ∫ 1(g(x, E) ≤ 0) P_x(dx). For engineering applications, one would then typically be interested in some specified upper percentile values of α(E), i.e. ensuring that the epistemic uncertainty is under control.
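A sketch with made-up distributions: take g(x, e) = e − x, X ~ N(0, 1) aleatory and E ~ N(2, 0.2²) epistemic. Each epistemic draw fixes one value of α(E) = ∫ 1(g(x, E) ≤ 0) P_x(dx); here the inner integral happens to be available in closed form, so only the outer (epistemic) loop is sampled, and an upper percentile of α(E) can be read off.

```python
import numpy as np
from math import erf, sqrt

# Distribution of the epistemically uncertain failure probability alpha(E).
rng = np.random.default_rng(1)

def alpha(e):                                  # P(X >= e) for X ~ N(0, 1), exact
    return 0.5 * (1.0 - erf(e / sqrt(2.0)))

e_samples = rng.normal(2.0, 0.2, size=50_000)  # draws from the belief about E
alpha_samples = np.array([alpha(e) for e in e_samples])

mean_alpha = float(alpha_samples.mean())       # the unconditional failure probability
p95 = float(np.quantile(alpha_samples, 0.95))  # conservative epistemic percentile
print(mean_alpha, p95)
```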
In the following, we will assume that we have a performance function ξ (•) that depends on a strictly aleatory random variable X, and some other random quantity with epistemic uncertainty.We will need to formulate this with a bit of generality, in order to cover the different ways epistemic uncertainty can be introduced in a structural reliability model.
As in Section 2.1 we will work with (Ω, F, P) as the global probability space, capturing all forms of uncertainty. We then let A and E be two sub-σ-algebras representing respectively aleatory and epistemic information, and we assume that X is A-measurable. Furthermore, for any x ∈ 𝒳 we assume that ξ(x) is E-measurable. That is, ξ : 𝒳 × Ω → R is a stochastic process indexed by x ∈ 𝒳 (this is also called a random field), and ξ(x) is a real-valued random variable. We will write ξ(·) instead of g(·) whenever epistemic uncertainty has been introduced, as for instance in the canonical case where a deterministic performance function g(·) is approximated with a probabilistic surrogate ξ(·).
We can now define the failure probability with epistemic uncertainty as the E-measurable random variable

α = α(ξ) = E[1(ξ(X) ≤ 0) | E].     (2)

Note that (2) coincides with (1) in the case where the performance function is not affected by epistemic uncertainty, and in general ᾱ(ξ) = E[α(ξ)] = E[1(ξ(X) ≤ 0)], where the second equality uses the double expectation property.
In the following we will just write α or ᾱ without the dependency on ξ when there is no risk of confusion.
Example 1 Assume ξ is a deterministic function of the aleatory random variable X and the epistemic random variable E, both defined on (Ω, F, P). Then A = σ(X) and E = σ(E), i.e. the σ-algebras generated by the random variables X and E respectively.
Note that the converse of Example 1 also holds true, as we can always view ξ as a deterministic function applied to two random variables X and E. That is, ξ(x, ω) = ξ(x, E(ω)), where ξ(x, e) is a deterministic function for x and e fixed, and we can write the stochastic process ξ(x, ω) as ξ(x, E). It is sometimes useful to think of ξ in this way. In particular, the numerical approximation we propose later in this paper is based on obtaining a finite dimensional approximation of E.
Example 2 Let g be given as in the hierarchical model in Figure 1, and X a random variable defined on some measure space (Ω_x, F_x, P_x). Assume that y_1 and y_2 are expensive to evaluate, so we replace them with surrogate models in the form of two stochastic processes ỹ_1 and ỹ_2 defined on another measure space (Ω_y, F_y, P_y). Note that we assume that both ỹ_1 and ỹ_2 are defined on the same measure space. Then, the measure space for the experimental design problem is given by (Ω, F, P) = (Ω_x × Ω_y, F_x ⊗ F_y, P_x × P_y), with A = F_x and E = F_y (up to isomorphism), and we would write ξ(x) = g(ỹ_1(x), ỹ_2(x)).
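A minimal sketch of this structure with made-up sub-models: each epistemic draw fixes one realization of the surrogates, and ξ(x) = g(y_1(x), y_2(x)) then yields one realization of the failure probability. The Gaussian coefficients below are a crude stand-in for stochastic-process surrogates.

```python
import numpy as np

# Hierarchical toy model: epistemic draws of (y1, y2), aleatory samples of X.
rng = np.random.default_rng(2)

def g(y1, y2):                       # cheap top-level combination
    return 5.0 - y1 - y2

def sample_surrogates():
    a = rng.normal(1.0, 0.1)         # epistemically uncertain slope of y1
    b = rng.normal(2.0, 0.2)         # epistemically uncertain offset of y2
    return (lambda x: a * x), (lambda x: b + 0.5 * x)

x = rng.standard_normal(50_000)      # aleatory samples of X
alphas = []
for _ in range(200):                 # 200 epistemic realizations of xi
    y1, y2 = sample_surrogates()
    alphas.append(float(np.mean(g(y1(x), y2(x)) <= 0.0)))

print(np.mean(alphas), np.std(alphas))   # the spread is epistemic uncertainty
```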

Decisions, outcomes and experiments
We are interested in the case where the epistemic uncertainty in α can be reduced by running experiments. For instance, in Example 1 the epistemic variable E could be a fixed but unknown parameter, and additional measurements could perhaps be performed to reduce the uncertainty in E. Or, in Example 2, additional experiments could be performed to infer the values of y_1 or y_2 at some given input x, in order to reduce uncertainty in the surrogate models.
These are examples of possible decisions we could make to reduce epistemic uncertainty. We will let D denote the set of all possible decisions, and O the set of all possible outcomes. For any decision d ∈ D, the corresponding outcome is uncertain a priori, and in order to evaluate the potential impact of a decision we will need to specify (possibly subjectively) a distribution representing the possible outcomes. We will let δ(d) denote the random outcome of a decision d ∈ D with values in O. For any realization o ∈ O of δ(d), we will refer to the pair (d, o) as an experiment.
In our modelling framework, we will assume that ξ(x) as defined in Section 2.2 is provided together with (Ω, F, P) and the sub-σ-algebras A and E, and that a decision process δ(d) is given, where δ(d) is E-measurable for any d ∈ D. Table 1 gives an overview of the notation we have introduced so far in order to define the problem of optimal experimental design for structural reliability analysis.
Example 3 Continuing from Example 2, assume that noise perturbed observations of y_1 can be made. Let d(x) = {observe y_1(x)}, and define D as the union of such events for all x. If we assume that observations come with additive noise, o(x) = y_1(x) + ε(x) for some specified noise process ε, then we can let δ(d(x)) = y_1(x) + ε(x). In a similar fashion, D and δ(d) could be extended to include observations of y_2 as well.
We note that the noise-free alternative to Example 3, i.e. the case where ε ≡ 0, is a common scenario when dealing with deterministic computer simulations. Another related scenario that is also of relevance here is that of multifidelity modelling (Fernandez et al., 2017), in which case inaccurate but cheaper estimates of y_1(x) could be available as well.

Sequential model updating
Now, having defined a random variable X and the two processes {ξ(x)}_{x∈𝒳} and {δ(d)}_{d∈D}, we want to perform a sequence of experiments, (d_0, o_0), (d_1, o_1), …, and update ξ and δ accordingly. We let I_k := {(d_0, o_0), …, (d_{k−1}, o_{k−1})} denote the information or history up to the k-th experiment, and define E_k as the σ-algebra generated by E and I_k. Hence, E_k is all the information regarding epistemic quantities that is available after k experiments. We introduce the notation P_k(·) and E_k[·] for the probability and expectation given the updated information E_k. For convenience we define I_0 = ∅, so that we can use the index k = 0 with these definitions for the scenario before any experiment has been made. We will write ξ_k and δ_k for the updated processes ξ | I_k and δ | I_k corresponding to P_k; per definition, ξ_0 = ξ and δ_0 = δ. In the following example, we show how this sequential update can be done via Bayes' theorem.
Example 5 For a specific problem there will typically be simpler ways of updating the model than the generic formulation given in the previous example. Continuing again from Example 2 and Example 3, assume δ(d) = δ(x, y_1, y_2) corresponds to observing y_1(x) + ε_1(x) or y_2(x) + ε_2(x).
Then y_1 and y_2 can be updated directly, and we let ξ_k(x) = g((y_1 | I_k)(x), (y_2 | I_k)(x)). In fact, if y_1 and y_2 and the noise terms ε_1 and ε_2 are all Gaussian processes, then y_1 | I_k and y_2 | I_k are also Gaussian, and closed form representations are available (see Appendix A). Note that in this case the model update could include updating the Gaussian process hyperparameters as well.
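The Gaussian update is standard GP conditioning; the sketch below (our own kernel and numbers, not the paper's Appendix A) shows the closed-form posterior for noisy observations, applied sequentially by re-conditioning on the accumulated data I_k.

```python
import numpy as np

# Closed-form Gaussian process update y | I_k for noisy observations.
def rbf(a, b, ell=1.0):
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ell) ** 2)

def gp_update(x_obs, o_obs, x_pred, noise_var=0.01):
    """Posterior mean and covariance of y | I_k at x_pred."""
    K = rbf(x_obs, x_obs) + noise_var * np.eye(len(x_obs))
    Ks = rbf(x_pred, x_obs)
    mean = Ks @ np.linalg.solve(K, o_obs)
    cov = rbf(x_pred, x_pred) - Ks @ np.linalg.solve(K, Ks.T)
    return mean, cov

x_pred = np.array([0.0])
x1, o1 = np.array([-1.0]), np.array([0.8])      # information I_1: one experiment
m1, c1 = gp_update(x1, o1, x_pred)

x2 = np.concatenate([x1, [0.2]])                # I_2: a second experiment
o2 = np.concatenate([o1, [0.5]])
m2, c2 = gp_update(x2, o2, x_pred)

print(c1[0, 0], c2[0, 0])   # posterior variance at x_pred shrinks as data accrues
```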

Optimization objective
Following the formulation of Bect et al. (2012, 2019), a strategy for uncertainty reduction starts with a measure of residual uncertainty, H_k, for the quantity of interest after k experiments. This is a functional of the conditional distribution P_k. In this paper we will consider three specific alternatives for H_k.
Assume k experiments have been performed, resulting in the updated probabilistic model (ξ_k, δ_k). The updated failure probability according to (2) can then be defined as α_k = α(ξ_k) = E_k[1(ξ_k(X) ≤ 0)]. As we are interested in reducing uncertainty in α, a natural optimization objective is to minimize Var(α_k) = E[(α_k − ᾱ_k)²]. However, computation of Var(α_k) can be problematic in practice. Most of the proposed methods for design of experiments in (non-hierarchical) structural reliability models therefore make use of alternative heuristic optimization objectives. That is, some alternative function H_k(·) that is easier to compute than Var(α_k), and where the design that minimizes H_k(·) hopefully also performs well with respect to Var(α_k). Bect et al. (2012) present a few such criteria, some of which will also be considered in this paper. Let p_k(x) = P_k(ξ_k(x) ≤ 0) and γ_k(x) = p_k(x)(1 − p_k(x)), and note that 2γ_k(x) is the probability that two i.i.d. samples from ξ_k(x) do not have the same sign. Hence, γ_k provides a measure of how accurately the sign of ξ_k(x) is determined around the critical value ξ_k = 0. We will introduce two measures of residual uncertainty based on taking the expectation of γ_k with respect to the distribution of X, which we denote P_X. In total, we will consider the following three alternatives for H_k:

H_1,k = Var(α_k),   H_2,k = E[γ_k(X)],   H_3,k = (E[√γ_k(X)])².

Here H_2,k and H_3,k can also be motivated by realizing that they serve as upper bounds on H_1,k. In fact, H_1,k ≤ H_3,k ≤ H_2,k (see Proposition 3 in Bect et al. (2012)).
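For a Gaussian surrogate these criteria are cheap to estimate by sampling X and evaluating the exceedance probability pointwise. The posterior mean and standard deviation below are invented toy functions, and γ(x) = p(x)(1 − p(x)) with p(x) = Φ(−μ(x)/σ(x)) is used as the sign-uncertainty measure; the exact scaling used in the paper may differ, but the ordering of the two expectation-based criteria is generic (Cauchy-Schwarz).

```python
import numpy as np
from math import erf, sqrt

# Estimate the expectation-based residual-uncertainty criteria by sampling X.
rng = np.random.default_rng(3)

def Phi(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

mu = lambda x: 2.0 - x                    # toy posterior mean of xi_k
sigma = lambda x: 0.3 + 0.1 * np.abs(x)   # toy posterior std. dev. of xi_k

x = rng.standard_normal(50_000)           # samples from P_X
p = np.vectorize(Phi)(-mu(x) / sigma(x))  # p(x) = P(xi_k(x) <= 0)
gamma = p * (1.0 - p)

H2 = float(gamma.mean())                  # estimate of E[gamma_k(X)]
H3 = float(np.sqrt(gamma).mean() ** 2)    # estimate of (E[sqrt(gamma_k(X))])^2
print(H2, H3)                             # Cauchy-Schwarz gives H3 <= H2
```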
For optimal design of experiments we will consider loss functions given by the above measures of residual uncertainty, potentially in combination with an additional penalty term that represents the cost of performing a given experiment. In the Bayesian decision-theoretic framework, given such a loss function depending on a policy π for selecting experiments, we can evaluate the policy by looking n steps ahead. For instance, a relevant loss function for minimizing uncertainty in α after n additional experiments, following after the current experiment k, could be given as J_k(π) = E_k[H_1,k+n], where the updated information E_{k+n} entering H_1,k+n corresponds to following the policy π. The additional notation introduced with respect to the measure of residual uncertainty and sequential model updating is summarized in Table 2.

Modelling information and experimental design
In this section, we introduce the experimental design framework and explain how the development of information is modelled in this context. In the following, let k = 0, 1, …, K − 1 be the experiment index, which keeps track of the number of performed experiments.

The dynamic programming formulation
Huan and Marzouk (2016) introduce a general framework for sequential optimal experimental design: Let the state of the system after experiment k − 1 be denoted by s_k. The input (decided by the experimental designers) to experiment k is denoted by d_k. We want to determine a policy π = (π_0, …, π_{K−1}), where d_k = π_k(s_k). That is, given the current state of the system, the policy is a function which tells the experimental designer the input to the next experiment.
From each experiment, we get observations o_k. These observations may include measurement noise and modelling errors. Associated with each experiment, we have a stage reward R_k(s_k, o_k, d_k). The stage reward reflects the cost of doing the experiment (measured in e.g. money or time) plus any additional benefits or penalties of doing the experiment (measured in the same unit). Furthermore, we have a terminal reward R_K(s_K) depending only on the final state of the system.
In order to model the development of the system of experiments, we have the system dynamics s_{k+1} = F(s_k, d_k, o_k), where F(·) is some function specifying the transition from a current state to a new state based on the performed experiment. In the formulation of Huan and Marzouk (2016), the state can be decomposed into a belief state and a physical state, where the physical state describes any additional deterministic decision-relevant variables. Herein we will not write s_k specifically in this form. The optimal experimental design problem can then be formulated as

max_π E[ Σ_{k=0}^{K−1} R_k(s_k, o_k, d_k) + R_K(s_K) ],     (10)

where the maximization is done over all policies π that do not look into the future. That is, when deciding policy π_k, only what is known up to experiment k − 1 can be used. Another way of saying this is that the policy π should be adapted to the filtration generated by the processes {s_k}, {o_k} and {d_k}. Note that the optimization problem is over the whole experimental design period, and assumes that there is an initial number K of experiments that are to be performed. The experimental design policy is chosen in order to maximize the total reward of all of the experiments, as opposed to simply doing what is optimal in the next experiment (without taking the future experiments into account).
To adapt this framework to the experimental design problem for structural reliability analysis, we identify the states, rewards and dynamics with the quantities introduced in Section 2, so that the state after k experiments represents the updated model (ξ_k, δ_k) and the rewards are given in terms of the experiment costs and the measure of residual uncertainty. Remark 1 Note that the expectation in (10) is with respect to the future outcomes o_0, …, o_{K−1}, which a priori are uncertain, and where each outcome o_k depends on the previous outcomes o_0, …, o_{k−1}. An equivalent formulation can be given in terms of conditional expectations. Let value functions V_k be defined by backwards induction:

V_K(s_K) = R_K(s_K),
V_k(s_k) = max_d E[ R_k(s_k, o_k, d) + V_{k+1}(s_{k+1}) | s_k, d_k = d ],

where R_K = R_K(s_K) only depends on the final state of the system. Then, the policy defined by selecting for each k the decision d_k = argmax_d E[ R_k(s_k, o_k, d) + V_{k+1}(s_{k+1}) | s_k, d_k = d ] is optimal. This corresponds with the formulation used by Bect et al. (2012).
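The backwards induction of Remark 1 can be sanity-checked on a tiny finite problem (all states, rewards and dynamics below are randomly generated toy data, not from the paper): the value V_0 computed by the Bellman recursion coincides with a brute-force search over every Markov policy.

```python
import itertools
import numpy as np

# Toy check of backwards induction: 3 states, 2 decisions, 2 outcomes, K = 2.
rng = np.random.default_rng(4)
K, nS, nD, nO = 2, 3, 2, 2
prob = rng.dirichlet(np.ones(nO), size=(nS, nD))   # outcome probabilities
F = rng.integers(0, nS, size=(nS, nD, nO))         # dynamics s' = F(s, d, o)
R = rng.normal(size=(nS, nD, nO))                  # stage rewards R_k(s, d, o)
R_term = rng.normal(size=nS)                       # terminal reward R_K(s)

# Backwards induction: V_K = R_K and V_k(s) = max_d E[R + V_{k+1}(s') | s, d].
V = R_term.copy()
for _ in range(K):
    Q = np.einsum('sdo,sdo->sd', prob, R + V[F])   # Q(s, d)
    V = Q.max(axis=1)

# Brute force: enumerate every (time, state) -> decision policy and compare.
def value(policy, k, s):
    if k == K:
        return R_term[s]
    d = policy[k * nS + s]
    return sum(prob[s, d, o] * (R[s, d, o] + value(policy, k + 1, F[s, d, o]))
               for o in range(nO))

best = max(value(pi, 0, 0) for pi in itertools.product(range(nD), repeat=K * nS))
print(V[0], best)   # the two agree: the Bellman recursion recovers the optimum
```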
Problem (10) is a dynamic programming problem. Though theoretically optimal, such problems are known to suffer from the so-called curse of dimensionality. That is, the number of possible sequences of design and observation realizations grows exponentially with the dimension of the state space. According to Defourny et al. (2011), the curse of dimensionality implies that dynamic programming can only be solved numerically for state spaces embedded in R^d with d ≤ 10. Therefore, such problems can often only be solved approximately via approximate dynamic programming, see Huan and Marzouk (2016). Note also that this type of formulation is based on a Markovianity assumption, i.e., that there is no memory in the dynamics of the system. This assumption is necessary in order to perform the simplification to only having dependency on the current state of the system in Remark 1. If the system is not Markovian, in the sense that the decision at any time depends not only on the current state of the system, but also on some of the previous states, we cannot solve the experimental design problem by backwards induction. The reason for this is that the Bellman equation, which backwards induction is based on, does not hold in this case. In such cases, the experimental design problem can for instance be solved via the maximum principle; see e.g. Dahl et al. (2016) for an example of systems with memory in continuous time.
Remark 2 An alternative solution method to dynamic programming for problem (10) is to use a scenario tree based approach, see Defourny et al. (2011). Scenario tree based approaches are not sensitive to the dimension of the state space, but rather to the number of experiments. Hence, a scenario tree based approach can be attempted whenever there are few experiments (less than or equal to 10), but a potentially high-dimensional state space (dimension greater than 10). If the number of experiments is large (greater than 10), but the state space dimension is small (less than or equal to 10), dynamic programming is a viable solution method. If both the state space dimension and the number of experiments are large, one can try approximate dynamic programming (see Huan and Marzouk (2016)) or a myopic formulation as an alternative to the dynamic programming one. In Section 3.2, we consider such a myopic formulation.
Note that problem (10) is a maximization problem for a reward, but it can trivially be transformed to a minimization problem with some loss function L_k = −R_k instead. For the application considered in this paper, we are interested in minimization problems associated with the residual uncertainty described in Section 2.5.
Example 6 Let λ(d_k) denote the cost of decision d_k. A relevant set of loss functions could then be based on combining the costs λ(d_k) with the measures of residual uncertainty H_k, where introducing a discount factor η ∈ (0, 1) would produce a similar but more greedy policy. Another relevant alternative is to define the total loss as the sum of costs up to the iteration k* where some target level, H_k < H* for k > k*, has been reached.

The myopic formulation
As mentioned in Section 3.1, the dynamic programming formulation of the optimal experimental design problem for structural reliability analysis suffers from the curse of dimensionality. An approximation to the dynamic programming formulation which mends this problem is the myopic formulation. This corresponds to truncating the dynamic programming sum in (10) and only looking one time step ahead at a time. Due to the truncation, the myopic formulation is not theoretically optimal, but it is computationally feasible even for large systems, since it does not suffer from the curse of dimensionality.
In this section, we define the myopic optimal decision d ∈ D at step k as the minimizer of the function

J_i,k(d) = λ(d) · E_k,d[H_i,k+1],     (12)

where H_i,k are the measures of residual uncertainty defined in Section 2.5, and E_k,d represents the conditional expectation given the information E_k and the decision d for experiment k + 1. The term E_k,d[H_i,k+1] represents how desirable decision d is for reducing the expected remaining uncertainty in α at experiment k + 1, if the next experiment is performed with input d. We let λ(d) be a deterministic function representing the cost associated with decision d, and we will refer to the function J_i,k(d) as the acquisition function for myopic design. Other ways of introducing additional rewards or penalties associated with an experiment are of course also possible. In fact, there is no particular reason why we write (12) as a product of the cost and the measure of residual uncertainty, besides emphasizing that J_i,k(d) should be a function of these two terms. Note that this is a greedy strategy. At each time step, we choose the input for the next experiment which is optimal given that we only look one step ahead. This is in contrast to the experimental design model in Section 3.1, which chooses the optimal policy based on the whole experimental design phase. The greedy strategy is not theoretically optimal, as it essentially corresponds to truncating the sum in the dynamic programming formulation (10). However, due to this truncation, the greedy approach does not suffer from the curse of dimensionality. Hence, it is more tractable from a computational point of view.
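A sketch of the resulting myopic loop, under strong simplifying assumptions of our own: a one-dimensional Gaussian surrogate, a flat cost λ(d) ≡ 1, and an "expected H" that only accounts for the deterministic variance reduction of a GP observation while keeping the posterior mean fixed (the true expectation would also average over the random mean update).

```python
import numpy as np
from math import erf, sqrt

# Myopic selection: minimize J(d) = cost(d) * (expected residual uncertainty).
def Phi(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

grid = np.linspace(-3, 3, 61)                  # integration grid for X (toy)
w = np.exp(-0.5 * grid**2)
w /= w.sum()                                   # ~ N(0, 1) weights on the grid

mu = 2.0 - grid                                # current posterior mean of xi_k
var = np.full_like(grid, 0.25)                 # current posterior variance

def H(mean, variance):                         # residual uncertainty ~ E[p(1 - p)]
    p = np.vectorize(Phi)(-mean / np.sqrt(variance))
    return float((w * p * (1.0 - p)).sum())

def expected_H_after(d, tau2=0.01):
    rho = np.exp(-0.5 * (grid - d) ** 2)       # assumed correlation with xi_k(d)
    cov = var * rho                            # toy covariance with xi_k(d)
    new_var = var - cov**2 / (var[0] + tau2)   # standard GP variance reduction
    return H(mu, new_var)

cost = lambda d: 1.0                           # flat experiment cost (toy)
cand = np.linspace(-3, 3, 25)                  # candidate decisions
J = np.array([cost(d) * expected_H_after(d) for d in cand])
d_star = cand[np.argmin(J)]
print(d_star)   # the myopic choice targets the region where the sign is uncertain
```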

Approximating the measure of residual uncertainty
Assume k experiments have been performed, resulting in the updated probabilistic model (ξ_k, δ_k). A simple method for estimating the measures of residual uncertainty described in Section 2.5 is by a double-loop Monte Carlo simulation: Let x_1, …, x_{N_1} be samples of X, and let ξ_{k,1}, …, ξ_{k,N_2} be performance functions sampled from ξ_k and evaluated at each x_i. Then H_1,k can be obtained as the sample variance of the N_2 samples of the form

α̂_{k,j} = (1/N_1) Σ_{i=1}^{N_1} 1(ξ_{k,j}(x_i) ≤ 0).

Similarly, H_2,k and H_3,k can be estimated from the indicators p̂_{k,i,j} = 1(ξ_{k,j}(x_i) ≤ 0). This approach is problematic for several reasons. First of all, α̂_{k,j} is an unbiased estimator of the failure probability α_{k,j} = α(ξ_{k,j}) corresponding to the deterministic performance function ξ_{k,j}. When α_{k,j} is small, the variance of this estimator is Var(α̂_{k,j}) = α_{k,j}(1 − α_{k,j})/N_1 ≈ α_{k,j}/N_1. If we want to achieve an accuracy of, say, a standard deviation below 0.1 α_{k,j}, and α_{k,j} = 10^{−m}, then the number of samples required would be approximately N_1 = 10^{m+2}. The failure probabilities considered in structural reliability analysis can typically be in the range from 10^{−6} to 10^{−2}.
When N 1 is large, it can also be a practical challenge to obtain the samples ξ k,j (x 1 ), . . ., ξ k,j (x N 1 ) simultaneously for a fixed j. Moreover, the total number of samples needed to evaluate the measures of residual uncertainty H i,k is N 1 N 2 , and we are interested in optimization over H i,k , which will require multiple simulations of this kind.
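To make the cost of the double-loop estimator concrete, it can be sketched as follows. All model ingredients here are hypothetical stand-ins (an aleatory variable X ~ N(0, 1) and an epistemic random threshold defining the sampled performance functions); only the structure of the computation follows the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: aleatory X ~ N(0, 1), and epistemic uncertainty
# represented by a random threshold B ~ N(2, 0.1^2), giving sampled
# performance functions xi_j(x) = b_j - x (failure when x >= b_j).
N1, N2 = 10_000, 200                       # inner/outer sample sizes

x = rng.normal(0.0, 1.0, size=N1)          # samples x_1, ..., x_{N1} of X
b = rng.normal(2.0, 0.1, size=N2)          # N2 sampled performance functions

# Inner loop: alpha_hat_{k,j} = (1/N1) * sum_i 1(xi_{k,j}(x_i) <= 0)
alpha_hat = np.array([np.mean(b_j - x <= 0.0) for b_j in b])

# Outer loop: H_{1,k} estimated as the sample variance over the N2 draws
H1 = alpha_hat.var(ddof=1)
print(alpha_hat.mean(), H1)                # costs N1 * N2 function evaluations
```

Even in this toy case the estimate consumes N 1 N 2 = 2 · 10 6 evaluations, which illustrates why the approach is infeasible when each evaluation is expensive.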
In this section we present a procedure for efficient approximation of the measures of residual uncertainty. We will start by introducing a finite dimensional approximation of ξ k (x), given as a deterministic function ξ̂ k (x, E) depending on x and a finite dimensional E k -measurable random variable E. Then, in Section 4.2 we consider how the mean and variance, E [ f (E)] and var( f (E)), can be approximated for any E k -measurable function f (e) using the unscented transform. In Section 4.3 and Section 4.4 we present an importance sampling scheme for the case where f (e) is defined in terms of an expectation over X. Finally, in Section 4.5 we consider the case where f (e) = α( ξ̂ k (X, e)), which provides the approximations α̂ k = f (E) and Ĥ1,k = var( f (E)), and where approximations of H 2,k and H 3,k are obtained in a similar manner.
In summary, this kind of approximation, which we will refer to as UT-MCIS from now on, makes use of the unscented transform (UT) for epistemic uncertainty propagation and Monte Carlo simulation with importance sampling (MCIS) for aleatory uncertainty propagation. The motivation behind this specific setup is that a technique such as MCIS is needed to obtain low variance estimates of α( ξ̂ k (X, e)), which will typically be a small number. The sampling scheme we propose is also designed to be efficient in the case where subsequent estimates corresponding to perturbations of α( ξ̂ k (X, e)) are needed, which is relevant for estimation of e.g. α( ξ̂ k+1 (X, e)) or α( ξ̂ k (X, e′)) for some e′ ≠ e if α( ξ̂ k (X, e)) has already been estimated. As for epistemic uncertainty propagation, when α( ξ̂ k (x, E)) is viewed as an E k -measurable random variable, the UT alternative, which is both simpler and more efficient, seems viable, in particular for the purpose of optimization with respect to future decisions.

The finite-dimensional approximation of ξ k
In our framework, we have defined ξ k as an E k -measurable stochastic process indexed by x ∈ X (often called a random field), and we view ξ k as a distribution over some (generally infinite-dimensional) space of functions. The special case where ξ k = ξ k (x, E) for some finite dimensional E k -measurable random variable E can be very useful for simulation. That is, if samples e j of E can be generated efficiently, then random functions ξ k,j (x) = ξ k (x, e j ) can be sampled as well. As long as ξ k is square integrable, such a representation of ξ k is always available from the Karhunen-Loève transform, ξ k (x) = μ k (x) + ∑ i E i φ i (x), where μ k is the mean function, the functions φ i are deterministic, and the E i are uncorrelated random variables with zero mean. The canonical ordering of the terms E i φ i (x) also provides a suitable method for approximating ξ k (x), by truncating the sum at some finite i = M, and we could then let E = (E 1 , . . ., E M ) (see for instance Wang (2008)).
But obtaining the Karhunen-Loève transform can also be challenging. Because of this, we present an extremely simple approximation that relies only on computation of the first two moments of ξ k . We let E be a 1-dimensional random variable with E [E] = 0 and E [E 2 ] = 1, and define ξ̂ k (x, E) = μ k (x) + σ k (x)E, (13) where μ k (x) and σ k 2 (x) are the mean and variance functions of ξ k . This is indeed a very crude approximation, as we essentially assume that the values of ξ k at any set of inputs x are fully correlated. But for probabilistic surrogates used in structural reliability models this is actually not that unreasonable, and as it turns out, for the examples we consider in Section 6 it seems sufficient.

Remark 3 Note that to update the approximate model ξ̂ k (x) in (13) given some new experiment (d k , o k ), we only need to update the mean and variance functions. This is in line with the numerically efficient Bayes linear approach (Goldstein and Wooff, 2007), where random variables are specified only through the first two moments, and where the Bayesian updating given some experiment corresponds to computation of an adjusted mean and covariance. An application of the Bayes linear theory to sequential optimal design of experiments can be found in (Jones et al., 2018). We note also that in the case where Gaussian processes are used as surrogate models, the classical and linear Bayesian approaches are computationally equivalent. Moreover, in the following section we will introduce the unscented transform for approximation of the updated/adjusted moments, and as a consequence the complete prior probability specification of E becomes less relevant.
In the case where we are dealing with a hierarchical model, say ξ k (x) = g(Y k (x)) for some inner random function Y k , it might not be convenient to compute the moments of ξ k directly. Instead, we can apply the above approximation to the inner component, giving Ŷ k (x, E), and the approximation of ξ k is then obtained as ξ̂ k (x) = g( Ŷ k (x)). The same goes for the scenario with more than two layers in the hierarchy, where we would approximate both Z k (y) and Y k (x). In any case, we end up with a finite dimensional random variable E, and we can define the approximation ξ̂ k (x, E).
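The fully-correlated approximation above can be sketched as follows. The mean and standard deviation functions here are hypothetical stand-ins for the updated moments of a surrogate; the point is only that, for E with zero mean and unit variance, the first two moments of ξ̂ k (x, E) reproduce μ k and σ k 2 exactly.

```python
import numpy as np

# Hypothetical posterior moments of the surrogate xi_k (stand-ins):
mu_k = lambda x: x**2 - 1.0
sigma_k = lambda x: 0.2 * (1.0 + np.abs(x))

def xi_hat(x, e):
    """Crude approximation (13): xi_hat_k(x, E) = mu_k(x) + sigma_k(x) E.
    For a fixed e this is a deterministic performance function; values at
    different x are fully correlated through the single variable E."""
    return mu_k(x) + sigma_k(x) * e

# Empirical check that the moments are reproduced:
x = np.linspace(-2.0, 2.0, 5)
e = np.random.default_rng(1).normal(size=200_000)       # E ~ N(0, 1)
paths = xi_hat(x[None, :], e[:, None])
print(np.abs(paths.mean(0) - mu_k(x)).max())            # close to 0
```

Note that only the one-dimensional distribution of E enters; this is why a full prior specification of E becomes less important once the unscented transform is used for moment propagation.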

The unscented transform for epistemic uncertainty propagation
The unscented transform (UT) is a very efficient method for approximating the mean and covariance of a random variable after nonlinear transformation. UT is commonly applied in the context of Kalman filtering, and it is based on the general idea that it is easier to approximate a probability distribution than an arbitrary nonlinear transformation (Uhlmann, 1995; Julier and Uhlmann, 2004). Intuitively, given any finite-dimensional random variable E we may define a set of weighted sigma-points {(v i , e i )}, such that if {(v i , e i )} was considered as a discrete probability distribution, then its mean and covariance would coincide with those of E.
For any nonlinear transformation Y = f (E), if E was discrete we could compute the mean and covariance of Y exactly. The UT approximation is the result of such a computation, where we make use of a small set of weighted points {(v i , e i )}. Specifically, let E be a finite dimensional random variable with mean µ and covariance matrix Σ. A set of sigma-points for E is a set of weighted samples {(v i , e i )} satisfying ∑ i v i e i = µ and ∑ i v i (e i − µ)(e i − µ) T = Σ. (15) If y = f (e) is any (generally nonlinear) transformation, the UT approximations of the mean and covariance of Y = f (E) are then obtained as µ Y = ∑ i v i f (e i ) and Σ Y = ∑ i v i ( f (e i ) − µ Y )( f (e i ) − µ Y ) T . (16) Naturally, the selection of appropriate sigma-points is essential for UT to be successful. It is important to note that, although we may view the sigma-points as weighted samples, v i and e i are fixed or given by some deterministic procedure. Moreover, the definition of sigma-points given in (15) does not require that the weights are nonnegative and sum to one. Although this conflicts with the intuition of approximating E with a discrete random variable, the unscented transform still makes sense as a procedure for approximating statistics after nonlinear transformation.
Since the introduction of UT to Kalman filters in the 1990's, many different alternatives for sigma-point selection have been proposed (Menegaz et al., 2015). These mostly focus on applications where E follows a multivariate Gaussian distribution, but we do not see this as a restriction since we will assume that E can be represented as a transformation E = T −1 (U ) of a multivariate Gaussian variable U . For the applications considered in this paper, we will let {(v i , u i )} denote a set of sigma-points that are appropriate for the multivariate standard normal U ∼ N (0, I) where dim(U ) = dim(E). If T is the corresponding isoprobabilistic transformation, i.e. T (E) ∼ N (0, I) (see Appendix B.1), we will use {(v i , T −1 (u i ))} as a set of sigma-points for E. Equivalently, we could also view this as taking the UT approximation of U under a different transformation given by f • T . For the numerical examples we present in this paper, we have made use of the method developed by Merwe (2004), which produces a set of n = 2 · dim(E) + 1 points e i with corresponding weights. Determining sigma-points with this procedure is quite straightforward, and the details are given in Appendix C. We note again that for any structural reliability model, as long as we do not change the dimensionality of E, determining the sigma-points is a one-time computation, and any subsequent UT approximation of Y = f (E), for some nonlinear transformation f (•), is computationally very efficient.
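A compact sketch of the scaled sigma-point rule and the resulting UT approximation is given below. This follows the commonly published form of the method of Merwe (2004); the parameter values and the test function are illustrative only, and the exact variant used in the paper (including its separate mean and covariance weights, cf. Remark 4) is the one in Appendix C.

```python
import numpy as np

def merwe_sigma_points(mu, Sigma, alpha=0.1, beta=2.0, kappa=0.0):
    """Scaled sigma points for a variable with mean mu, covariance Sigma.
    Returns the 2n+1 points and the mean/covariance weight vectors."""
    n = len(mu)
    lam = alpha**2 * (n + kappa) - n
    S = np.linalg.cholesky((n + lam) * Sigma)
    pts = np.vstack([mu, mu + S.T, mu - S.T])
    wm = np.full(2 * n + 1, 1.0 / (2 * (n + lam)))
    wc = wm.copy()
    wm[0] = lam / (n + lam)
    wc[0] = lam / (n + lam) + (1.0 - alpha**2 + beta)
    return pts, wm, wc

def unscented_transform(f, mu, Sigma):
    """UT approximation of mean and variance of Y = f(E), f scalar-valued."""
    pts, wm, wc = merwe_sigma_points(mu, Sigma)
    y = np.array([f(e) for e in pts])
    m = wm @ y
    v = wc @ (y - m) ** 2
    return m, v

# Sanity check on a linear map, where UT is exact: Y = 2E + 1, E ~ (0, 1)
m, v = unscented_transform(lambda e: 2.0 * e[0] + 1.0, np.zeros(1), np.eye(1))
print(m, v)   # mean 1, variance 4
```

Note that with these parameters the central weight is negative, which is allowed by the definition of sigma-points in the text (the weights need not be nonnegative or sum to one).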
Remark 4 Note that it is not necessary that the sigma-points used in the approximations of the mean and covariance in (16) are the same. In fact, the method presented in Appendix C makes use of two different sets of weights for these approximations. As this is not of any relevance for the remaining part of this paper, we will keep writing {(v i , e i )} as a single set of sigma-points to simplify the notation.

Generating samples in X
In order to estimate the measures of residual uncertainty, we will need a set of samples of X. We will generate a finite set of 3-tuples {(x i , w i , η̂ i )}, where {(x i , w i )} are weighted samples in X suitable for obtaining importance sampling estimates of failure probabilities, and η̂ i is a number describing how influential a given sample (x i , w i ) is expected to be in such an estimate. In other words, {x i } should be constructed to 'cover the relevant regions in X', and for estimation we will only make use of a subset of {(x i , w i )}. The relevant subset will be determined from the measure of insignificance | η̂ i |, where we will only consider samples (x i , w i ) where | η̂ i | is below some threshold. We start by describing how the weighted samples {(x i , w i )} are generated.

Importance sampling
The general idea behind importance sampling is that if we select some random variable Q ≥ 0 with law P Q , such that E P X [Q] = 1 and Q ≠ 0 P X -almost surely, then the identity (17) holds for any A -measurable function f (x). This is often useful for estimation, for instance when sampling from P X is difficult, or in the case where we can find a Q such that estimates with respect to the right hand side of (17) are better (have lower variance) than estimating E P X [ f (X)] directly.
In the case where X admits a probability density p X , we can let q X be any density function such that q X (x) > 0 whenever p X (x) > 0. Let x 1 , . . ., x N be i.i.d. samples generated according to q X , and define w i = p X (x i )/q X (x i ). The importance sampling estimate of E P X [ f (X)] with respect to the proposal density q X is then obtained as in (18). We now assume that the stochastic limit state can be written as ξ k (x, E) for some finite-dimensional random variable E, and for any deterministic performance function ξ k (x, e) we will write α k (e) = α(ξ k (X, e)) for the corresponding failure probability. An importance sampling estimate of α k (e) is then given by (18) with f (x) = 1 (ξ k (x, e) ≤ 0). In order to obtain a good estimate of α k (e), we would like the proposal distribution q X to produce samples such that there is an even balance between the samples where ξ k (x, e) ≤ 0 and ξ k (x, e) > 0, while at the same time p X is as large as possible. One way to achieve this is to generate samples in the vicinity of points on the surface ξ k (x, e) = 0 with (locally) maximal density. A point with this property is called a design point or most probable failure point in the structural reliability literature. We will let q X represent a mixture of distributions, centered around different design points that are appropriate for different values of e. The full details are given in Appendix B, where we also describe a simpler alternative that can be used in the case where design point searching is difficult or not appropriate.
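A one-dimensional sketch of such an estimate is given below, with a hypothetical linear limit state whose design point is known in closed form; the proposal is simply a normal distribution centred at that point.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical deterministic performance function: failure when x >= 3,
# so for X ~ N(0, 1) the true failure probability is 1 - Phi(3) ~ 1.35e-3.
xi = lambda x: 3.0 - x

N = 5_000
# Proposal q_X = N(3, 1), centred at the design point x* = 3 (the most
# probable failure point for this limit state and a standard normal X).
x = rng.normal(3.0, 1.0, size=N)
w = np.exp(-0.5 * x**2 + 0.5 * (x - 3.0) ** 2)   # w_i = p_X(x_i) / q_X(x_i)

alpha_is = np.mean(w * (xi(x) <= 0.0))           # IS estimate of alpha
print(alpha_is)                                  # ~ 1.3e-3
```

Roughly half of the proposal samples fall in the failure region, so a few thousand samples suffice here, whereas crude Monte Carlo would need on the order of 10 5 samples for a comparable accuracy (cf. the discussion at the start of Section 4).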
The measure of insignificance |η i |

Assume {(x i , w i )} is a set of samples capable of providing a satisfactory estimate of α k (e), and suppose we now want to estimate α k (e′) for some new value e′. If we know that the signs of ξ k (x i , e) and ξ k (x i , e′) will coincide for many of the samples x i , then the estimate of α k (e′) can be obtained more efficiently by not computing all the terms in the sum (19). This is typically the case when e and e′ are both sampled from E.
It is also true in the case where we want to estimate α k+1 (e′) given some new experiment (d k , o k ), if we assume that updating with respect to (d k , o k ) has a local effect (i.e. there are always regions in X where ξ k+1 (x) ≈ ξ k (x)), or if the experiment is carried out to reduce the uncertainty in the level set ξ k = 0 (which is what we intend to do).
In other words, we consider some perturbation of the performance function ξ k (x, e), and we are interested in identifying the samples x i where 1 (ξ k (x i , e) ≤ 0) does not change under the perturbation. For this purpose we define the function η(x, ξ ), and let η i = η(x i , ξ k ) be defined with respect to the relevant process ξ k . Here η i describes how uncertain ξ k (x i ) is around the critical value ξ k = 0, in the sense that if |η i | is small (close to zero) then ξ k (x i ) > 0 and ξ k (x i ) ≤ 0 may both be probable outcomes. Conversely, if |η i | is large then either P(ξ k (x i ) ≤ 0) ≈ 0 or P(ξ k (x i ) ≤ 0) ≈ 1, and the input x i is insignificant, as it is unnecessary to keep track of changes in 1 (ξ k (x i ) ≤ 0). We will use η i to prune the sample set {(x i , w i )}, by only considering the samples where |η i | is below a given threshold τ. Although this is an intuitive idea, we may also justify the definition of η and the selection of a threshold τ more formally by making use of the following proposition.
Although Proposition 4.1 holds in general, tighter (and probably more realistic) bounds can be obtained by making assumptions on the form of ξ (x). For instance, in the case where ξ (x) is Gaussian we obtain a bound expressed through Φ(•), the standard normal CDF.
We will make use of η̂ i obtained as the UT approximation of η i . That is, η̂ i is in general obtained from the finite-dimensional approximation described in Section 4.1, combined with the UT approximation (16) with Y = ξ̂ k (x, E).
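The pruning step can be sketched as follows. The specific form η(x, ξ ) = E [ξ (x)] / sd(ξ (x)) used below is an assumption (a signed distance of the mean to the critical level in standard deviations, which has the qualitative behaviour described above), and the moment functions are hypothetical stand-ins.

```python
import numpy as np

# Assumed form of the insignificance measure: mean over standard deviation.
# The mean and std-dev functions of xi_k below are hypothetical stand-ins.
mu = lambda x: x - 1.0
sd = lambda x: 0.3 + 0.1 * x**2

x = np.linspace(-3.0, 3.0, 1001)         # stand-in for the samples x_i
eta = mu(x) / sd(x)                      # eta_i for each sample

tau = 3.0                                # threshold used in the experiments
keep = np.abs(eta) < tau                 # samples whose sign 1(xi <= 0) is uncertain
print(keep.mean())                       # fraction surviving the pruning
```

Samples far from the level set ξ k = 0 (relative to the model uncertainty) get large |η i | and are dropped, since their failure indicators are essentially fixed.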

Importance sampling estimates with pruning
Let {(x i , w i , η̂ i ) | i ∈ I }, I = {1, . . ., N 0 }, be a set of samples generated as described in Section 4.3. Given some fixed threshold τ > 0, we define the subset of pruned samples as the ones corresponding to the index set I τ = {i ∈ I | | η̂ i | < τ}, and define Ī τ = I \ I τ . If f (x) is some A -measurable function where we know a priori the value of f i = f (x i ) for all i ∈ Ī τ , then we can immediately compute the contribution ĥ from these samples, and the importance sampling estimate of the expectation of f (X) is obtained by adding the weighted average over the remaining samples. An unbiased estimate of the sample variance can then be given, which shows the general idea of this pruning: namely, that low variance estimates of E[ f (X)] can be obtained with a small number of evaluations f (x i ), assuming that the subset I τ is small compared to I (and that the assumed values f i are correct). One drawback with this procedure is that we do not have control over the number of pruned samples, which might still be very large. In order to set an upper bound on the number of evaluations f (x i ), we let I n τ ⊆ I τ contain the first n elements of I τ (or some other subset, as long as the elements of {x i | i ∈ I n τ } remain independent). An importance sampling estimate of E[ f (X)] using only samples from I n τ is given by (27), where N τ = |I τ |, and the sample variance may be estimated by (28). Obtaining consistency results is easy under the ideal assumption that n(N 0 − N τ )/N τ is an integer, and the formulas in (27)-(28) come as a consequence of the following result.
Proposition 4.2 Assume n(N 0 − N τ )/N τ ∈ N. Then (27) is an unbiased estimate of E[ f (X)] and (28) is an unbiased estimate of the sample variance.
Proof Let Ī n τ be a set of n(N 0 − N τ )/N τ elements selected uniformly at random from Ī τ , and consider samples from the proposal distribution with density q(x). To show consistency we replace each sample x i with i.i.d. random variables X i distributed according to q. We then define μ̂ = μ̂ 1 + μ̂ 2 , where μ̂ 1 and μ̂ 2 are the importance sampling averages over I n τ and Ī n τ respectively, with w(x) = p(x)/q(x), and we can observe that μ̂ is unbiased. As for the variance, we first observe that var( μ̂) = var( μ̂ 1 ) + var( μ̂ 2 ). Replacing var( μ̂ 1 ) and var( μ̂ 2 ) with unbiased sample variances, and using that ĥ and r̂ are unbiased estimates of E q [ μ̂ 1 ] and E q [ μ̂ 2 ] respectively, the expression in (28) is obtained as var( μ̂ 1 ) + var( μ̂ 2 ), using that |I | = N 0 .

Using the tools introduced in the preceding subsections, we now present how the measures of residual uncertainty, H 1,k , H 2,k and H 3,k , can be approximated using Monte Carlo simulation with importance sampling (MCIS) combined with the unscented transform (UT) for epistemic uncertainty propagation.
We first let ξ̂ k (x, E) be the finite-dimensional approximation introduced in Section 4.1, with the corresponding failure probability α̂ k (E) = α( ξ̂ k (x, E)). We then let {(x i , w i , η̂ i ) | i ∈ I }, I = {1, . . ., N 0 }, be a set of samples generated as described in Section 4.3, where η̂ i is obtained using the UT approximation of ξ̂ k (x i , E). We will make use of importance sampling estimates as introduced in Section 4.4, where I τ = {i ∈ I | | η̂ i | < τ}, and estimation is based on a small subset I n τ ⊆ I τ .

Approximating H 1,k

Let f i = 1 ( η̂ i ≤ 0) for i ∈ Ī τ and compute ĥ 1 as in (23). We will let {(v j , e j ) | j = 1, . . ., M} denote the set of sigma-points as introduced in Section 4.2.
For any fixed e j , the corresponding importance sampling estimate of the failure probability α̂ k (e j ) is obtained as above, and we let Ĥ1,k be given by the UT approximation.

Approximating H 2,k and H 3,k

Both H 2,k and H 3,k are defined through the function γ k (x), which represents the uncertainty in the sign of ξ k (x). We will approximate γ k (x i ) with the function γ̂ k,i defined in (31), where Φ(•) is the standard normal CDF. There are two ways of interpreting this approximation. First of all, γ̂ k,i corresponds to the case where ξ̂ k (x i , E) is Gaussian, which may or may not be an appropriate assumption. Alternatively, we can think of γ k (x) as a measure of uncertainty in 1 (ξ k (x) ≤ 0), and any γ̂(x) with this qualitative behaviour could serve the same purpose. In this scenario it is natural to consider γ̂ = s(η)s(−η) for some sigmoid function s(•), and the function Φ(•) in (31) is one such alternative.
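The combination of MCIS (over X) and UT (over E) for Ĥ1,k can be sketched as follows. Everything here is a hypothetical stand-in: a one-dimensional epistemic variable E, a classical three-point sigma set in place of the points of Appendix C, and pruning omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical approximate model with a single epistemic variable E:
# xi_hat(x, e) = (3 - x) + 0.3 e, with aleatory X ~ N(0, 1).
xi_hat = lambda x, e: (3.0 - x) + 0.3 * e

# Importance samples from the proposal q = N(3, 1), as in Section 4.3.
N = 5_000
x = rng.normal(3.0, 1.0, size=N)
w = np.exp(-0.5 * x**2 + 0.5 * (x - 3.0) ** 2)       # p_X / q_X

# Classical sigma points for a standard normal E (a stand-in for the
# points of Appendix C): {0, +sqrt(3), -sqrt(3)} with weights
# {2/3, 1/6, 1/6}, matching the first two moments of E.
e_pts = np.array([0.0, np.sqrt(3.0), -np.sqrt(3.0)])
v = np.array([2.0 / 3.0, 1.0 / 6.0, 1.0 / 6.0])

# MCIS estimate of alpha_hat(e_j) for each sigma point ...
alpha_j = np.array([np.mean(w * (xi_hat(x, e) <= 0.0)) for e in e_pts])

# ... and the UT approximation of the epistemic mean and variance:
alpha_mean = v @ alpha_j                             # E[alpha_hat(E)]
H1 = v @ (alpha_j - alpha_mean) ** 2                 # H_hat_{1,k}
print(alpha_mean, H1)
```

The same importance samples {(x i , w i )} are reused for all sigma points, so the epistemic loop only costs M re-evaluations of the cheap approximate model, not new expensive experiments.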
For a single approximation of H 2,k or H 3,k it is not really necessary to split the importance sampling estimate as in (23)-(27), but we present it in this form as it will be convenient when we consider strategies for optimization. Given γ̂ k,i as in (31), we approximate H 2,k and H 3,k accordingly, where we let ĥ 2 = ĥ 3 = 0. Alternatively, if the intention is to use H 2,k and H 3,k as upper bounds on H 1,k , we could let ĥ 2 = ĥ 3 = ∑ w i , where the sums are over i ∈ Ī τ .

Numerical procedure for myopic optimization
In the myopic scenario, the optimal decision d k at each time step k is found by solving the following optimization problem, where J i,k (d) is the relevant acquisition function as defined in (12). We propose a procedure where we make use of a UT-MCIS approximation of J i,k (d) to find an approximate solution to (33). This will build on the approximation of H i,k introduced in Section 4, but where we now also make use of the predictive model δ to approximate expectations with respect to future values of H i,k+1 . In Section 5.1 and Section 5.2 we present how the UT-MCIS approximation of J i,k (d) is obtained, and in Section 5.3 we propose a criterion for determining when the sequence of experiments should be stopped. The final algorithm is summarized in Section 5.4.

The probabilistic model ( ξk , δk )
Starting with some probabilistic model (ξ k , δ k ), recall that ξ k represents uncertainty about the performance of the system under consideration, and δ k represents uncertainty with respect to outcomes of certain decisions. We have already discussed how to obtain a finite-dimensional approximation of ξ k , and likewise, this will also be needed for δ k .
Assuming δ k is square integrable, we will make use of the same type of finite-dimensional approximation as the one introduced for ξ k in Section 4.1. In this way, we end up with two finite-dimensional E k -measurable random variables E ξ and E δ , which in turn determine the approximations ξ̂ k (x, E ξ ) and δ̂ k (d, E δ ), where both ξ̂ k (x, e) and δ̂ k (d, e) are deterministic functions for e fixed. Here E ξ and E δ are generally not independent.
Remark 5 Note that if δ (d) is a function of some of the uncertain sub-components of ξ , then we might already have a finite-dimensional approximation of δ available.
Consider for instance the model in Example 3 and the discussion at the end of Section 4.1. In this case, ξ is obtained as a function of the finite-dimensional approximation ŷ 1 (x, E) of a sub-component y 1 (x), and δ (d) is given as δ (d(x)) = y 1 (x) + ε(x). Hence, all we need is to find a finite-dimensional representation of the noise ε(x). But observational noise such as ε(x) is often described as a function of x and some 1-dimensional random variable, in which case no additional approximation will be needed.
We will let ( ξ̂ k , δ̂ k ) denote the finite-dimensional approximation of (ξ k , δ k ) corresponding to a finite-dimensional random variable E = (E ξ , E δ ), and where ( ξ̂ 0 , δ̂ 0 ) is the initial model that is used as input for determining the first decision d 1 .
Remark 6 In the canonical case where a surrogate ỹ(x) is used to represent some unknown function y(x), an initial set of experiments is often performed to establish ỹ(x) before any sequential strategy is started. For instance, in the case where evaluation of y(x) means running deterministic computer code, it is normal to set up a space-filling initial design using e.g. Latin Hypercube Sampling.
When ỹ(x) is a Gaussian process model as described in Appendix A, specific mean and covariance functions may also be selected based on knowledge or assumptions about the phenomenon that is being modelled by y(x). For estimation of failure probabilities it is also convenient to make use of conservative prior mean values. That is, prior to any experiment ỹ(x) will correspond to a value associated with poor structural performance (small ξ ), such that α(ξ ) will be biased towards higher failure probabilities in the absence of experimental evidence. This is reasonable from a safety perspective, and also numerically, as larger failure probabilities are easier to estimate.

Acquisition function approximation
To find an approximate solution to the optimization problem (33), we will replace the acquisition function J i,k (d) with an approximation Ĵi,k (d). Recall that J i,k (d) as defined in (12) is a function of E k,d [H i,k+1 ], where E k,d is the conditional expectation with respect to E k with d k = d. In Section 4 we introduced an approximation Ĥi,k , and we will make use of the same idea to approximate E k,d [H i,k+1 ].
Assume k experiments have been performed, giving rise to the model (ξ k , δ k ) and the approximation ( ξ̂ k , δ̂ k ). If we consider the k-th decision d k = d, then H i,k+1 is a priori a δ k (d)-measurable random variable. That is, H i,k+1 is a function of δ k (d), and we are interested in the expectation E k,d [H i,k+1 ] = E [H i,k+1 (δ k (d))]. To approximate this quantity, we can make use of ( ξ̂ k , δ̂ k ) in the place of (ξ k , δ k ), in which case H i,k+1 becomes a function of E and we can approximate its expectation using UT.
The approximate acquisition functions are then given as in (34), where E k,d [ Ĥi,k+1 ] is obtained as follows. Let {(v j , e ξ j )} and {(v m , e δ m )} denote sigma-points as introduced in Section 4.2 for E ξ and E δ respectively. We then let {(x i , w i , η̂ i ) | i ∈ I }, I = {1, . . ., N 0 }, be a set of samples generated as described in Section 4.3, where η̂ i is obtained using the UT approximation of ξ̂ k (x i , E ξ ). As for the approximation of H i,k discussed in Section 4.5, we let I τ = {i ∈ I | | η̂ i | < τ} and define the subset I n τ ⊆ I τ of size n. The approximations of E k,d [H i,k+1 ] for i = 1, 2 and 3 will all be based on samples of ξ̂ k+1 of the form ξ̂ m,i,j k+1 , where ξ̂ m,i,j k+1 is computed for all j = 1, . . ., M ξ , m = 1, . . ., M δ and i ∈ I n τ . As in Section 4.5 we set ĥ 2 = ĥ 3 = 0 and compute ĥ 1 as in (23). The approximation E k,d [ Ĥ1,k+1 ] is just a weighted sum of the terms in (35), but for clarity we present it in three steps: the UT approximation of H 1,k+1 for each outcome sigma-point, followed by the UT approximation of E k,d [H 1,k+1 ] obtained by averaging over the outcome points. The UT-MCIS approximations of E k,d [H 2,k+1 ] and E k,d [H 3,k+1 ] can be obtained analogously, as weighted sums built from the UT approximation of var[ ξ̂ k+1 (x i )], the use of Φ to approximate γ̂ k+1 , and the MCIS estimates of H 2,k+1 and H 3,k+1 , where E k,d [ Ĥ2,k+1 ] and E k,d [ Ĥ3,k+1 ] are obtained with the same formula as for E k,d [ Ĥ1,k+1 ] in (38).
Remark 7 The number of model updates and function evaluations needed to generate the set { ξ̂ m,i,j k+1 } are M δ and nM ξ M δ respectively. We can view this as a discretization of the system dynamics, where there are only M δ possible future scenarios corresponding to the decision d k = d, given by the corresponding model updates. The samples in (35) are the ones needed for approximating the measure of residual uncertainty corresponding to ξ̂ k+1 (e δ m ) for each m = 1, . . ., M δ .
Moreover, although the approximations E k,d [ Ĥi,k+1 ] are presented as weighted sums of the nM ξ M δ terms ξ̂ m,i,j k+1 , they can also be obtained from a sequence of nested loops for a more memory efficient implementation. See for instance the schematic illustration in Figure 3 below.
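The loop structure of this computation can be sketched as follows; the counts M δ , M ξ and n are hypothetical, and the model update and limit-state evaluation are reduced to counters so that only the bookkeeping of Remark 7 is shown.

```python
# Schematic of the nested computation of E_{k,d}[H_hat_{i,k+1}]:
# one model update per outcome sigma point, and one limit-state evaluation
# per (outcome point, epistemic point, pruned sample) triple.
M_delta, M_xi, n = 5, 3, 100        # hypothetical counts

total_updates, total_evals = 0, 0
for m in range(M_delta):            # possible outcomes of decision d_k = d
    total_updates += 1              # update the model given outcome e_delta_m
    for j in range(M_xi):           # epistemic sigma points of xi_{k+1}
        for i in range(n):          # pruned importance samples x_i
            total_evals += 1        # evaluate xi_hat_{k+1}(x_i, e_xi_j)

print(total_updates, total_evals)   # M_delta and n * M_xi * M_delta
```

Since the inner quantities only involve the cheap approximate model, the whole acquisition evaluation requires no new expensive experiments, which is what makes optimization over d feasible.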

Stopping criterion
For design strategies that make use of heuristic acquisition functions, it can be challenging to determine an appropriate stopping criterion. Here, we have considered the approximation Ĥ1,k , which has a natural interpretation. Hence, even if we make use of a criterion such as Ĥ2,k or Ĥ3,k to determine the next optimal decision, it makes sense to use Ĥ1,k as an indicator of when the potential uncertainty reduction from future experiments is diminishing.
We will let E[ α̂ k ] and Ĥ1,k be given as in (30), and define V̂ k = √ Ĥ1,k / E[ α̂ k ]. Then V̂ k is the UT-MCIS approximation of the coefficient of variation of the failure probability α k with respect to epistemic uncertainty. We will let V̂ k ≤ V max for some threshold V max serve as a criterion for stopping the myopic iteration procedure, in the case where a predefined maximum number of iterations K max has not already been reached.
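Numerically the criterion is a one-liner; the values of E[ α̂ k ] and Ĥ1,k below are hypothetical and only illustrate the scale at which the procedure stops.

```python
import math

# Stopping criterion: V_hat_k = sqrt(H_hat_{1,k}) / E[alpha_hat_k], the
# epistemic coefficient of variation of the failure probability.
# Hypothetical values: mean failure probability 2e-3, epistemic variance 1e-8.
alpha_mean, H1 = 2.0e-3, 1.0e-8

V = math.sqrt(H1) / alpha_mean     # = 0.05 for these numbers
V_max = 0.05
print(V, V <= V_max)               # stop once V falls to V_max or below
```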

Remark 8
The coefficient of variation is often used as a numerical criterion for convergence in Monte Carlo simulation. In structural reliability analysis, a coefficient of variation below 0.05 is often used as an acceptable level for failure probability estimation. Note also that the criterion V̂ k ≤ V max for arbitrary V max ≥ 0 implicitly assumes that the epistemic uncertainty can be reduced to zero in the limit. If this is not the case, one might instead consider stopping when V̂ k is no longer decreasing. A different stopping criterion is also considered in Section 6.4.

Algorithm
The complete procedure for myopic optimization is summarized in Algorithm 1. Note that for simplicity the number of MCIS samples N 0 and n are specified as input, but one may also consider deciding these using (27) and (28). Using a standard technique in Monte Carlo simulation, one could keep increasing N 0 and n until the coefficient of variation (std/mean) of the relevant estimator is sufficiently small.
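The overall control flow can be sketched as below. This is only a skeleton under toy assumptions: the model, the acquisition function, the experiment and the update rule are all hypothetical stand-ins (a scalar "uncertainty" that shrinks with each experiment), and only the loop structure with the stopping test mirrors the text.

```python
# Toy stand-ins (hypothetical): the "model" is a pair (mean, sd) of a scalar
# failure-probability summary; an experiment at d shrinks sd by a d-dependent
# factor, mimicking uncertainty reduction from a well-chosen experiment.
def acquisition(model, d):                    # proxy for J_hat_{i,k}(d)
    mean, sd = model
    return sd * (1.0 - 0.5 / (1.0 + abs(d - mean)))

def run_experiment(d):                        # pretend outcome o_k
    return d

def update(model, d, o):                      # pretend Bayesian update
    mean, sd = model
    return (mean, sd * (1.0 - 0.5 / (1.0 + abs(d - mean))))

def cov_alpha(model):                         # proxy for V_hat_k
    mean, sd = model
    return sd / max(mean, 1e-12)

def myopic_design(model, candidates, K_max=20, V_max=0.05):
    history = []
    for k in range(K_max):
        # choose the next experiment by minimizing the acquisition function
        d_star = min(candidates, key=lambda d: acquisition(model, d))
        o = run_experiment(d_star)
        model = update(model, d_star, o)
        history.append((d_star, o))
        if cov_alpha(model) <= V_max:         # stopping criterion of Sec. 5.3
            break
    return model, history

model, hist = myopic_design((1.0, 0.5), candidates=[0.5, 1.0, 1.5])
print(len(hist), cov_alpha(model))
```

In an actual run, acquisition would be the UT-MCIS approximation Ĵi,k (d), update the Bayesian update of ( ξ̂ , δ̂ ), and cov_alpha the quantity V̂ k .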

Numerical experiments
Here we present a few numerical experiments using the algorithm for myopic optimal design presented in Section 5.4. Four experiments are presented, each with its own objective: 1) Section 6.1: A toy example in 1d for conceptual illustration of the sequential design procedure.
Algorithm 1: Myopic optimization
Input: Model and sigma-points: ( ξ̂ 0 , δ̂ 0 ) and {(v j , e δ j )}. Number of samples for UT-MCIS and threshold: N 0 , n ∈ N and τ > 0. Max number of iterations and convergence criterion: K max and V max .
(3) Compute the set { ξ̂ m,i,j k+1 } as in (35) and define the function Ĵi,k (d) as in (34) (for i = 1, 2 or 3, depending on the acquisition function of choice).
(4) Find the optimal decision.
Break if convergence has been reached before K max iterations.
2) Section 6.2: A hierarchical model with multiple 'expensive' sub-components. 3) Section 6.3: A non-hierarchical benchmark problem for comparison against alternative strategies. 4) Section 6.4: A model that more closely resembles a realistic application in structural reliability analysis, where we introduce different types of decisions by considering both probabilistic function approximation and Bayesian inference of model parameters through measurements with noise.
All numerical experiments have been performed using Algorithm 1 with the parameters τ = 3, N 0 = 10 4 , n = 10 3 and V max = 0.05.The probabilistic surrogate models used in the examples are all Gaussian process (GP) models with Matérn 5/2 covariance.A short summary of the relevant Gaussian process theory is given in Appendix A.

Example 1: Illustrative 1d example
To illustrate the myopic procedure, we present a simple 1d example similar to the one given in (Bect et al., 2012), where we aim to emulate a limit state function g(x). We assume that g(x) can be evaluated at any x ∈ R without error, but that function evaluations are expensive. We will let ξ (x) be the probabilistic surrogate in the form of a Gaussian process, where we use a prior mean µ(x) = −0.5 together with a Matérn 5/2 covariance function with fixed kernel variance σ 2 c = 0.1 and length scale l = 0.5. We assume that X follows a normal distribution with mean µ X = −0.5 and standard deviation σ X = 0.2, and our goal is to estimate α(g) = P(g(X) ≤ 0) using only a small number of evaluations of g(•). The set of decisions is therefore D = ∪ x {evaluate g(x)}, with respective outcomes o(x) = g(x), and a predictive model for outcomes defined correspondingly. Using a large number of samples of g(X) we estimate α(g) ≈ 0.0234, and we will consider this as the 'true' failure probability for comparison.
We initiate ξ by evaluating g(x) at x = µ X . For subsequent function evaluations, we minimize the expected variance of the failure probability, i.e. we minimize the acquisition function J 1,k given in (12) with λ ≡ 1. For comparison we also evaluate J 2,k and J 3,k , and in this example it seems that all three acquisition functions would perform equally well. Figure 4 shows ξ k and the corresponding three acquisition functions for the first few experiments.

Fig. 4 (Example 1) The top row shows the true limit state function g(x), the probability density of X, and the mean ± 2 standard deviations of the GP ξ k for k = 0, 1 and 3. The samples indicated with × on the x-axis are used in the importance sampling estimates of J 1,k , J 2,k and J 3,k that are shown in the bottom row.
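The reference value can be reproduced in spirit with crude Monte Carlo; the specific limit state function used in the example is not restated here, so g below is a hypothetical stand-in, and only the distribution of X and the estimation procedure follow the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1d limit state (stand-in for the g(x) of the example):
g = lambda x: x + 1.0          # failure when x <= -1

# X ~ N(-0.5, 0.2^2), as in the example; crude Monte Carlo with many
# samples gives the reference 'true' failure probability.
x = rng.normal(-0.5, 0.2, size=1_000_000)
alpha_ref = np.mean(g(x) <= 0.0)
print(alpha_ref)               # a small failure probability
```

The sequential procedure then tries to recover such a value using only a handful of evaluations of g, instead of the 10 6 used for the reference.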

Example 2: A 3 layer hierarchical model with 7D input
In this example we consider the structural reliability benchmark problem given as problem RP38 in (Rozsas and Slobbe, 2019). Here, x = (x 1 , . . ., x 7 ) ∈ X = R 7 , and the limit state function g(x) can be written in terms of intermediate variables, where c 1 , c 2 , c 3 and c 4 are constants. Figure 6 shows a graphical representation of how g(x) depends on the intermediate variables z 1 , y 1 , y 3 and y 4 . We will assume that the functions y 2 (x) and z 1 (y) require probabilistic surrogates, where y 2 (x) and z 1 (y) can be evaluated without error for any input x and y. We will also assume that there is no difference in the cost associated with evaluating y 2 or z 1 , and our goal is to estimate the failure probability α(g) while keeping the total number of function evaluations of y 2 (x) and z 1 (y) as small as possible. Note that the effective domain of y 2 is 1-dimensional and the effective domain of z 1 is 2-dimensional. Hence, using surrogates for y 2 and z 1 should be much more efficient than building a single surrogate for g using samples g(x i ).
Fig. 6 (Example 2) Hierarchical representation of g(x). We assume that the intermediate variables y 2 (x) and z 1 (y) are expensive to evaluate.
Convergence was reached at iteration k = 10, after 2 additional evaluations of y_2 and 8 additional evaluations of z_1. Figure 7 shows the updated surrogate models ỹ_2|I_k and z̃_1|I_k for k = 10, and Figure 8 shows how α(ξ_k) evolves with each iteration. At each iteration, the next experiment was decided by minimizing the acquisition function J_{3,k} with respect to updating each of the two surrogate models.

Fig. 8 (Example 2) Top: Mean ± 2 standard deviations of α_k after k iterations, as computed using the approximation described in Section 4. Bottom: The distribution of α_k at the final iteration k = 10, estimated from a double-loop Monte Carlo (i.e. by sampling from α(ξ_k) without any approximation).

Example 3: The 4 branch system
Here we consider the 'four branch system', a classical 2D benchmark problem given by the limit state function in (51), where x_1 and x_2 are independent standard normal variables. In this example we will not write (51) as a hierarchical model, in order to compare our method with other alternatives that are tailored to the non-hierarchical setting. We therefore let ξ(x) be a Gaussian process surrogate of g(x), constructed from observations (x_i, g(x_i)). For the initial 'conservative' Gaussian process we select a prior mean of −1, a Matérn 5/2 kernel with parameters σ_c = 1, l = 3, and condition on the initial observation (0, g(0)). According to Huang et al. (2017), the method called AK-MCS developed by Echard et al. (2011) is considered a typical and mature approach, and should therefore be a suitable candidate for comparison. In addition, Echard et al. (2011) also provide the results from using a number of other alternatives proposed in Schueremans and Gemert (2005). Table 3 gives a summary of the results from Echard et al. (2011), together with those obtained using the approach presented in this paper. Our results in Table 3 are obtained using Algorithm 1 with three different stopping criteria, V_max = 0.1, V_max = 0.05 and V_max = 0.025. Instead of point estimates we provide prediction intervals, which in this example contain the 'true' failure probability obtained with Monte Carlo in each scenario. From a practical perspective, even the estimates obtained using only 35 evaluations of (51) (V_max = 0.1) seem acceptable. If we were to use the mean + 2 standard deviations as a conservative estimate, the relative error with respect to the 'true' failure probability is still less than 3%. After an additional 30 iterations, this number drops to 0.65%. Hence, our approach performs well with respect to the alternatives considered in (Echard et al., 2011; Schueremans and Gemert, 2005). It should also be noted that the Directional Sampling alternative in Table 3 is a method that is especially suitable for the specific 'radial' type of limit state surfaces considered here, and this level of performance is not expected in general.
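For reference, a 'true' failure probability for the four branch system can be estimated by plain Monte Carlo. The limit state below is the commonly cited form of this benchmark (with branch constant k = 6); the exact constants in (51) may differ, so this is an illustrative sketch rather than a reproduction of the numbers in Table 3.

```python
import numpy as np

def four_branch(x1, x2, k=6.0):
    """Commonly used four branch limit state; failure when g <= 0."""
    b1 = 3.0 + 0.1 * (x1 - x2) ** 2 - (x1 + x2) / np.sqrt(2.0)
    b2 = 3.0 + 0.1 * (x1 - x2) ** 2 + (x1 + x2) / np.sqrt(2.0)
    b3 = (x1 - x2) + k / np.sqrt(2.0)
    b4 = (x2 - x1) + k / np.sqrt(2.0)
    return np.minimum(np.minimum(b1, b2), np.minimum(b3, b4))

rng = np.random.default_rng(1)
n = 1_000_000
x1, x2 = rng.standard_normal(n), rng.standard_normal(n)
pf_hat = np.mean(four_branch(x1, x2) <= 0.0)  # crude Monte Carlo estimate of P(g <= 0)
```

A crude estimate like this needs on the order of 10^6 limit state evaluations, which is exactly what surrogate-based methods such as AK-MCS (and the approach in this paper) aim to avoid.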
Optimization was performed using the approximate acquisition function Ĵ_{3,k}, and Figure 9 shows how the sequence of observations is located with respect to the failure set g = 0. The resulting sequence of failure probabilities after each iteration is illustrated in Figure 10.

Example 4: Corroded pipeline example
To give an example of a scenario where there are different types of experiments, we consider a probabilistic model recommended for engineering assessment of offshore pipelines with corrosion (DNV GL, 2017). The failure mode under consideration is burst, which occurs when the pipeline's ability to withstand the high internal pressure has been reduced as a consequence of corrosion.

The structural reliability model
Figure 11 shows a graphical representation of the structural reliability model. Here, a steel pipeline is characterised by the outer diameter (D [mm]), the wall thickness (t [mm]) and the ultimate tensile strength (s [MPa]). In this example we let D = 800, t ∼ N(µ = 20, cov = 0.03), and s ∼ N(µ = 545, cov = 0.06), where cov is the coefficient of variation (standard deviation / mean).
The pipeline contains a rectangular shaped defect with a given depth (d [mm]) and length (l [mm]), where l ∼ N(µ = 200, σ² = 1.49) and where d will be inferred from observations.
Given a pipeline (D, t, s) with a defect (d, l), we can determine the pipeline's pressure resistance capacity (the maximum differential pressure the pipeline can withstand before bursting). We let p_FE [MPa] denote the capacity coming from a Finite Element simulation of the physical phenomenon.
From the theoretical capacity p_FE, we model the true pipeline capacity as p_c = X_m · p_FE, where X_m is the model discrepancy, X_m ∼ N(µ_m, σ²_m). For simplicity we have assumed that X_m does not depend on the type of pipeline and defect, and we will also assume that σ_m = 0.1, where only the mean µ_m will be inferred from observations of the form p_c/p_FE. Finally, the pressure load p_d (in MPa) is modelled as a Gumbel distribution with mean 15.75 and standard deviation 0.4725. The limit state representing the transition to failure is then given as g = p_c − p_d.
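The Gumbel load can be parameterized from the stated mean and standard deviation via the standard moment relations (mean = µ + γβ and sd = πβ/√6, with Euler's constant γ ≈ 0.5772). A minimal sketch of the load model and the limit state follows; the fixed capacity value p_c is a hypothetical placeholder, not a number from the paper.

```python
import numpy as np
from scipy.stats import gumbel_r

mean_pd, sd_pd = 15.75, 0.4725
euler_gamma = 0.57721566490153286
beta = sd_pd * np.sqrt(6.0) / np.pi   # Gumbel scale from the standard deviation
mu = mean_pd - euler_gamma * beta     # Gumbel location from the mean
load = gumbel_r(loc=mu, scale=beta)   # pressure load p_d [MPa]

rng = np.random.default_rng(2)
p_d = load.rvs(size=200_000, random_state=rng)
p_c = 18.0                            # hypothetical fixed capacity [MPa], for illustration
pf = np.mean(p_c - p_d <= 0.0)        # P(g <= 0) with g = p_c - p_d
```

In the actual model p_c = X_m · p_FE is itself uncertain, so this fixed-capacity calculation only illustrates the load side of the limit state.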

Different types of decisions
We consider the following three types of decisions:
1. Defect measurement: We assume that unbiased measurements of the relative depth d/t can be obtained. The measurements come with additive Gaussian noise, ε ∼ N(0, σ²_{d/t}), and we will assume that three types of inspection are available, corresponding to σ_{d/t} = 0.02, 0.04 and 0.08.
2. Computer experiment: Evaluate p_FE at some deterministic input (D, t, s, d, l).
3. Lab experiment: Obtain one observation of X_m.
In order to generate synthetic data for this experiment, we assume that the true defect depth is d = 0.3t = 6 mm and that µ_m = 1.0. Instead of running a full Finite Element simulation to obtain p_FE, we make use of the simplified capacity equation in (DNV GL, 2017), which gives the capacity in closed form as a function of (D, t, s, d, l).

Results
To define the initial model ξ_0 we need a prior specification over the epistemic quantities d, µ_m and p_FE. We let d be a priori normal with mean 0.5 and standard deviation 0.15, and µ_m normal with mean 1.0 and standard deviation 0.1. Consequently, the posteriors of d and µ_m (and also X_m) given any number of observations are all normal. The function p_FE is replaced by a GP surrogate with prior mean µ = −10 and Matérn 5/2 parameters σ_c = 10, l = [1, 1, 1, 1], which we initiate using a single observation at the expected value of the input. We assume that the computer experiments are cheap compared to the lab experiments, and that the direct measurements of d/t are the most expensive. To account for these varying costs, we specify the acquisition function (52), where c(d) is the cost of a given decision. (Note that in (52) the variable d refers to a decision, but for the remaining part of this example d will only refer to the defect depth). In (52) we have normalized the expected future measure of residual uncertainty with the current one, which gives an estimate of the expected improvement given a certain decision. The numerical values representing the differences in cost are given by c = 1 for computer experiments, c = 1.1 for lab experiments, and c = 1.11, 1.12, 1.13 for measurements of d/t with accuracy σ_{d/t} = 0.08, 0.04 and 0.02 respectively.
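The cost-weighted selection can be sketched schematically. The exact form of (52) is defined in the paper; the expression below (cost times expected relative residual uncertainty) is only an assumed illustrative form, and the numerical values for the expected residual uncertainty of each decision are hypothetical.

```python
# Hypothetical current residual uncertainty and expected values after each decision.
H_current = 1.0e-6
expected_H = {
    "computer experiment": 0.80e-6,
    "lab experiment": 0.75e-6,
    "measure d/t (sigma=0.08)": 0.70e-6,
    "measure d/t (sigma=0.04)": 0.65e-6,
    "measure d/t (sigma=0.02)": 0.60e-6,
}
# Costs as specified in the text: c = 1, 1.1, and 1.11/1.12/1.13.
cost = {
    "computer experiment": 1.0,
    "lab experiment": 1.1,
    "measure d/t (sigma=0.08)": 1.11,
    "measure d/t (sigma=0.04)": 1.12,
    "measure d/t (sigma=0.02)": 1.13,
}

# Assumed acquisition: cost-weighted expected relative residual uncertainty.
acq = {d: cost[d] * expected_H[d] / H_current for d in cost}
best = min(acq, key=acq.get)   # decision with the smallest cost-weighted value
```

With these (made-up) numbers the most accurate measurement wins despite its higher cost, because the cost differences are small relative to the expected uncertainty reduction.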
In structural reliability analysis, the objective is not always to obtain an estimate of the failure probability that is as accurate as possible. A relevant problem in practice is to determine whether a structure satisfies some prescribed target reliability level α_target. In this example, we aim to either confirm that the failure probability is less than the target α_target = 10⁻³ (in which case we can continue operations as normal), or to detect with confidence that the target is exceeded (and we have to intervene). For this purpose we stop the iterative procedure if the difference between the expected and target failure probability is at least 4 standard deviations. In addition to the standard stopping criterion for convergence (45), we therefore introduce the stopping criterion (53). Figure 12 shows how the UT-MCIS approximation of the failure probability evolves throughout 100 iterations. We have made use of i = 3 in (52), as we found the corresponding acquisition surface for p_FE smoother than for the alternative i = 1, and hence easier to minimize numerically. The stopping criterion (53) is reached after k = 25 iterations, and Figure 13 shows the corresponding posteriors of the relative defect depth d/t and the model discrepancy X_m.
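The target-based stopping rule described above can be sketched as follows; `alpha_mean` and `alpha_sd` stand for the approximate posterior mean and standard deviation of the failure probability at iteration k (the names are ours, not the paper's).

```python
def target_stop(alpha_mean: float, alpha_sd: float,
                alpha_target: float = 1e-3, n_sd: float = 4.0) -> bool:
    """Stop if the expected failure probability is at least n_sd standard
    deviations away from the target, on either side of it."""
    return abs(alpha_mean - alpha_target) >= n_sd * alpha_sd
```

For instance, an estimate of 2e-4 with standard deviation 1e-4 lies 8 standard deviations below the target 1e-3, so iteration would stop with the conclusion that the target is satisfied.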
Throughout the examples in this paper we have initiated GP surrogate models using a single observation at the expected input. A different approach that is often found in practical applications is to initiate the GP surrogate with a space-filling design. A very common alternative is to make use of a Latin Hypercube sample (LHS) of size no more than 10 × the input dimension (although the appropriate number of samples naturally depends on how nonlinear the response is expected to be).
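A space-filling initial design of the kind mentioned above can be generated with SciPy's quasi-Monte Carlo module. A sketch for a 5-dimensional input with the 10 × dimension rule of thumb (the dimension and the mapping to standard normal space are illustrative choices of ours):

```python
import numpy as np
from scipy.stats import qmc, norm

dim = 5
n_init = 10 * dim                       # rule of thumb: 10 x input dimension
sampler = qmc.LatinHypercube(d=dim, seed=3)
u = sampler.random(n=n_init)            # LHS sample on the unit hypercube [0, 1)^d
x = norm.ppf(u)                         # map to standard normal input space
```

Each one-dimensional projection of an LHS contains exactly one point per stratum, which is what gives the design its space-filling property compared to plain random sampling.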
Table 4 shows a summary of the results from running this example with and without an initial design consisting of 10 LHS samples. For this example it does not seem to make any significant difference, but we see why the stopping criterion (53) is useful, as on average we can conclude that the failure probability is below the target value after around 30-40 iterations.
We leave this numerical experiment with an important remark, which is that specifying an appropriate cost in (52) can be difficult. If for instance the cost related to a measurement of d/t is set very high, then the decision to measure d/t will never be taken. In this example, it is not possible to reach the stopping criterion given in (53) without at least one such measurement, and hence, the myopic strategy will keep requesting measurements of X_m and evaluations of p_FE indefinitely, accumulating a potentially infinite cost. This is indeed a drawback of the myopic strategy, which could be alleviated by looking multiple steps ahead, or ultimately through a full dynamic programming implementation.

Concluding remarks
We have presented a general formulation of the Bayesian optimal experimental design problem for structural reliability analysis, based on separation of the aleatory uncertainty, or randomness, associated with a given structure and the epistemic uncertainty that we wish to reduce through experimentation. The effectiveness of a design strategy is evaluated through a measure of residual uncertainty, and efficient approximation of this quantity is crucial if we want to apply algorithms that search for an optimal strategy. Our proposed approach makes use of a pruned importance sampling scheme for subsequent estimation of (typically small) failure probabilities for a given epistemic realization, combined with the unscented transform for epistemic uncertainty propagation. In our numerical experiments, we made use of a rather naive implementation of the unscented transform, in the sense that the number of sigma-points is very low, and that these are determined a priori with a deterministic procedure. Since the alternative by Merwe and Wan (2003) produced satisfactory results in all of our numerical examples, no further consideration was made with respect to alternative methods for sigma-point selection. From applications to Kalman filtering, it has been observed that this version of the unscented transform has a tendency to over-estimate the variance, which is something we also notice in our experiments.
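The deterministic sigma-point construction referred to here (the scaled unscented transform of Merwe and Wan) can be sketched as follows for a Gaussian epistemic variable. The parameter values α, β, κ below are common defaults and are assumptions on our part, not the paper's settings.

```python
import numpy as np

def sigma_points(mean, cov, alpha=1e-1, beta=2.0, kappa=0.0):
    """Scaled unscented transform: 2n+1 sigma points and their weights."""
    n = len(mean)
    lam = alpha**2 * (n + kappa) - n
    L = np.linalg.cholesky((n + lam) * cov)   # matrix square root of (n+lam)*cov
    pts = np.vstack([mean, mean + L.T, mean - L.T])
    wm = np.full(2 * n + 1, 1.0 / (2.0 * (n + lam)))   # mean weights
    wc = wm.copy()                                      # covariance weights
    wm[0] = lam / (n + lam)
    wc[0] = wm[0] + (1.0 - alpha**2 + beta)
    return pts, wm, wc

# Propagating the 2n+1 points through a function f and re-weighting approximates
# the mean and covariance of f(X) for X ~ N(mean, cov).
mean = np.array([1.0, -2.0])
cov = np.array([[1.0, 0.3], [0.3, 2.0]])
pts, wm, wc = sigma_points(mean, cov)
```

For the identity map the weighted points recover the input mean and covariance exactly, which is the defining property of the construction; only 2n+1 function evaluations are needed per epistemic propagation, as opposed to a full Monte Carlo loop.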
For the application we consider in this paper, we emphasize that the unscented transform is used as a proxy for the measure of residual uncertainty to be used in optimization, as a numerically efficient alternative that should be proportional to the true objective. Hence, we view the unscented transform as a tool to find the best decision or strategy, where we get the possibility of exploring many decisions approximately rather than a few exactly. Once an optimal strategy is found, we estimate the corresponding measure of residual uncertainty using a pure Monte Carlo alternative which is exact in the limit. We note that for global optimization of acquisition functions, we have used a combination of random sampling and gradient-based local optimization. With this procedure, an optimization objective given by H_{3,k} (and also H_{2,k}) is generally more suitable than H_{1,k}, as it is less susceptible to noise coming from Monte Carlo estimation (see for instance Figure 7). On the other hand, H_{1,k} has a natural interpretation (the variance of the failure probability), and is therefore a better measure for evaluating convergence, or for early stopping as discussed in Section 6.4.
Although we focus on the estimation of a failure probability in this paper, many of the main ideas should also be applicable to other estimation objectives using models where a hierarchical structure can be utilized, for instance when α_k is some other quantity of interest depending on the random variable g(X), not necessarily given by an indicator function as in (1). For the applications considered in this paper, we have assumed that an isoprobabilistic transformation of X to a standard normal variable is available, which is often the case in structural reliability models. We make use of this assumption only to apply some well known techniques for failure probability estimation, but note that other alternatives, for instance the one presented in Appendix B.3, can be used instead.
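The isoprobabilistic transformation referred to above maps each input through its own CDF and the standard normal quantile function, U = Φ⁻¹(F_X(X)). A sketch for a lognormal input follows; the particular distribution and its parameters are our own illustrative choices.

```python
import numpy as np
from scipy.stats import lognorm, norm

rng = np.random.default_rng(4)
dist = lognorm(s=0.25, scale=np.exp(1.0))   # hypothetical lognormal input X
x = dist.rvs(size=100_000, random_state=rng)

u = norm.ppf(dist.cdf(x))        # isoprobabilistic map to standard normal space
x_back = dist.ppf(norm.cdf(u))   # the inverse transform recovers x
```

For independent inputs this is applied componentwise; for dependent inputs a Rosenblatt or Nataf transformation plays the same role.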
There are several ways to improve the methodology presented in this paper. For instance, other variants of the unscented transform could be applied, see for instance Menegaz et al. (2015), or the parameters determining the set of sigma-points used in this paper could be optimized as in (Turner and Rasmussen, 2010).
As seen in Section 6.4, the myopic, one-step look ahead strategy can make it impossible to reach the stopping criterion of the algorithm. As mentioned, a way to avoid this problem is by considering the whole dynamic programming formulation (10). However, this formulation suffers from the curse of dimensionality. Since the myopic formulation corresponds to truncating the sum in the dynamic programming formulation (10) to only one term, it is of interest to study methods where more terms of the sum are included (multistep look ahead). How much better do the estimates get by including an extra term, and how much does the computation time increase? Is it possible to determine an optimal choice of truncation where we weigh accuracy and computation time against one another? Different ways of finding approximate solutions to the complete dynamic programming problem have been the focus of much research within areas such as operations research, optimal control and reinforcement learning, and trying out some of these alternatives is certainly an interesting avenue for further research.
Another interesting topic worth investigating is how the numerical examples in this paper compare to the case where we estimate the buffered failure probability instead of the classical failure probability. Buffered failure probabilities were introduced by Rockafellar and Royset (2010) as an alternative to classical failure probabilities in order to take into account the tail distribution of the performance function. See Dahl and Huseby (2019) for an application of this concept to structural reliability analysis.
One may also discuss whether using heuristic optimization objectives chosen to approximate the variance is reasonable. By essentially focusing on minimizing the variance of the failure probability, we say that all deviations from the true value are equally bad. In reality, overestimating the failure probability can be costly, but is not nearly as problematic as underestimating it. Because of this, the variance may not be the most appropriate measure of risk. It would be interesting to also derive heuristic optimization objectives based on approximating other risk measures.
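The point about asymmetric consequences can be made concrete with a toy calculation: we compare the symmetric squared-error loss against an assumed asymmetric loss that penalizes underestimation of the failure probability more heavily. The two estimators, their distributions and the penalty weights below are purely illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
alpha_true = 1.0e-3
# Two hypothetical estimators with the same spread, biased in opposite directions.
over = rng.normal(1.2e-3, 1.0e-4, size=100_000)   # overestimates on average
under = rng.normal(0.8e-3, 1.0e-4, size=100_000)  # underestimates on average

def sq_loss(est):
    """Symmetric squared-error loss (what minimizing the variance targets)."""
    return np.mean((est - alpha_true) ** 2)

def asym_loss(est, w_under=10.0, w_over=1.0):
    """Assumed loss: underestimating the failure probability is w_under/w_over
    times as bad as overestimating it."""
    err = est - alpha_true
    return np.mean(np.where(err < 0.0, w_under * err**2, w_over * err**2))
```

Under the squared loss the two estimators are essentially indistinguishable, while the asymmetric loss strongly prefers the conservative (overestimating) one; a risk measure of this kind would steer the experimental design differently than the variance.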
These questions are of interest, but beyond the scope of the current paper, and the topics are left for future research.


Fig. 5 (Example 1) Top: Mean ± 2 standard deviations of α_k after k iterations, as computed using the approximation described in Section 4. Bottom: The distribution of α_k at the final iteration k = 3, estimated from a double-loop Monte Carlo (i.e. by sampling from α(ξ_k) without any approximation).

Fig. 7 (Example 2) The top row shows the GP models ỹ_2 and z̃_1 with respect to P_k for k = 10. The number above each observation is the iteration index k, and convergence was obtained after 2 evaluations of y_2 and 8 evaluations of z_1. The final acquisition functions are shown in the bottom row.

Fig. 9 (Example 3) The limit state (51) together with the expected failure surface E[ξ_65] and the 65 observations collected before convergence at V̂_65 < 0.025. The proposal distribution q_X used for importance sampling is a mixture of Gaussian random variables centered at the four design points (×) as described in Appendix B. The pruned samples shown in the figure are mostly located around E[ξ_65] = 0 and in other regions where the level set ξ_65 = 0 is uncertain.

Fig. 10 (Example 3) Top: Mean ± 2 standard deviations of α_k after k iterations, as computed using the approximation described in Section 4. Bottom: The distribution of α_k at the final iteration k = 65, estimated from a double-loop Monte Carlo (i.e. by sampling from α(ξ_k) without any approximation).

Fig. 11 (Example 4) Graphical representation of the corroded pipeline structural reliability model. The shaded nodes d, p_FE and µ_m have associated epistemic uncertainty that can be reduced through experiments.
Fig. 12 (Example 4) Top: Mean ± 4 standard deviations of α_k after k iterations, as computed using the approximation described in Section 4. The stopping criterion (53) is reached after 25 iterations. Bottom: The acquisition functions (52) for each type of experiment during the first 50 iterations.

Fig. 13 (Example 4) The posterior distributions of d/t and X_m when the stopping criterion (53) is reached at k = 25.

Table 1 Overview of the framework for the optimal experimental design problem for structural reliability analysis

Table 2 Overview of the framework for the optimal experimental design problem for structural reliability analysis with sequential model updating