1 Introduction

Bilevel programs arise from the interplay of two decision makers on different levels of a hierarchy: The leader decides first and passes the upper-level decision to the follower. Incorporating the leader’s decision as a parameter, the follower then returns an optimal solution of the lower-level problem. The leader’s outcome depends on both their own decision and the solution picked by the follower. While the first formulation of a bilevel problem dates back to a monograph on duopoly market models published in 1934 (cf. [37]), these problems did not receive extensive attention until the 1970s (for more details, we refer to [11]).

In this paper, we study a class of pessimistic bilevel stochastic programs, where the lower level problem has a strictly convex quadratic objective function and a fixed feasible set. As an application, we study a mechanical shape optimization problem in which the leader (the designer) minimizes a tracking functional over the set of feasible material distributions, whereas the follower (the test engineer) chooses forces from an admissible set to maximize a compliance objective. The safety of the construction is then evaluated pessimistically with the choice of the worst possible response. Randomness comes into play via manufacturing errors that stochastically perturb the material parameters in the actual construction phase.

In what follows, let us briefly review related work. Bilevel programs are nonconvex, nondifferentiable and NP-hard (non-deterministic polynomial-time-hard) [1]. Moreover, conceptual difficulties arise if the lower level problem has more than a single optimal solution. In this setting, one typically considers the so-called optimistic formulation, where cooperation of the follower is assumed, or takes a pessimistic stance and hedges against the worst possible outcome [23]. It is well-known that pessimistic bilevel programs have weaker analytical properties than their optimistic counterparts. In general, the existence of optimal solutions to a pessimistic bilevel program can only be assured under restrictive conditions including weak analyticity for the lower level objective function and strong assumptions on the structure of the lower level feasible set (cf. [28, Theorem 4.1]). These difficulties can be overcome by considering a modified setting, where the leader hedges against solutions that are almost optimal for the lower level problem. Sufficient conditions for convergence of the modified optimal values to the original one have been established in [26]. A systematic analysis of more inner regularization techniques has been recently provided by Lignola and Morgan in [24, 25].

A bilevel stochastic program arises if the problem depends on an additional random parameter that only the follower can observe before making their decision. In contrast, the leader has to decide nonanticipatorily, but is aware of the underlying probability distribution. In this setting, the upper level objective function can be understood as a random variable, which allows the leader to base their decision on some statistical functional. The expected value is, for instance, considered in the very first paper on bilevel stochastic optimization [32]. In a linear setting, more general models incorporating a variety of convex risk measures have been recently studied in [2]. The control of a vibrating string with stochastic data has been investigated in [12] for the case of the excess-probability as a goal function. A level-set based approach for solving risk-averse structural topology optimization problems with random field loading and material uncertainty is given in [29].

Already in 2001, Christiansen et al. [5] studied a stochastic bilevel programming perspective in shape optimization. They assume that the lower level deals with the deformation of the structure for a given shape and given forces subject to different constraints, while on the upper level the shape is decided based on an optimization of weight or a global stiffness measure. Assuming that the lower level is uniquely solvable, the authors provide sufficient conditions for the existence of optimal solutions and discuss algorithmic aspects. Herskovits et al. [21] reformulated an elastic shape optimization problem with constraints as a bilevel optimization problem. They investigate a contact problem with non-penetration constraints on the lower level and stress constraints on the upper level. In [39], Zuo investigated shape optimization of thin shells in car design as an optimistic bilevel optimization problem, where on the lower level the mass distribution along the body frame of the vehicle and on the upper level the shape of shell segments of the hull of the vehicle are optimized. Sinha et al. [36] recently presented a general overview on bilevel optimization also covering optimal design problems. In this context, they considered weight or cost optimization of a structure on the upper level and, on the lower level, the computation of displacements and stresses via minimization of the governing physical variational problem. To the best of our knowledge, pessimistic hierarchical optimization in shape optimization with an objective functional differing from the physical energy of the system on the upper two levels has not been investigated so far.

The approach presented here is based on our previous work in [8,9,10], which grew out of the aspiration to mobilize methodology from mainly economy-driven decision making under (stochastic) uncertainty in order to study PDE-constrained optimization with an emphasis on engineering-related topics such as shape optimization. The risk-neutral models and models with risk aversion in the objective or the constraints were treated with the classical expectation, with risk measures, or by invoking comparisons using stochastic dominance relations. In the spirit of this experience, the present paper is heading for models with the above bilevel features coming to the fore in the presence of uncertainty.

The present work is organized as follows: In Sect. 2, we introduce a bilevel programming formulation, and in Sect. 3, the extension to a bilevel problem under stochastic uncertainty, which will be placed in the context of elastic shape optimization later in the paper. Based on this, we analyze both problem formulations and investigate their solvability. The application to a mechanical shape optimization problem via discrete shells is considered in Sect. 4 as well as its numerical optimization and the results of our numerical analysis. Finally, in Sect. 5, we draw conclusions and discuss possible future extensions of our work.

2 Bilevel problem formulation

Before formally introducing the bilevel problem, we briefly present the key objects. At the lowest level, y[u, f] is the elastic displacement of the discrete shell which depends nonlinearly on the material parameters u and linearly on the applied forces f. The lower-level optimal solution set \(\varPsi [u]\), depending on the material parameter u, is the set of values of f which maximize a quadratic functional in y. At the upper level of our pessimistic bilevel problem, we finally minimize the worst-case cost J of the lower-level optimization with respect to the material parameters u.

In detail, this pessimistic bilevel problem reads as

$$\begin{aligned} \underset{u \in \mathcal {U}}{\min } \left\{ \max _{f \in \varPsi [u]} J[u,f] \right\} , \end{aligned}$$
(1)

where \(\mathcal {U} \subseteq (0,\infty )^n\) is a nonempty closed set, and \(J : \mathcal {U} \times \mathbb {R}^N \rightarrow \mathbb {R}\) denotes the cost functional of the leader, which we assume to be continuous. In our application, this will be a tracking-type objective for a discrete shell with thickness/stiffness parameters u in an admissible set \(\mathcal {U}\) and applied forces f. Moreover, we let the lower level optimal solution set mapping \(\varPsi : \mathcal {U} \rightrightarrows \mathbb {R}^N\) be given by

$$\begin{aligned} \varPsi [u] {:}{=}\underset{f \in \mathcal {F}}{{{\,\mathrm{arg\,max}\,}}} \left\{ y[u, f]^\top H[u] y[u, f] \right\} \end{aligned}$$
(2)

with a nonempty, low-dimensional, convex and compact set of admissible forces \(\mathcal {F} \subset \mathbb {R}^N\) and a function \(H: \mathbb {R}^n \rightarrow \mathbb {R}^{N \times N}\) such that the restriction \(H|_{\mathcal {U}}\) is continuous and takes values in the cone of symmetric positive definite matrices \(\mathcal {S}^N_{++}\). Throughout the paper, the notation \(g: X \rightrightarrows Y\) is used for a multifunction g that maps the elements of some set X to subsets of some set Y. The displacement y depends on a vector \(u\) of thickness/stiffness parameters and the forces f. In fact, the mapping \(y: \mathcal {U} \times \mathbb {R}^N \rightarrow \mathbb {R}^N\) in (2) is defined by the condition

$$\begin{aligned} \{y[u, f]\} = {{\,\mathrm{arg\,min}\,}}_{y \in \mathbb {R}^N} \left\{ \frac{1}{2} y^\top H[u] y - y^\top M f \right\} \end{aligned}$$
(3)

for some fixed matrix \(M \in \mathcal {S}^N_{++}\), where uniqueness follows from \(H[u] \in \mathcal {S}^N_{++}\). In our application, we consider a discrete shell with n triangular facets subject to a force distribution f in a set of admissible forces in \(\mathbb {R}^N\), with N being three times the number of vertices. For this case, the elastic displacement y[u, f] is given as the minimizer of the total free energy of a linearized elasticity model with \(H[u]\) denoting the Hessian of an originally nonlinear elastic energy and M the mass matrix for the discrete reference shell.

The above hierarchical problem Eqs. (1)–(3) can also be understood as a three-level program. However, as \(H[u] \in \mathbb {R}^{N \times N}\) is symmetric and positive definite for any admissible material parameter \(u \in \mathcal {U}\), the third-level problem in Eq. (3) is uniquely solvable. Invoking first-order optimality conditions, we obtain the explicit representation

$$\begin{aligned} y[u,f] = H[u]^{-1} M f. \end{aligned}$$
(4)
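As a sanity check on (4), the following sketch solves the stationarity condition \(H[u]\, y = M f\) for a small instance and confirms that perturbing the result increases the energy in (3). The 2×2 matrices, the force vector, and all helper names are hypothetical choices made for illustration only; they are not taken from the discrete shell model.

```python
def matvec(A, x):
    """Product of a 2x2 matrix (nested lists) with a vector of length 2."""
    return [A[0][0] * x[0] + A[0][1] * x[1],
            A[1][0] * x[0] + A[1][1] * x[1]]


def solve_2x2(A, b):
    """Solve A x = b for an invertible 2x2 matrix A via Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - A[1][0] * b[0]) / det]


def energy(H, M, f, y):
    """Energy from (3): 0.5 * y^T H y - y^T M f."""
    Hy, Mf = matvec(H, y), matvec(M, f)
    return 0.5 * (y[0] * Hy[0] + y[1] * Hy[1]) - (y[0] * Mf[0] + y[1] * Mf[1])


def displacement(H, M, f):
    """Closed form (4): y = H^{-1} M f, computed by solving H y = M f."""
    return solve_2x2(H, matvec(M, f))


# Hypothetical symmetric positive definite data (assumed for illustration).
H = [[2.0, 0.5], [0.5, 1.0]]
M = [[1.0, 0.0], [0.0, 1.0]]
f = [1.0, -1.0]

y = displacement(H, M, f)
```

Because the energy is strictly convex in y for positive definite \(H[u]\), any nonzero perturbation of `y` must yield a strictly larger value of `energy`.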

Plugging this solution into the lower level problem yields a bilevel problem. Moreover, Eq. (4) leads to a simple expression for the lower level optimal value function \(\psi : \mathcal {U} \rightarrow \mathbb {R}\),

$$\begin{aligned} \psi [u] {:}{=}\underset{f \in \mathcal {F}}{\max } \left\{ f^\top M H[u]^{-1} M f \right\} , \end{aligned}$$
(5)

and to the reformulation of the definition of \(\varPsi \) in Eq. (2) as

$$\begin{aligned} \varPsi [u] = \left\{ f \in \mathcal {F} \mid f^\top M H[u]^{-1} M f = \psi [u] \right\} . \end{aligned}$$
(6)

Lemma 1

The lower level optimal value function \(\psi \) defined by (5) is well-defined and continuous. In addition, the multifunction \(\varPsi \) is closed.

Proof

For fixed u, the argument in (5) is quadratic in f, and in particular continuous. Since \(\mathcal {F}\) is nonempty and compact, the maximum exists.

For any \(f\in \mathcal {F}\), the argument in (5) is a continuous function of u. Moreover, \(\psi \) is the pointwise supremum of a family of continuous functions and thus lower semicontinuous by [34, Proposition 1.26(a)].

Assume, for contradiction, that \(\psi \) is not upper semicontinuous. This yields \(u_*\in \mathcal {U}\) and a sequence \(u_j\rightarrow u_*\) such that \(\lim _{j\rightarrow \infty } \psi [u_j] > \psi [u_*]\). For any j, we choose \(f_j\in \mathcal {F}\) such that \(\psi [u_j] = f_j^\top M H[u_j]^{-1} M f_j\). Passing to a subsequence, we can assume that \(f_j\) converges to some \(f_*\in \mathcal {F}\), since \(\mathcal {F}\) is compact. By continuity of \(f^\top M H[u]^{-1} M f\) we obtain

$$\begin{aligned} \psi [u_*]\ge f_*^\top M H[u_*]^{-1} M f_*=\lim _{j\rightarrow \infty } f_j^\top M H[u_j]^{-1} M f_j=\lim _{j\rightarrow \infty } \psi [u_j], \end{aligned}$$

a contradiction. Hence \(\psi \) is continuous.

Then the graph of the solution set mapping

$$\begin{aligned} \mathrm {gph}\,\varPsi = \left\{ [u,f] \in \mathcal {U} \times \mathcal {F} \mid \psi [u] - f^\top M H[u]^{-1} M f = 0 \right\} \end{aligned}$$

is the intersection of the closed set \(\mathcal {U} \times \mathcal {F}\) and the level set of a continuous function and thus closed. \(\square \)

Proposition 1

The mapping \(\varPhi : \mathcal {U} \rightarrow \mathbb {R}\) defined by

$$\begin{aligned} \varPhi [u] {:}{=}\max _{f \in \varPsi [u]} J[u,f] \end{aligned}$$

is well-defined and upper semicontinuous. Moreover, \(\varPhi \) is continuous at any \(u \in \mathcal {U}\) for which \(\varPsi [u]\) is a singleton.

Proof

For any fixed \(u\in \mathcal {U}\), by (6) and Lemma 1 the lower-level solution set \(\varPsi [u]\) is a nonempty, closed subset of the compact set \(\mathcal {F}\), and hence compact. Thus, \(\varPhi \) is well-defined by continuity of the upper-level cost functional J on \(\mathcal {U} \times \mathbb {R}^N\). Consider any sequence \(\lbrace u_k \rbrace _{k \in \mathbb {N}} \subseteq \mathcal {U}\) that converges to \(u \in \mathcal {U}\). By the previous considerations, there exists a sequence \(\lbrace f_k \rbrace _{k \in \mathbb {N}}\) such that \([u_k,f_k] \in \mathrm {gph}\,\varPsi \) and

$$\begin{aligned} \varPhi [u_k] = J[u_k,f_k] \end{aligned}$$

holds for any \(k \in \mathbb {N}\). As \(\mathcal {F}\) is compact, we may assume without loss of generality that \(\lbrace f_k \rbrace _{k \in \mathbb {N}}\) converges to some \(f \in \mathcal {F}\). By Lemma 1, we have \([u,f] \in \mathrm {gph}\,\varPsi \) and thus

$$\begin{aligned} \limsup _{k \rightarrow \infty } \varPhi [u_k] = \limsup _{k \rightarrow \infty } J[u_k,f_k] = J[u,f] \le \max _{\bar{f} \in \varPsi [u]} J[u,\bar{f}] = \varPhi [u]. \end{aligned}$$

Hence, \(\varPhi \) is upper semicontinuous. If \(\varPsi [u]\) is a singleton, we also have

$$\begin{aligned} \liminf _{k \rightarrow \infty } \varPhi [u_k] = \liminf _{k \rightarrow \infty } J[u_k,f_k] = J[u,f] = \varPhi [u], \end{aligned}$$

which completes the proof. \(\square \)

Remark 1

To understand the significance of Proposition 1 it is useful to compute these quantities explicitly in a simple low-dimensional example. Assume \(n=1\), \(N=2\), \({\mathcal {U}}=[\frac{1}{2},\frac{3}{2}]\), \(M=Id\), \(H[u]=\begin{pmatrix} 1\quad &{}\quad u-1 \\ u-1 &{}\quad 1\end{pmatrix}^{-1}\), \({\mathcal {F}}=[-1,1]\times [0,1]\). Then one computes \(f^\top M H[u]^{-1} M f=f_1^2+f_2^2+2(u-1)f_1 f_2\). Maximizing this quantity as in (5) we see that only the two points \((\pm 1,1)^\top \) of \({\mathcal {F}}\) are relevant, and in particular \(\psi [u]=2+2|u-1|\). Further, from (2) (or, equivalently, (6)) we obtain the set of extremal forces

$$\begin{aligned} \varPsi [u]={\left\{ \begin{array}{ll}\left\{ \begin{pmatrix} 1 \\ 1 \end{pmatrix}\right\} &{} \text { if }\quad u>1,\\ \left\{ \begin{pmatrix} -1 \\ 1 \end{pmatrix}\right\} &{} \text { if }\quad u<1,\\ \left\{ \begin{pmatrix} 1 \\ 1 \end{pmatrix},\begin{pmatrix} -1 \\ 1 \end{pmatrix} \right\}&\text { if }\quad u=1. \end{array}\right. } \end{aligned}$$

Choosing for example \(J[u,f]=uf_1\) one obtains

$$\begin{aligned} \varPhi [u]={\left\{ \begin{array}{ll} J[u,1]=u &{} \text { if }\quad u\ge 1,\\ J[u,-1]=-u &{} \text { if }\quad u<1. \end{array}\right. } \end{aligned}$$

In particular, it is clear that on \(\{u\ne 1\}\) the set-valued function \(\varPsi \) is a singleton and \(\varPhi \) is continuous, whereas \(\varPsi [1]\) contains two elements and \(\varPhi \) is not continuous, and not lower semicontinuous, at \(u=1\).

As this example shows, \(\varPhi \), being the objective function of a pessimistic bilevel program whose lower level problem may have more than a single optimal solution, cannot be expected to be lower semicontinuous in general, a fact that was already observed in [11, example on pages 30-31]. Note that this may prevent the bilevel program (1) from having an optimal solution even if \(\mathcal {U}\) is compact.
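The computations of Remark 1 can be reproduced numerically. The sketch below evaluates \(\psi \), \(\varPsi \) and \(\varPhi \) for this example by enumerating the four extreme points of \(\mathcal {F}=[-1,1]\times [0,1]\), which suffices here because a strictly convex quadratic attains its maximum over a polytope only at extreme points. The function names and the tolerance `tol` are assumptions made for illustration.

```python
def quad(u, f):
    """Lower level objective of Remark 1: f1^2 + f2^2 + 2 (u - 1) f1 f2."""
    return f[0] ** 2 + f[1] ** 2 + 2.0 * (u - 1.0) * f[0] * f[1]


# Extreme points of F = [-1, 1] x [0, 1].
VERTICES = [(-1.0, 0.0), (1.0, 0.0), (-1.0, 1.0), (1.0, 1.0)]


def psi(u):
    """Optimal value (5), computed by enumerating the vertices."""
    return max(quad(u, f) for f in VERTICES)


def Psi(u, tol=1e-12):
    """Optimal vertices as in (6); tol is an assumed floating-point tolerance."""
    best = psi(u)
    return [f for f in VERTICES if best - quad(u, f) <= tol]


def Phi(u):
    """Pessimistic value max over Psi[u] of J[u, f] with J[u, f] = u * f1."""
    return max(u * f[0] for f in Psi(u))
```

Evaluating `Phi` at 0.999, 1.0 and 1.001 exhibits the jump at \(u=1\): the values are close to -1 just below 1 and equal to or slightly above 1 from there on.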

To overcome the difficulties detailed above, we consider a model where the leader also hedges against \(\eta \)-optimal lower level solutions (cf. [24]). Specifically, we replace \(\varPsi \) with the mapping \(\varPsi _\eta : \mathcal {U} \rightrightarrows \mathbb {R}^N\) defined by

$$\begin{aligned} \varPsi _\eta [u] {:}{=}\left\{ f \in \mathcal {F} \mid \psi [u] - f^\top M H[u]^{-1} M f < \eta \right\} \end{aligned}$$

for some positive constant \(\eta \). This results in the modified upper level problem

$$\begin{aligned} \underset{u \in \mathcal {U}}{\min } \left\{ \sup _{f \in \varPsi _\eta [u]} J[u,f] \right\} . \end{aligned}$$
(7)

As \(\varPsi [u] \subseteq \varPsi _{\eta }[u]\) holds for any \(\eta > 0\) and \(u \in \mathcal {U}\), the optimal value in (7) yields an upper bound for the optimal value in (1).
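To illustrate the effect of the regularization on the example of Remark 1, the following sketch approximates \(\sup _{f \in \varPsi _\eta [u]} J[u,f]\) with \(J[u,f]=uf_1\) by searching a uniform grid on \(\mathcal {F}\). The grid resolution `m`, the value \(\eta = 0.4\) used in the check at the end, and the function names are illustrative assumptions. Since the grid is finite, the computed value is only a lower estimate of the true supremum.

```python
def quad(u, f):
    """Lower level objective of Remark 1: f1^2 + f2^2 + 2 (u - 1) f1 f2."""
    return f[0] ** 2 + f[1] ** 2 + 2.0 * (u - 1.0) * f[0] * f[1]


VERTICES = [(-1.0, 0.0), (1.0, 0.0), (-1.0, 1.0), (1.0, 1.0)]


def psi(u):
    """Optimal value (5); the maximum is attained at a vertex of F."""
    return max(quad(u, f) for f in VERTICES)


def force_grid(m=40):
    """Uniform grid on F = [-1, 1] x [0, 1] that contains all four vertices."""
    return [(-1.0 + 2.0 * i / m, j / m)
            for i in range(m + 1) for j in range(m + 1)]


def Phi(u, tol=1e-12):
    """Unregularized pessimistic value, using exactly optimal vertices only."""
    best = psi(u)
    return max(u * f[0] for f in VERTICES if best - quad(u, f) <= tol)


def Phi_eta(u, eta, m=40):
    """Grid estimate of the sup of u * f1 over all eta-optimal forces."""
    best = psi(u)
    return max(u * f[0] for f in force_grid(m) if best - quad(u, f) < eta)


# For u = 0.95 the force (1, 1) misses the optimal value by 0.2 < eta,
# so the regularized leader already accounts for it.
value_near_kink = Phi_eta(0.95, 0.4)
```

In this instance the regularized value at \(u=0.95\) equals 0.95, whereas the unregularized pessimistic value is \(-0.95\).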

Proposition 2

The mapping \(\varPhi _\eta : \mathcal {U} \rightarrow \mathbb {R}\) defined by

$$\begin{aligned} \varPhi _\eta [u] {:}{=}\sup _{f \in \varPsi _\eta [u]} J[u,f] \end{aligned}$$

is well-defined and lower semicontinuous for any \(\eta > 0\). In particular, (7) is solvable whenever \(\mathcal {U}\) is nonempty and compact.

Proof

First, note that \(\varPhi _\eta \) is well-defined and real-valued as, by continuity of J, for any \(u \in \mathcal {U}\)

$$\begin{aligned} \varPhi _\eta [u] \le \max _{f \in \mathcal {F}} J[u,f] < \infty . \end{aligned}$$

To prove semicontinuity, we consider a sequence \(\{u_k\}_{k \in \mathbb {N}} \subseteq \mathcal {U}\) converging to some \(u_*\in \mathcal {U}\). We select a sequence \(\{f_j\}_{j \in \mathbb {N}} \subseteq \varPsi _\eta [u_*]\), i.e. \(\psi [u_*] < \eta + f_j^\top M H[u_*]^{-1} M f_j\), such that \(J[u_*,f_j]\rightarrow \varPhi _\eta [u_*]\). By continuity of \(\psi \) and H, for each j there is \(K_j\) such that for all \(k\ge K_j\) we have \(\psi [u_k]<\eta +f_j^\top M H[u_k]^{-1} M f_j\), which is the same as \(f_j\in \varPsi _\eta [u_k]\). Therefore

$$\begin{aligned} J[u_*,f_j]=\lim _{k\rightarrow \infty } J[u_k,f_j] \le \liminf _{k\rightarrow \infty } \varPhi _\eta [u_k]. \end{aligned}$$

Since j was arbitrary, taking the limit \(j\rightarrow \infty \) we conclude

$$\begin{aligned} \varPhi _\eta [u_*]=\lim _{j\rightarrow \infty } J[u_*,f_j] \le \liminf _{k\rightarrow \infty } \varPhi _\eta [u_k]. \end{aligned}$$

\(\square \)

Remark 2

In [26], the alternate model

$$\begin{aligned} \underset{u \in \mathcal {U}}{\min } \left\{ \max _{f \in \bar{\varPsi }_\eta [u]} J[u,f] \right\} \end{aligned}$$

with

$$\begin{aligned} \bar{\varPsi }_\eta [u] {:}{=}\left\{ f \in \mathcal {F} \mid \psi [u] - f^\top M H[u]^{-1} M f \le \eta \right\} \end{aligned}$$

is considered. Under the present assumptions it can be shown that

$$\begin{aligned} \lim _{\eta \downarrow 0} \inf _{u \in \mathcal {U}} \left\{ \max _{f \in \bar{\varPsi }_\eta [u]} J[u,f] \right\} = \inf _{u \in \mathcal {U}} \left\{ \max _{f \in \varPsi [u]} J[u,f] \right\} . \end{aligned}$$

However, the function

$$\begin{aligned} \bar{\varPhi }_\eta [u] {:}{=}\sup _{f \in \bar{\varPsi }_\eta [u]} J[u,f] \end{aligned}$$

is not lower semicontinuous in general, which is why we rather use formulation (7).

3 Stochastic model

A bilevel stochastic program arises if a random vector enters the upper or lower levels as a parameter, with the information constraint that only the follower can observe the realization of the randomness before making their decision. In contrast, the leader has to decide nonanticipatorily, but is aware of the distribution of the randomness, which is independent of the leader’s decision.

In the following, we shall study a setting where the leader’s decision u is subject to a random perturbation. To become more specific, let \(\Upsilon : \Omega \rightarrow \mathbb {R}^n\) be a random vector (i.e., a \(\mathcal {B}\)-Borel measurable function) on some probability space \((\Omega , \mathcal {B}, \mathbb {P})\). We obtain the following pattern of decision and observation:

$$\begin{aligned} \text {Leader decides}\ u \rightarrow \text {Realization of}\ \Upsilon \rightarrow \text {Follower decides}\ f. \end{aligned}$$

In our model, the randomness results from manufacturing errors and has the following effect: Throughout (1)-(3), the leader’s decision u is replaced with the perturbed material vector \(u \odot \upsilon \), where \(\odot \) denotes the componentwise multiplication and \(\upsilon \) is the realization of \(\Upsilon \). In this setting, the leader seeks to ensure that the resulting material parameters are feasible regardless of the realization of the randomness. In order for the perturbed material vector to be almost-surely admissible, the leader has to choose the design parameter u in the induced feasible set

$$\begin{aligned} \mathcal {U}_\Upsilon {:}{=}\left\{ u \mid u \odot \upsilon \in \mathcal {U} \quad \forall \,\upsilon \in \mathrm {supp}\,{\mu _{\Upsilon }} \right\} , \end{aligned}$$

where \(\mu _{\Upsilon } {:}{=}\mathbb {P} \circ \Upsilon ^{-1}\) is the induced Borel probability measure on \(\mathbb {R}^n\). Note that the set \(\mathcal {U}_\Upsilon \) is closed as the intersection of closed sets. Typically, we think of a situation where \(\mathrm {supp}\,\mu _{\Upsilon } \subseteq [a, b]^n\) holds for some \(0< a < b\), possibly both close to 1.
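For intuition, the induced feasible set can be written down explicitly in a simple special case. The sketch below assumes, purely for illustration, that \(\mathcal {U}\) is a box \([\mathrm {lo},\mathrm {hi}]^n\) with \(\mathrm {lo}>0\) and that the support of \(\mu _{\Upsilon }\) is the full box \([a,b]^n\). Under these assumptions, each component of u must lie in \([\mathrm {lo}/a, \mathrm {hi}/b]\). The function names are hypothetical.

```python
def induced_interval(lo, hi, a, b):
    """Interval of scalars u with u * v in [lo, hi] for every v in [a, b].

    Assumes 0 < lo <= hi and 0 < a <= b; returns None if no such u exists.
    """
    left, right = lo / a, hi / b
    return (left, right) if left <= right else None


def is_robustly_feasible(u, lo, hi, a, b):
    """Check u * v in [lo, hi]^n componentwise at the extreme perturbations.

    For positive u_i the product u_i * v_i ranges over [u_i * a, u_i * b],
    so testing the two endpoints is sufficient.
    """
    return all(lo <= ui * a and ui * b <= hi for ui in u)
```

For instance, with \(\mathcal {U}=[\frac{1}{2},\frac{3}{2}]\) and support \([\frac{9}{10},\frac{11}{10}]\), `induced_interval(0.5, 1.5, 0.9, 1.1)` returns approximately (0.556, 1.364). If the perturbation range is too wide relative to the box, the induced set is empty and the function returns `None`.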

We will consider the stochastic extension of the classical pessimistic bilevel program (1)–(3) as well as the modified version (7). In both situations, we will make the following assumption:

(A1) The support of \(\mu _{\Upsilon }\) is bounded.

In the classical setting, we will need the following additional assumptions:

(A2) \(\mathcal {F}\) is a nonempty, bounded polyhedron, i.e. the convex hull of its nonempty and finite set of extreme points \(\mathcal {P} \subseteq \mathcal {F}\).

(A3) \(\mu _{\Upsilon }\) is absolutely continuous with respect to the Lebesgue measure \(\mathcal {L}^n\).

(A4) There exists an open and connected set \(\tilde{\mathcal {U}} \subseteq \mathbb {R}^n\), such that \(\mathcal {U} \subseteq \tilde{\mathcal {U}}\), \(H|_{\tilde{\mathcal {U}}}\) is real analytic, and it takes values in a closed subset of \(\mathcal {S}^N_{++}\).

From the leader’s point of view, the material vector that will be passed down to the lower level after the stochastic perturbation has occurred can be understood as a random vector \(u \odot \Upsilon : \Omega \rightarrow \mathbb {R}^n\) which is parameterized by the decision u. Similarly, the upper level outcome is a random variable \(\varPhi [u \odot \Upsilon ] \in L^0(\Omega , \mathcal {B}, \mathbb {P})\) for any fixed u by Proposition 1. Here and in the subsequent analysis, we denote the associated classical \(L^p\)-spaces with \(p \in [1,\infty ]\) by \(L^p(\Omega , \mathcal {B}, \mathbb {P})\) and use \(L^0(\Omega , \mathcal {B}, \mathbb {P})\) for the space of real-valued measurable functions.

Theorem 1

Assume (A1)-(A4), then the mapping \(\mathbb {F}: \mathcal {U}_\Upsilon \rightarrow L^\infty (\Omega , \mathcal {B}, \mathbb {P})\) given by

$$\begin{aligned} \mathbb {F}[u] \, {:}{=}\, \varPhi \left[ u \odot \Upsilon \right] \end{aligned}$$

is well-defined and continuous with respect to any \(L^p\)-norm with \(p \in [1,\infty )\).

The proof of Theorem 1 requires some preliminary work.

Lemma 2

Assume (A2) and (A4), then the set of discontinuities of \(\varPhi \) is a Lebesgue null set.

Proof

As the lower level goal function is strictly convex, we have \(\varPsi [u] \subseteq \mathcal {P}\) for any \(u \in \mathcal {U}\). By (A4), for any pair \((f,{\tilde{f}}) \in \mathcal {P} \times \mathcal {P}\) the function \(G_{(f,\tilde{f})}: \tilde{\mathcal {U}} \rightarrow \mathbb {R}\) defined by

$$\begin{aligned} G_{(f,\tilde{f})}[u] \,{:}{=}\, f^\top M H[u]^{-1} M f - \tilde{f}^\top M H[u]^{-1} M \tilde{f} \end{aligned}$$

is well-defined and real analytic. Consequently, the set

$$\begin{aligned} B[f,\tilde{f}] \, {:}{=}\,\left\{ u \in \mathcal {U} \mid f, \tilde{f} \in \varPsi [u] \right\} \subseteq \left\{ u \in \mathcal {U} \mid G_{(f,\tilde{f})}[u] = 0 \right\} \end{aligned}$$

of parameters for which f and \(\tilde{f}\) are optimal for the lower level problem is a Lebesgue null set, or we have

$$\begin{aligned} G_{(f,\tilde{f})}[u] = 0 \end{aligned}$$

for any \(u \in \mathcal {U}\) by [30, Proposition 1].

Now we start from the case that \(B[f,\tilde{f}]\) is a Lebesgue null set for any \(f, \tilde{f} \in \mathcal {P}\) satisfying \(f \ne \tilde{f}\). Let \(u\in \mathcal {U}\). If \(\varPsi [u]\) is a singleton, then by Proposition 1, \(\varPhi \) is continuous at u. Consequently, the set of discontinuity points of \(\varPhi \) is contained in

$$\begin{aligned} \bigcup _{f, \tilde{f} \in \mathcal {P}, \; f \ne \tilde{f}} B[f,\tilde{f}], \end{aligned}$$

which is a Lebesgue null set by the previous considerations.

To take care of the general case, let us consider the following relation on \(\mathcal {P} \times \mathcal {P}\):

$$\begin{aligned} f \sim \tilde{f} :\Leftrightarrow G_{(f,\tilde{f})}[u] = 0 \quad \text {for any}\ u \in \mathcal {U}. \end{aligned}$$

It is easy to verify that \(\sim \) defines an equivalence relation and that the equivalence class of any extreme point \(\tilde{f} \in \mathcal {P}\) is given by

$$\begin{aligned} E[\tilde{f}] \, {:}{=}\,\left\{ f \in \mathcal {P} \mid G_{(f,\tilde{f})}[u] = 0 \quad \forall \,u \in \mathcal {U} \right\} . \end{aligned}$$

By (6), \(E[{\tilde{f}}]\subseteq \varPsi [u]\) if \({\tilde{f}}\in \varPsi [u]\cap \mathcal {P}\). Let \(\tilde{\mathcal {P}} \subseteq \mathcal {P}\) contain exactly one element from each equivalence class, then \(\varPhi \) admits the representation

$$\begin{aligned} \varPhi [u] = \max _{\tilde{f} \in \tilde{\mathcal {P}}\cap \varPsi [u]} \left\{ \max _{f \in E[\tilde{f}]} J[u,f] \right\} . \end{aligned}$$

As \(\mathcal {P}\) is finite, for any \(\tilde{f} \in \tilde{\mathcal {P}}\) the mapping

$$\begin{aligned} u \mapsto \max _{f \in E[\tilde{f}]} J[u,f] \end{aligned}$$

is continuous. By the same argument as in the proof of Proposition 1, \(\varPhi \) is continuous on each set

$$\begin{aligned} S[\tilde{f}] \,{:}{=}\, \left\{ u \in \mathcal {U} \mid \{ \tilde{f} \} = \varPsi [u] \cap \tilde{\mathcal {P}} \right\} \end{aligned}$$

of parameters for which \(\tilde{f}\) is the only representative that is optimal for the lower level problem. Thus, the set of discontinuities of \(\varPhi \) is contained in the set

$$\begin{aligned} N_B \,{:}{=}\,\bigcup _{f, \tilde{f} \in \tilde{\mathcal {P}}, \; f \ne \tilde{f}} B[f,\tilde{f}], \end{aligned}$$

which is a Lebesgue null set by construction of \(\tilde{\mathcal {P}}\). For later reference we remark that we obtained

$$\begin{aligned} \mathcal {U} = N_B \cup \bigcup _{\tilde{f} \in \tilde{\mathcal {P}}} S[{\tilde{f}}] \quad \text {with}\ \mathcal {L}^n(N_B)=0, \end{aligned}$$
(8)

and that the sets \(N_B\) and \(S[{\tilde{f}}]\) for \(\tilde{f} \in \tilde{\mathcal {P}}\) in the right-hand side of (8) are pairwise disjoint. \(\square \)

Throughout the subsequent analysis, we will use the notation introduced in the proof of Lemma 2.

Proof of Theorem 1

Let \(u \in \mathcal {U}_\Upsilon \). As any upper semicontinuous function is Borel measurable, \(\mathbb {F}[u] \in L^0(\Omega , \mathcal {B}, \mathbb {P})\) follows directly from Proposition 1. Moreover, we have

$$\begin{aligned} \mathrm {ess \; sup} \, \mathbb {F}[u] \le \max _{f \in \mathcal {F}} \sup _{\upsilon \in \mathrm {supp} \, \mu _{\Upsilon }} J[u \odot \upsilon ,f] < \infty \end{aligned}$$
(9)

by continuity of J, (A1) and the compactness of \(\mathcal {F}\).

Consider any sequence \(\lbrace u_k \rbrace _{k \in \mathbb {N}} \subseteq \mathcal {U}_\Upsilon \) that converges to some \(u \in \mathcal {U}_\Upsilon \). We write

$$\begin{aligned} \lim _{k \rightarrow \infty } \left\| \mathbb {F}[u] - \mathbb {F}[u_k] \right\| _{L^p(\Omega , \mathcal {B}, \mathbb {P})}^p = \lim _{k \rightarrow \infty } \int _{\mathrm {supp} \, \mu _{\Upsilon }} \left| \varPhi [u \odot \upsilon ] - \varPhi [u_k \odot \upsilon ] \right| ^p~\mu _{\Upsilon }(d\upsilon ) . \end{aligned}$$

The set \(\{u\}\cup \{u_k \mid k\in \mathbb {N}\}\) is compact, so that by continuity of J and (9) we obtain a uniform bound on the integrand. With (A1) and dominated convergence we see that it suffices to prove pointwise convergence almost everywhere.

Let \(N_B\) be as in the proof above, and consider the set

$$\begin{aligned} {\hat{N}}_B \,{:}{=}\,\left\{ \upsilon \in \mathcal {U}_\Upsilon \mid u\odot \upsilon \in N_B \right\} . \end{aligned}$$

By the change-of-variables formula we obtain \(0=\mathcal {L}^n( N_B)=\prod _{i=1}^n u_i \mathcal {L}^n({\hat{N}}_B)\) and, since \(u_i>0\) for all i, \(\mathcal {L}^n({\hat{N}}_B)=0\). By the absolute continuity assumption (A3), \(\mu _\Upsilon ({\hat{N}}_B)=0\).

Fix some \(\upsilon \in \mathcal {U}_\Upsilon \setminus {\hat{N}}_B\). Then by (8) we have \(u\odot \upsilon \in S[{\tilde{f}}]\) for some \({\tilde{f}}\in \tilde{\mathcal {P}}\), so that in particular \(\varPhi \) is continuous at \(u\odot \upsilon \). From \(u_k\rightarrow u\) with \(k \rightarrow \infty \) we obtain \(u_k\odot \upsilon \rightarrow u\odot \upsilon \) and therefore \(\varPhi [u_k \odot \upsilon ] - \varPhi [u \odot \upsilon ]\rightarrow 0\). This proves pointwise convergence almost everywhere and concludes the proof. \(\square \)

Remark 3

The assertion of Theorem 1 does not hold for \(p=\infty \). To see this, we consider the example of Remark 1 and extend it to the stochastic setting taking \(\Omega =[\frac{9}{10},\frac{11}{10}]\), \({\mathbb {P}}\) proportional to the Lebesgue measure restricted to \(\Omega \), and \(\Upsilon \) to be the identity, so that \(\mathrm {supp}\, \mu _\Upsilon =\Omega \). Then \({\mathbb {F}}[u](v)=\varPhi [uv]=\pm uv\), with the positive sign if and only if \(uv\ge 1\). For the sequence \(u_k:=1-\frac{1}{k}\rightarrow 1\) we have \({\mathbb {F}}[u_k](v)=u_kv\) for \(v\ge 1/u_k\), and \({\mathbb {F}}[u_k](v)=-u_kv\) for \(v< 1/u_k\). In particular for all \(v\in (1, 1/u_k)\) we have \({\mathbb {F}}[u_k](v)-{\mathbb {F}}[1](v)=-u_kv-v\). Taking the supremum over all such v we obtain \(\Vert {\mathbb {F}}[u_k]-{\mathbb {F}}[1]\Vert _{L^\infty (\Omega ,{\mathcal {B}},{\mathbb {P}})}\ge 1+\frac{1}{u_k}\rightarrow 2\), hence \({\mathbb {F}}[u_k]\) does not converge to \({\mathbb {F}}[1]\) in \(L^\infty (\Omega ,{\mathcal {B}},{\mathbb {P}})\).
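The contrast between \(L^p\)-convergence and the failure of \(L^\infty \)-convergence in Remark 3 can be observed numerically. The sketch below approximates both distances between \({\mathbb {F}}[u_k]\) and \({\mathbb {F}}[1]\) using a midpoint rule on \(\Omega =[\frac{9}{10},\frac{11}{10}]\) with the uniform probability measure. The number of quadrature cells and the function names are assumptions; the maximum over midpoints only bounds the essential supremum from below.

```python
def Phi(x):
    """Phi from Remark 1 with J[u, f] = u * f1: x if x >= 1, else -x."""
    return x if x >= 1.0 else -x


def distances(k, p=1, cells=20000):
    """Approximate L^p and L^infinity distances between F[u_k] and F[1].

    Here u_k = 1 - 1/k, and Upsilon is uniform on [0.9, 1.1], so every
    quadrature cell carries probability 1 / cells.
    """
    u_k = 1.0 - 1.0 / k
    h = 0.2 / cells
    total, worst = 0.0, 0.0
    for i in range(cells):
        v = 0.9 + (i + 0.5) * h            # midpoint of the i-th cell
        diff = abs(Phi(u_k * v) - Phi(v))  # |F[u_k](v) - F[1](v)|
        total += diff ** p / cells
        worst = max(worst, diff)
    return total ** (1.0 / p), worst


l1_100, sup_100 = distances(100)
l1_1000, sup_1000 = distances(1000)
```

In this experiment the \(L^1\)-distance shrinks roughly by a factor of ten between \(k=100\) and \(k=1000\), while the approximate supremum distance stays close to 2.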

As a generic first choice, the leader might assess the random upper level cost based on its expected value, i.e. consider the risk neutral bilevel stochastic program

$$\begin{aligned} \min _{u \in \mathcal {U}_\Upsilon } \left\{ \mathbb {E} \left[ \mathbb {F}[u] \right] \right\} , \end{aligned}$$
(10)

which is well-defined by Theorem 1. More generally, to allow for varying degrees of risk aversion, we consider a mapping \(\mathcal {R}: \mathcal {X} \rightarrow \mathbb {R}\) with

$$\begin{aligned} L^\infty (\Omega , \mathcal {B}, \mathbb {P}) \subseteq \mathcal {X} \subseteq L^0(\Omega , \mathcal {B}, \mathbb {P}) \end{aligned}$$

and consider the bilevel stochastic program

$$\begin{aligned} \min _{u \in \mathcal {U}_\Upsilon } \left\{ \mathcal {R}\left[ \mathbb {F}[u] \right] \right\} . \end{aligned}$$
(11)
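For the one-dimensional example of Remarks 1 and 3, the risk neutral objective in (10) can be evaluated by quadrature and minimized over a grid of designs. The sketch below assumes \(\Upsilon \) uniform on \([\frac{9}{10},\frac{11}{10}]\) and \(\mathcal {U}=[\frac{1}{2},\frac{3}{2}]\), so that \(\mathcal {U}_\Upsilon =[\frac{5}{9},\frac{15}{11}]\). The grid sizes and function names are illustrative assumptions, and the grid search is a stand-in for a proper optimization method.

```python
def Phi(x):
    """Phi from Remark 1 with J[u, f] = u * f1: x if x >= 1, else -x."""
    return x if x >= 1.0 else -x


def expected_cost(u, cells=2000):
    """Midpoint-rule estimate of E[Phi(u * Upsilon)], Upsilon ~ U[0.9, 1.1]."""
    h = 0.2 / cells
    return sum(Phi(u * (0.9 + (i + 0.5) * h)) for i in range(cells)) / cells


def minimize_on_grid(lo, hi, points=401):
    """Return the grid point in [lo, hi] with the smallest expected cost."""
    candidates = [lo + (hi - lo) * i / (points - 1) for i in range(points)]
    return min(candidates, key=expected_cost)


# Induced feasible set for U = [0.5, 1.5] and supp = [0.9, 1.1].
u_best = minimize_on_grid(0.5 / 0.9, 1.5 / 1.1)
best_value = expected_cost(u_best)
```

In this instance the grid minimizer lies close to \(u=\frac{10}{11}\), the largest design for which \(u\upsilon <1\) holds for every \(\upsilon \) in the support. Although \(\varPhi \) itself jumps at 1, the expected cost depends continuously on u, in line with Theorem 1.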

\(\mathcal {R}\) will typically be a monetary risk measure in the sense of [15, Definition 4.1] meaning it satisfies the following conditions:

  • Monotonicity: \(\mathcal {R}[Y_1] \le \mathcal {R}[Y_2]\) for all \(Y_1, Y_2 \in \mathcal {X}\) satisfying \(Y_1 \le Y_2\) \(\mathbb {P}\)-almost surely.

  • Translation equivariance: \(\mathcal {R}[Y + m] = \mathcal {R}[Y] + m\) for all \(Y \in \mathcal {X}\) and \(m \in \mathbb {R}\).

Moreover, we will assume the following:

(A5) \(\mathcal {R}: L^p(\Omega , \mathcal {B}, \mathbb {P}) \rightarrow \mathbb {R}\) with some \(p \in [1,\infty )\) is convex and nondecreasing, i.e. monotone in the sense defined above.

Remark 4

(A5) holds for any convex risk measure in the sense of [13] and [14], i.e. for any monetary risk measure that is convex. In particular, this includes the expectation, the mean-upper semideviation of any order and the Conditional Value-at-Risk. However, as we do not assume translation equivariance, the assumption is also fulfilled for the expected excess of arbitrary order (cf. [35, Chapter 6]).
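The functionals mentioned in Remark 4 can be illustrated on a finite, equally weighted sample of cost values. The sketch below is an empirical approximation under that assumption, not the authors' implementation. The Conditional Value-at-Risk uses the standard minimization formula \(\min _t \{ t + \mathbb {E}[(Y-t)_+]/(1-\alpha )\}\), and the expected excess is taken in its order-one form \(\mathbb {E}[(Y-\text {level})_+]\). The parameter names `rho`, `alpha` and `level` are assumptions.

```python
def expectation(sample):
    """Empirical mean of an equally weighted sample."""
    return sum(sample) / len(sample)


def upper_semideviation(sample, rho=0.5, order=1):
    """Mean-upper semideviation: E[Y] + rho * ||(Y - E[Y])_+||_order."""
    mean = expectation(sample)
    excess = [max(y - mean, 0.0) ** order for y in sample]
    return mean + rho * expectation(excess) ** (1.0 / order)


def cvar(sample, alpha=0.5):
    """Conditional Value-at-Risk via min_t { t + E[(Y - t)_+] / (1 - alpha) }.

    The objective is convex and piecewise linear in t with kinks at the
    sample values, so scanning the sample points finds the minimum.
    """
    def objective(t):
        return t + expectation([max(y - t, 0.0) for y in sample]) / (1.0 - alpha)
    return min(objective(t) for t in sample)


def expected_excess(sample, level):
    """Order-one expected excess E[(Y - level)_+]."""
    return expectation([max(y - level, 0.0) for y in sample])


costs = [0.0, 1.0, 2.0, 3.0, 10.0]
shifted = [y + 2.0 for y in costs]
```

Comparing `costs` with `shifted` shows translation equivariance for the first three functionals, whereas `expected_excess` with a fixed level does not shift by the added constant.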

The following result is well-known in the literature, see for example [4, Theorem 4.1]. For the convenience of the reader, we provide a short self-contained proof.

Lemma 3

Assume (A5), then the mapping \(\mathcal {R}\) is continuous.

Proof

For \(f\in L^p(\Omega , \mathcal {B}, \mathbb {P})\) we denote by \(|f|\in L^p(\Omega , \mathcal {B}, \mathbb {P})\) the function obtained taking the pointwise absolute value, so that \(f\le |f|\), \(-f\le |f|\) \(\mathbb {P}\)-almost everywhere.

It suffices to prove that \(\mathcal {R}\) is continuous at 0, and we can assume that \(\mathcal {R}(0) = 0\) (to treat continuity at an arbitrary point \(g_*\in L^p(\Omega , \mathcal {B}, \mathbb {P})\), we replace \(\mathcal {R}\) by \(\hat{\mathcal {R}}(f) {:}{=}\mathcal {R}(g_*+f)-\mathcal {R}(g_*)\), which is again convex and nondecreasing). If \(\mathcal {R}\) is not continuous at 0, there is \(\delta >0\) such that for any \(j\in \mathbb {N}\) there is \(f_j\in L^p(\Omega , \mathcal {B}, \mathbb {P})\) with \(\Vert f_j\Vert _{L^p(\Omega , \mathcal {B}, \mathbb {P})}<4^{-j}\) and \(|\mathcal {R}(f_j)|\ge \delta \). By convexity, \(0=\mathcal {R}(0)\le \frac{1}{2} \mathcal {R}(f_j)+\frac{1}{2} \mathcal {R}(-f_j)\), which implies \(\mathcal {R}(-f_j)\ge -\mathcal {R}(f_j)\). By monotonicity,

$$\begin{aligned} \mathcal {R}(|f_j|) \ge \max \left\{ \mathcal {R}(f_j), \mathcal {R}(-f_j)\right\} \ge \max \left\{ \mathcal {R}(f_j), -\mathcal {R}(f_j)\right\} = \left| \mathcal {R}(f_j)\right| \ge \delta . \end{aligned}$$

Let \(f_* {:}{=}\sum _j 2^j |f_j|\in L^p(\Omega , \mathcal {B}, \mathbb {P})\). Using first monotonicity and then convexity, we obtain \(\mathcal {R}(f_*)\ge \mathcal {R}(2^j|f_j|) \ge 2^j \mathcal {R}(|f_j|) \ge 2^j\delta \) for any j, which contradicts \(\mathcal {R}(f_*)\in \mathbb {R}\). \(\square \)
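For completeness, two steps of the preceding proof can be spelled out. The series defining \(f_*\) converges in \(L^p(\Omega , \mathcal {B}, \mathbb {P})\) because of the bound \(\Vert f_j\Vert _{L^p(\Omega , \mathcal {B}, \mathbb {P})}<4^{-j}\), and the convexity step uses \(\mathcal {R}(0)=0\) together with \(2^{-j}\in (0,1]\):

$$\begin{aligned} \sum _j 2^j \Vert f_j\Vert _{L^p(\Omega , \mathcal {B}, \mathbb {P})}&\le \sum _j 2^{-j} < \infty , \\ \mathcal {R}(|f_j|) = \mathcal {R}\left( 2^{-j}\cdot 2^j|f_j| + (1-2^{-j})\cdot 0\right)&\le 2^{-j}\, \mathcal {R}(2^j|f_j|) + (1-2^{-j})\, \mathcal {R}(0) = 2^{-j}\, \mathcal {R}(2^j|f_j|). \end{aligned}$$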

Theorem 2

Assume (A1)–(A5), then the function \(\mathcal {Q}_\mathcal {R} : \mathcal {U}_\Upsilon \rightarrow \mathbb {R}\) defined by

$$\begin{aligned} \mathcal {Q}_\mathcal {R}[u] \,{:}{=}\, \mathcal {R}\left[ \mathbb {F}[u] \right] = \mathcal {R} \left[ \varPhi [u \odot \Upsilon ] \right] \end{aligned}$$

is continuous. In particular, the bilevel stochastic problem (11) has an optimal solution whenever the induced feasible set \(\mathcal {U}_\Upsilon \) is nonempty and compact.

Proof

As \(\mathcal {R}\) is continuous by Lemma 3, the result follows from Theorem 1. \(\square \)

Let us now consider the stochastic version of the modified problem (7), where the leader hedges against all \(\eta \)-optimal lower level solutions. For this, we will use the notion of law-invariant risk measure:

$$\begin{aligned} \mathcal {R}[Y_1] = \mathcal {R}[Y_2]\quad \text {for all}\ Y_1, Y_2 \in \mathcal {X}\ \text {with}\ \mathbb {P} \circ Y_1^{-1} = \mathbb {P} \circ Y_2^{-1}, \end{aligned}$$

i.e. for all \(Y_1\), \(Y_2\) which induce the same Borel probability measure. The following existence result is obtained for law-invariant, convex risk measures under weaker assumptions, where we no longer restrict the analysis to polyhedral \(\mathcal {F}\) and real analytic H.

Theorem 3

Assume (A1) and (A5) and let \(\mathcal {R}\) be translation equivariant as well as law-invariant. Then the mapping \(\mathcal {Q}_{\mathcal {R}, \eta }: \mathcal {U}_\Upsilon \rightarrow \mathbb {R}\) given by

$$\begin{aligned} \mathcal {Q}_{\mathcal {R}, \eta }[u] \,{:}{=}\,\mathcal {R} \left[ \varPhi _\eta \left[ u \odot \Upsilon \right] \right] \end{aligned}$$

is well-defined and lower semicontinuous. In particular, the bilevel stochastic program

$$\begin{aligned} \min _{u \in \mathcal {U}_\Upsilon } \left\{ \mathcal {Q}_{\mathcal {R}, \eta }[u] \right\} \end{aligned}$$

is solvable, whenever \(\mathcal {U}_\Upsilon \) is nonempty and compact.

Proof

First, note that \(\varPhi _\eta \left[ u \odot \Upsilon \right] \in L^0(\Omega , \mathcal {B}, \mathbb {P})\) and

$$\begin{aligned} \left\| \varPhi _\eta \left[ u \odot \Upsilon \right] \right\| _{L^\infty (\Omega , \mathcal {B}, \mathbb {P})} \le \sup _{\upsilon \in \mathrm {supp} \, \mu _\Upsilon } \varPhi _\eta \left[ u \odot \upsilon \right] \le \sup _{\upsilon \in \mathrm {supp} \, \mu _\Upsilon } \sup _{f\in \mathcal {F}} J[u \odot \upsilon ,f] < \infty \end{aligned}$$

hold for any \(u \in \mathcal {U}_\Upsilon \) by Proposition 2 and (A1). Thus, \(\mathcal {Q}_{\mathcal {R}, \eta }\) is well-defined.

Let \(\mathcal {R}^*\) denote the convex conjugate of \(\mathcal {R}\) (cf. [22, Theorem 2.1]), then \(\mathcal {R}\) admits a robust representation as

$$\begin{aligned} \mathcal {R}[\mathcal {Y}] = \sup _{\mathbb {P}' \in \mathrm {Env}} \left\{ \mathbb {E}_{\mathbb {P}'}[\mathcal {Y}] - \mathcal {R}^*[\mathbb {P}'] \right\} \quad \forall \,\mathcal {Y} \in L^p(\Omega , \mathcal {B}, \mathbb {P}), \end{aligned}$$

where the risk envelope \(\mathrm {Env}\) is a subset of the normed positive part of the dual space of \(L^p(\Omega ,\mathcal {B},\mathbb {P})\) by [22, Corollary 2.3, Theorem 2.4]. Fix any \(\mathbb {P}' \in \mathrm {Env}\). With a slight abuse of notation, we shall identify \(\mathbb {P}'\) with the \(\mathbb {P}\)-continuous probability measure \(\mathrm {d}\mathbb {P}'/\mathrm {d}\mathbb {P}\) and show that the mapping \(u \mapsto \mathbb {E}_{\mathbb {P}'} \left[ \varPhi _\eta \left[ u \odot \Upsilon \right] \right] \) is lower semicontinuous. The result then follows because the pointwise supremum of lower semicontinuous functions is lower semicontinuous (cf. Lemma 1 and [34, Proposition 1.26 (a)]).

Consider any sequence \(\lbrace u_k \rbrace _{k \in \mathbb {N}} \subseteq \mathcal {U}_\Upsilon \) that converges to some \(u \in \mathcal {U}_\Upsilon \). Without loss of generality, we assume that \(u_k \in B_1(u) \cap \mathcal {U}_\Upsilon \) holds for any \(k \in \mathbb {N}\), where \(B_1(u)\) denotes the open Euclidean unit ball around u (we denote its closure by \(\overline{B}_1(u)\)). By definition,

$$\begin{aligned} \varPhi _\eta \left[ u_k \odot \Upsilon \right] \ge \min _{\upsilon \in \mathrm {supp} \mu _\Upsilon } \min _{u \in \overline{B}_1(u) \cap \mathcal {U}_\Upsilon } \min _{f \in \mathcal {F}} J \left[ u \odot \upsilon ,f \right] =: \underline{J} \end{aligned}$$

holds for any \(k \in \mathbb {N}\). As J is continuous and \(\mathrm {supp} \, \mu _\Upsilon \), \(\overline{B}_1(u) \cap \mathcal {U}_\Upsilon \), and \(\mathcal {F}\) are nonempty and compact, we have \(\underline{J} \in \mathbb {R}\). Thus, Fatou’s Lemma yields

$$\begin{aligned} \liminf _{k \rightarrow \infty } \mathbb {E}_{\mathbb {P}'}\left[ \varPhi _\eta \left[ u_k \odot \Upsilon \right] \right] \ge \int _{\Omega } \liminf _{k \rightarrow \infty } \varPhi _\eta \left[ u_k \odot \Upsilon \right] ~\mathbb {P}'(d\omega ) \ge \mathbb {E}_{\mathbb {P}'}\left[ \varPhi _\eta \left[ u \odot \Upsilon \right] \right] , \end{aligned}$$

which completes the proof. \(\square \)

4 Application: discrete shells

In this section, we will apply bilevel optimization to a mechanical shape optimization problem. Our aim is to determine the optimal elastic design of curved roof-type constructions. The leader in this setup is the construction engineer who aims at minimizing a tracking-type functional via optimizing the distribution of material on a prescribed roof geometry. Due to production errors, the material distribution is considered to be stochastically perturbed in the actual construction phase. The follower is a test engineer, who is performing a worst-case analysis and considers within a given set of possible forces—for example wind and roof load—those that maximize the compliance functional.

4.1 General setting and problem formulation

Our model problem is taken from the literature on geometric design [38]; however, we do not consider self-supporting structures but rather architectural structures composed of discrete thin shells. Indeed, we model the mechanical properties of a roof construction using an adaptation of the discrete elastic shell model by Grinspun et al. [17], in which the geometry is a triangular surface and each triangle is considered as a construction panel, with joints at the edges. The membrane distortion deforms the individual panels, whereas the bending distortion leads to a change of the dihedral angle between pairs of panels that share an edge. Let us emphasize that the discrete shell approach is a design tool and does not act as a computational tool for the full elastostatic modeling in a later planning stage. In fact, we consider the discrete shell model mainly as a testbed for the proposed bilevel optimization approach. We underline this by reporting all physical quantities without units.

Comparing with the notation in the previous section, the design parameter u will represent the thickness of the shell, f the applied forces, and y the resulting displacement of the shell. The minimization in Eq. (3) then corresponds to the solution of a linear elasticity problem in (13), with H[u] representing the elastic energy. The problem in (2) corresponds to the follower optimizing compliance. The leader’s cost functional J in (1) measures the deviation from the prescribed shape and is defined in (14) below.

We consider the simplicial mesh of a discrete shell \(\mathcal {S}_h = (\mathcal {V}, \mathcal {E}, \mathcal {T})\) consisting of sets of vertices \(\mathcal {V}\), edges \(\mathcal {E}\subset \mathcal {V}\times \mathcal {V}\) and triangular faces \(\mathcal {T}\subset \mathcal {V}\times \mathcal {V}\times \mathcal {V}\). In what follows, we use maps defined on the different elements of such a mesh instead of the vectors used in the theoretical considerations above. For example, a map \(w : \mathcal {V}\rightarrow \mathbb {R}^k\) assigning a value to each vertex in this section corresponds to a vector in \(\mathbb {R}^{k{|\mathcal {V}|}}\) from the previous sections, and similarly for functions defined on edges and faces. We denote evaluations \(w(v)\) of such a map also via indexing to simplify notation, i.e. \(w_v{:}{=}w(v) \in \mathbb {R}^k\).

The geometry of a discrete shell is given by a map \(x: \mathcal {V}\rightarrow \mathbb {R}^3\) subject to the constraint that for each face there is no straight line in \(\mathbb {R}^3\) containing all three vertices, i.e. no triangle degenerates to a line. Thus, each triangle \(t\in \mathcal {T}\) with vertices \(v_0\), \(v_1\), \(v_2\) can be parametrized over the reference triangle in \(\mathbb {R}^2\) with vertices (0, 0), (1, 0) and (0, 1) via the affine map \(x_t\) interpolating \(x(v_0)\), \(x(v_1)\), \(x(v_2)\). We denote by \(Dx_t\) the differential of this affine map for face \(t\), so that the associated metric tensor in the same face is

$$\begin{aligned} G[x_t] \,{:}{=}\,( Dx_t)^\top Dx_t. \end{aligned}$$

We denote by \({\hat{x}}:\mathcal {V}\rightarrow \mathbb {R}^3\) the fixed stress-free reference configuration of the discrete shell, and parametrize the deformed configuration \(x= {\hat{x}}+y\) in terms of the elastic displacement of the vertices \(y: \mathcal {V}\rightarrow \mathbb {R}^3\). We denote by \({l}_e\) the length of an edge \(e\in \mathcal {E}\) and by \({a}_t\) the area of a face \(t\in \mathcal {T}\) in the reference configuration. Then, \({a}_e{:}{=}\tfrac{1}{3}({a}_t+ {a}_{t'})\) is a corresponding edge-associated area, where \(t\) and \(t'\) are the two faces adjacent to the interior edge \(e\in \mathcal {E}\); correspondingly, \({a}_v{:}{=}\frac{1}{3} \sum _{t\in \mathcal {T}_v} {a}_t\) is a vertex-associated area for the ring of faces \(\mathcal {T}_v\) around a vertex \(v\in \mathcal {V}\).

The design variable is the material thickness parameter, which is assumed to be constant on each of the triangles and is denoted by \(u: \mathcal {T}\rightarrow (0,\infty )\). In order to evaluate the bending contribution to the energy, see (12) below, we shall use on an interior edge \(e\) the averaged thickness \(u_e{:}{=}\tfrac{1}{2}(u_t+ u_{t'})\) of the two triangles \(t\) and \(t'\) sharing the edge \(e\).

Variational Formulation of Discrete Shells. In the modeling of thin shells, the elastic stored energy is typically the sum of two terms: the stored energy caused by in-plane membrane distortion and the stored energy reflecting bending distortion [7, 27]. The two terms scale linearly and cubically, respectively, in the thickness of the shell.

For a displacement \(y\), the Cauchy-Green strain tensor measuring the change of lengths, and consequently area, of a face \(t\) is given by

$$\begin{aligned} \mathcal {G}[y] \,{:}{=}\, \left( G[{\hat{x}}_t] \right) ^{-1} G[({\hat{x}}+y)_t]. \end{aligned}$$

Then, the membrane energy depends on this tensor and is defined as

$$\begin{aligned} \mathcal {W}_{\text {{\tiny mem}}}[u,y] \,{:}{=}\, \sum _{t\in \mathcal {T}} {a}_t\, u_t\, W_{\text {{\tiny mem}}}(\mathcal {G}[y]\vert _t), \end{aligned}$$

where we use the neo-Hookean energy density

$$\begin{aligned} W_{{\text {{\tiny mem}}}}(A) \,{:}{=}\, \frac{\mu }{2}\mathrm {tr}\,A + \frac{\lambda }{4}\det A -\left( \frac{\mu }{2}+\frac{\lambda }{4}\right) \log \det A - \mu - \frac{\lambda }{4}. \end{aligned}$$

The linearization of this energy coincides with the planar, isotropic, linearized elasticity model with Lamé-Navier coefficients \(\mu \) and \(\lambda \) [6, 27]. In the following, we use \(\mu = \lambda = 1\).
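The per-face quantities entering the membrane energy can be sketched in a few lines of Python. This is a minimal illustration and not the GOAST implementation; the function names are hypothetical. The coefficient of the logarithmic term is taken as \(\frac{\mu }{2}+\frac{\lambda }{4}\), which is an assumption of this sketch chosen so that the density vanishes at the identity and the identity is a stationary point, as required for a stress-free reference configuration.

```python
import numpy as np

MU, LAM = 1.0, 1.0  # Lame-Navier coefficients, mu = lambda = 1 as in the text

def metric_tensor(p0, p1, p2):
    """G[x_t] = (Dx_t)^T Dx_t for the affine map over the reference triangle."""
    Dx = np.column_stack((p1 - p0, p2 - p0))   # 3x2 differential of the affine map
    return Dx.T @ Dx

def triangle_area(p0, p1, p2):
    """Face area a_t = 0.5 * sqrt(det G)."""
    return 0.5 * np.sqrt(np.linalg.det(metric_tensor(p0, p1, p2)))

def cauchy_green(ref, deformed):
    """Distortion tensor G[x_hat_t]^{-1} G[(x_hat + y)_t] of a single face."""
    return np.linalg.solve(metric_tensor(*ref), metric_tensor(*deformed))

def w_mem(A):
    """Neo-Hookean membrane density (log coefficient mu/2 + lambda/4, assumed)."""
    det = np.linalg.det(A)
    return (MU / 2 * np.trace(A) + LAM / 4 * det
            - (MU / 2 + LAM / 4) * np.log(det) - MU - LAM / 4)
```

The membrane energy is then obtained by summing \({a}_t\, u_t\, W_{\text {{\tiny mem}}}\) over all faces.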

For the bending energy, we follow [19] and use an adaptation of the discrete shell bending energy introduced in [17]. It measures the change of the dihedral angles between a pair of neighboring triangles \(t\) and \(t'\) due to the displacement \(y\) in the configuration \(x\). The angle is computed as \({\theta }_e(x) {:}{=}\arccos ( n_{t}(x)^\top n_{t'}(x) )\), where \(n_{t}(x)\) and \(n_{t'}(x)\) are the unit normals generated by the deformation x, and the energy takes the form

$$\begin{aligned} \mathcal {W}_{\text {{\tiny bend}}}[u,y] \,{:}{=}\, \gamma \sum _{e\in \mathcal {E}} u_e^3 \cdot \frac{({\theta }_e({\hat{x}}+y) - {\theta }_e({\hat{x}}))^2}{{a}_e}{l}_e^2 \end{aligned}$$
(12)

for some constant \(\gamma >0\), which in continuum models can be expressed in terms of \(\lambda \) and \(\mu \). We use \(\gamma =1\).
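The dihedral angle and the contribution of a single edge to (12) can be sketched as follows. The vertex ordering convention (two consistently oriented triangles \((a,b,c)\) and \((b,a,d)\) sharing the edge \((a,b)\)) and the function names are assumptions of this illustration.

```python
import numpy as np

def unit_normal(p0, p1, p2):
    n = np.cross(p1 - p0, p2 - p0)
    return n / np.linalg.norm(n)

def dihedral_angle(a, b, c, d):
    """theta_e = arccos(n_t . n_t') for triangles (a, b, c) and (b, a, d)."""
    n1 = unit_normal(a, b, c)
    n2 = unit_normal(b, a, d)
    # clip guards against round-off slightly outside [-1, 1]
    return float(np.arccos(np.clip(n1 @ n2, -1.0, 1.0)))

def bending_term(u_e, theta_def, theta_ref, l_e, a_e, gamma=1.0):
    """Single-edge summand of (12): gamma * u_e^3 * (dtheta)^2 / a_e * l_e^2."""
    return gamma * u_e**3 * (theta_def - theta_ref) ** 2 / a_e * l_e**2
```

For a flat pair of triangles the angle is 0, and folding one triangle by a right angle yields \(\pi /2\).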

The stored elastic energy \(\mathcal {W}[u,y]\) is the sum of these two energies,

$$\begin{aligned} \mathcal {W}[u,y] \,{:}{=}\, \mathcal {W}_{\text {{\tiny mem}}}[u,y] + \mathcal {W}_{\text {{\tiny bend}}}[u,y], \end{aligned}$$

so that the total free energy in the presence of external forces \({f}: \mathcal {V}\rightarrow \mathbb {R}^{3}\) reads as

$$\begin{aligned} \mathcal {I}[u, {f},y] = \mathcal {W}[u,y] - {f}^\top My, \end{aligned}$$
(13)

where \(M\) is a diagonal mass matrix in \(\mathbb {R}^{3{|\mathcal {V}|}\times 3{|\mathcal {V}|}}\) with entries \({a}_v\) at positions \((i,i)\) with \(i=3j-k\) for \(j=1,\ldots , {|\mathcal {V}|}\) and \(k=0,1,2\). The elastic displacements resulting from applying the forces to the reference configuration are the minimizers of this energy.

In what follows, we restrict ourselves to the linearization of this model. We denote by \(H[u] {:}{=}\partial ^2_{yy} \mathcal {W}[u,0]\) the Hessian of the stored elastic energy, and obtain the linearized stored elastic energy

$$\begin{aligned} \mathcal {W}^{\mathrm {lin}}[u,y] \,{:}{=}\, \frac{1}{2} y^\top H[u] y\end{aligned}$$

as well as the linearized total free energy

$$\begin{aligned} \mathcal {I}^{\mathrm {lin}}[u, {f}, y] \,{:}{=}\, \mathcal {W}^{\mathrm {lin}}[u,y] - {f}^\top My, \end{aligned}$$

whose minimization corresponds to the innermost problem introduced in (3). Prescribing suitable boundary data \(y_v=0\) on a set of at least three vertices \(v\in \mathcal {V}\), which do not lie on a line, one can deduce (cf. [19]) that \(H[u]\) is a positive-definite matrix. As written above expression (4), for every \(u\) and \({f}\) the energy \(\mathcal {I}^{\mathrm {lin}}[u, {f}, \cdot ]\) has a unique minimizer, which is also the unique solution of the associated Euler-Lagrange equation

$$\begin{aligned} 0 = \partial _y\mathcal {I}^{\mathrm {lin}}[u, {f}, y] = H[u] y- M{f}. \end{aligned}$$
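The Euler-Lagrange equation is a linear system for the displacement. The following Python sketch solves it with a dense solver as a stand-in for a sparse Cholesky factorization; the matrices in any usage are toy data and the function names are hypothetical.

```python
import numpy as np

def displacement(H, M, f):
    """Solve H y = M f for a symmetric positive definite matrix H."""
    return np.linalg.solve(H, M @ f)

def linearized_energy(H, M, f, y):
    """I_lin[u, f, y] = 0.5 * y^T H y - f^T M y for H = H[u]."""
    return 0.5 * y @ H @ y - f @ M @ y
```

At the solution, the identity \(f^\top M y = y^\top H[u]\, y\) holds, so the compliance can be evaluated in either form.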

The Optimization Problem. To complete our practical optimization problem, we need to specify the admissible set of material parameters \(\mathcal {U}\), the admissible set of force parameters \(\mathcal {F}\), and the cost functional of the leader \(J\). The objective of the lower level optimal value function \(\psi \) is already completely defined in (5) and equals the compliance functional evaluated for the displacement \(y[u, {f}]\), i.e.

$$\begin{aligned} \psi [u] = \underset{f \in \mathcal {F}}{\max } \left\{ {f}^\top My[u, {f}] \right\} = \underset{f \in \mathcal {F}}{\max } \left\{ {f}^\top MH[u]^{-1} M{f}\right\} . \end{aligned}$$

The admissible set of force parameters \(\mathcal {F}\) is assumed to consist of linear combinations of a small number of different load scenarios. We assume that the forces are of the type \(f = B F\), where \(F\in \mathbb {R}^d\) for some \(d \ll 3{|\mathcal {V}|}\) are the coefficients, and the columns \(B_j\) of the matrix \(B\in \mathbb {R}^{3{|\mathcal {V}|}\times d}\) are the basis of the d-dimensional subspace of forces. Therefore, each \(B_j\in \mathbb {R}^{3{|\mathcal {V}|}}\) represents a force distribution on the reference configuration \({\hat{x}}\) which is then scaled with \(F_j\in \mathbb {R}\), for \(j=1,\dots , d\). The components of these basis vectors could be determined, for example, from the location of the vertex or the inclination of the triangular faces sharing a vertex. Furthermore, we consider different constraints on the values of the scale factors \(F_j\), i.e. we assume that the set \(\mathcal {F}\) is given by \(\bigcap _{k=1}^K \mathcal {F}_k\) with

$$\begin{aligned} \mathcal {F}_k \,{:}{=}\, \left\{ BF\in \mathbb {R}^{3|\mathcal {V}|} \mid F\in \mathbb {R}^d,\, \mathcal {Q}^F_k(F) \ge 0 \right\} \end{aligned}$$

for some smooth functions \(\mathcal {Q}^F_k\) for \(k=1,\ldots , K\). For example, if \(\mathcal {F}\) consists of the forces which fulfill \(|F |\le \mu \) then one might choose \(d=3|\mathcal {V}|\), \(B=\mathrm {Id}\), \(K=1\), and \(\mathcal {Q}_1^F(F) = \mu ^2 - |F |^2\).
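For forces of the form \(f=BF\), the compliance becomes the quadratic form \(F^\top A F\) with \(A = B^\top M H[u]^{-1} M B\). In the special case of a single ball constraint \(|F|\le r\), its maximum is \(r^2\) times the largest eigenvalue of A, attained at a scaled eigenvector. The following Python sketch illustrates this special case only; the function names are hypothetical, and the general constraints considered in the text require the barrier approach described in the next subsection.

```python
import numpy as np

def reduced_compliance_matrix(H, M, B):
    """A = B^T M H^{-1} M B, so that f^T M H^{-1} M f = F^T A F for f = B F."""
    MB = M @ B
    return MB.T @ np.linalg.solve(H, MB)

def worst_case_force_on_ball(A, radius):
    """Maximizer and maximum of F^T A F over |F| <= radius (A symmetric, PSD)."""
    eigvals, eigvecs = np.linalg.eigh(A)   # eigenvalues in ascending order
    F = radius * eigvecs[:, -1]            # eigenvector of the largest eigenvalue
    return F, radius**2 * eigvals[-1]
```

Since the quadratic form is even in F, the maximizer is never unique in this example, which illustrates why the set of optimal lower level solutions has to be handled with care.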

In the problem of the leader, we constrain the material thickness parameter \(u\) elementwise from below and from above, and we assume that the total volume of material, determined via the discrete integral of \(u\), is below some fixed positive parameter.

Lastly, the upper level cost functional is considered to be of tracking-type and measures the squared discrete \(L^2\)-norm of the displacement on a predefined tracking subset of the whole shell,

$$\begin{aligned} J[u,f] \,{:}{=}\, y[u,f]^\top \chi \odot M y[u,f] =\sum _{v\in \mathcal {V}} \chi _vM_{vv} \left| y_v[u,f] \right| ^2. \end{aligned}$$
(14)

Here \(\chi : \mathcal {V}\rightarrow \{0,1\}\) is a discrete characteristic function with value 1 at vertices in the tracking set and 0 elsewhere.
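The tracking functional (14) is a weighted sum over the vertices and can be evaluated as in the following Python sketch, where the displacement is stored as an array of shape (number of vertices, 3). The array layout and the function name are assumptions of this illustration.

```python
import numpy as np

def tracking_cost(y, chi, vertex_areas):
    """J = sum_v chi_v * M_vv * |y_v|^2 with M_vv the vertex-associated area."""
    return float(np.sum(chi * vertex_areas * np.sum(y**2, axis=1)))
```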

In the stochastic setting, we restrict ourselves to the expected value \(\mathbb {E}\left[ \mathbb {F}[u] \right] \) as the risk measure for the optimization (cf. (10)). Furthermore, the stochastic perturbation of the distribution of the thickness parameter \(u\) is given by i.i.d. truncated normal distributions for each parameter, i.e. we consider the perturbed material \(u\odot \Upsilon \) for \(\Upsilon \sim {\mathcal {T}}{\mathcal {N}}(1,\sigma ^2,\upsilon _{{\text {\tiny min}}}, \upsilon _{{\text {\tiny max}}})^{|\mathcal {T}|}\), where \({\mathcal {T}}{\mathcal {N}}(1,\sigma ^2,\upsilon _{{\text {\tiny min}}}, \upsilon _{{\text {\tiny max}}})\) is the truncated normal distribution with average 1 and standard deviation \(\sigma \), truncated to the interval \([\upsilon _{{\text {\tiny min}}},\upsilon _{{\text {\tiny max}}}]\). In practice, we take \(\sigma \le 0.2\), \(\upsilon _{{\text {\tiny min}}}=10^{-2}\) and \(\upsilon _{{\text {\tiny max}}}=2\), so that the truncation has little effect and \(\sigma \) is almost identical to the standard deviation of \(\Upsilon \).
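Samples of the perturbation \(\Upsilon \) can be generated, for instance, by rejection sampling, as in the following Python sketch. Rejection sampling is one simple choice and an assumption of this illustration; the function name is hypothetical.

```python
import numpy as np

def sample_perturbation(n_faces, sigma, v_min=1e-2, v_max=2.0, rng=None):
    """Draw i.i.d. samples of a normal N(1, sigma^2) truncated to [v_min, v_max]."""
    rng = np.random.default_rng() if rng is None else rng
    out = rng.normal(1.0, sigma, size=n_faces)
    bad = (out < v_min) | (out > v_max)
    while bad.any():                       # redraw only the rejected entries
        out[bad] = rng.normal(1.0, sigma, size=int(bad.sum()))
        bad = (out < v_min) | (out > v_max)
    return out
```

The perturbed material is then obtained as the elementwise product of the thickness vector with such a sample.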

We further fix constants \(0<u^-<u^+\) and \(V^+>0\) and define implicitly \(\mathcal {U}\) by the condition

$$\begin{aligned} \mathcal {U}_\Upsilon = \left\{ u: \mathcal {T}\rightarrow \mathbb {R}\mid u^- \le u_t\le u^+ \quad \forall \, t\in \mathcal {T}, \, \sum _{t\in \mathcal {T}} {a}_tu_t\le V^+ \right\} \subset (0,\infty )^{|\mathcal {T}|}. \end{aligned}$$

4.2 Numerical optimization

To numerically solve the bilevel problem (1) in the presented setting, it is convenient to replace the restriction of u and f to admissible sets \(\mathcal {U}_\Upsilon \) and \(\mathcal {F}\) by smooth approximations and then to deal with a differentiable problem. In our implementation, we achieve this by using logarithmic barrier functions, as commonly used in interior point methods (see e.g. the textbook [31]). Hence, with the structural assumptions on the set of admissible forces introduced above, we define the smoothed follower problem by

$$\begin{aligned} \varPsi _\alpha [u] \,{:}{=}\,\underset{F \in \mathbb {R}^{d}}{{{\,\mathrm{arg\,max}\,}}} \left\{ y[u, BF]^\top H[u]\, y[u, BF] + \alpha ^F \sum _{k=1}^K \log \left( \mathcal {Q}^F_k(F) \right) \right\} , \end{aligned}$$
(15)

where \(\alpha ^F > 0\) is an appropriate scaling factor for the barrier terms.
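A toy version of the smoothed follower problem (15) can be sketched in Python as follows. Here the compliance is written as the quadratic form \(F^\top A F\) with a symmetric matrix A, and plain gradient ascent with Armijo backtracking is used as a simplified stand-in for a Newton-type method. All names, the step size rule and the stopping criteria are assumptions of this sketch.

```python
import numpy as np

def smoothed_objective(F, A, constraints, alpha):
    """F^T A F + alpha * sum_k log(Q_k(F)); -inf outside the barrier domain."""
    q = np.array([Q(F) for Q in constraints])
    if np.any(q <= 0):
        return -np.inf
    return F @ A @ F + alpha * np.sum(np.log(q))

def ascent_with_backtracking(F0, A, constraints, constraint_grads, alpha,
                             n_iter=200, c1=1e-4):
    F = np.asarray(F0, dtype=float)
    history = [smoothed_objective(F, A, constraints, alpha)]
    for _ in range(n_iter):
        q = np.array([Q(F) for Q in constraints])
        grad = 2 * A @ F + alpha * sum(dQ(F) / qk
                                       for dQ, qk in zip(constraint_grads, q))
        step = 1.0
        while step > 1e-12:                # Armijo backtracking, keeps F feasible
            trial = F + step * grad
            if (smoothed_objective(trial, A, constraints, alpha)
                    >= history[-1] + c1 * step * (grad @ grad)):
                break
            step *= 0.5
        else:
            break                          # no admissible step found: stop
        F = trial
        history.append(smoothed_objective(F, A, constraints, alpha))
    return F, history
```

Because infeasible trial points are assigned the value \(-\infty \), the backtracking automatically keeps the iterates in the interior of the admissible set. Such a method only finds local maximizers and depends on the initial guess.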

To compute the maximizers in (15), we do not aim at a global maximization approach but rather use an ascent method (see below) to compute isolated local maximizers. Thus, we assume in the numerical optimization of the leader problem that the solution of the follower problem is of this type. This allows us to apply conventional nonlinear optimization algorithms. In this framework, the maximizer and the set \(\varPsi _\alpha \) are used interchangeably. In the examples considered below, this assumption is justified by the use of asymmetric triangulations, and additionally by the symmetry-breaking random perturbations of the material thickness. Thus, the logarithmic barrier formulation of the expected value optimization problem for the leader is

$$\begin{aligned}&\underset{u\in \mathbb {R}^{|\mathcal {T}|}}{\min } \left\{ \mathbb {E} \left[ J \left[ u\odot \Upsilon ,\varPsi _\alpha [u\odot \Upsilon ] \right] \right] - \alpha ^u \sum _{t\in \mathcal {T}} {a}_t\left( \log (u_t-u^-) + \log (u^+ - u_t)\right) \right. \\&\left. \quad -\alpha ^V \log \left( V^+ - \sum _{t\in \mathcal {T}} {a}_tu_t\right) \right\} \end{aligned}$$

for scaling factors \(\alpha ^u, \,\alpha ^V >0\) as before.
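The barrier terms of the leader problem and their gradient with respect to u can be sketched in Python as follows. The function and argument names are hypothetical; the formulas correspond to the two logarithmic terms in the problem above.

```python
import numpy as np

def leader_barrier(u, areas, u_lo, u_hi, v_max, alpha_u, alpha_v):
    """Barrier for u_lo <= u_t <= u_hi and sum_t a_t u_t <= v_max."""
    slack_lo, slack_hi = u - u_lo, u_hi - u
    slack_vol = v_max - areas @ u
    if np.any(slack_lo <= 0) or np.any(slack_hi <= 0) or slack_vol <= 0:
        return np.inf                      # outside the admissible set
    return (-alpha_u * np.sum(areas * (np.log(slack_lo) + np.log(slack_hi)))
            - alpha_v * np.log(slack_vol))

def leader_barrier_gradient(u, areas, u_lo, u_hi, v_max, alpha_u, alpha_v):
    """Gradient of leader_barrier at an interior point u."""
    slack_vol = v_max - areas @ u
    return (-alpha_u * areas * (1.0 / (u - u_lo) - 1.0 / (u_hi - u))
            + alpha_v * areas / slack_vol)
```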

This regular reformulation of the optimization problem can be solved numerically using a stochastic gradient method. For PDE-constrained shape optimization problems under uncertainty, this method is analyzed in [16]. In our case, the smoothed follower problem is a deterministic and smooth optimization problem, and computing its first and second derivatives is straightforward. Thus, we use a Newton-type method with Armijo backtracking line search (cf. [31, Algorithm 3.2]) to compute its optimizers. The gradients of the smoothed bilevel problem can be computed via the general procedure of shape optimization calculus and thus, we employ stochastic gradient descent [33] to solve it. To this end, in each iteration of the descent algorithm, we draw finitely many samples \(\upsilon ^1,\ldots , \upsilon ^K\) from the distribution of the material perturbation. In the experiments, we always chose \(K=128\). Using these samples, we approximate the expected value by the empirical risk \(\hat{J}[u] \,{:}{=}\, \tfrac{1}{K}\sum _{k=1}^K J \left[ u\odot \upsilon ^k,\varPsi _\alpha [u\odot \upsilon ^k] \right] \). Then a new iterate is computed by taking a step in the direction of the negative gradient of the combination of the empirical risk and the logarithmic barrier terms. Figure 1 depicts the decrease of the upper level cost functional over the iterations of the stochastic gradient descent algorithm and the increase of the lower level compliance cost when solving the follower problem for the initial material distribution. Later solves of the follower problem typically require 10 to 30 iterations of the Newton-type method per outer iteration.
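The structure of the stochastic gradient iteration can be sketched in Python as follows. The per-sample gradient `grad_J`, the barrier gradient `grad_barrier` and the sampler `draw_samples` are user-supplied callables and hypothetical names; the fixed step size and the absence of a feasibility safeguard are simplifying assumptions of this sketch and not a description of the actual implementation.

```python
import numpy as np

def empirical_risk_gradient(u, samples, grad_J):
    """Average of the per-sample gradients, i.e. the gradient of the empirical risk."""
    return np.mean([grad_J(u, v) for v in samples], axis=0)

def stochastic_gradient_descent(u0, grad_J, grad_barrier, draw_samples,
                                n_iter=200, step=0.05, n_samples=128):
    u = np.asarray(u0, dtype=float)
    for _ in range(n_iter):
        samples = draw_samples(n_samples)  # fresh samples in every iteration
        g = empirical_risk_gradient(u, samples, grad_J) + grad_barrier(u)
        u = u - step * g                   # step along the negative gradient
    return u
```

In the bilevel setting, each evaluation of `grad_J` would involve one solve of the smoothed follower problem for the perturbed material.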

We have implemented our method in C++ with the Geometric Optimization And Simulation Toolbox (GOAST) [20], where we use the Eigen library [18] for numerical linear algebra and CHOLMOD [3] from the SuiteSparse collection as direct linear solver. The code is available under https://gitlab.com/numod/bilevel-shape-optimization.

Fig. 1

Left: upper level relative cost values \(\hat{J}[u^i]/\hat{J}[u^0]\) for the iterates of the stochastic gradient descent method in the example shown in the bottom row of Fig. 3. Right: corresponding lower level compliance cost \(\frac{y[u, BF^j]^\top H[u]\, y[u, BF^j]}{y[u, BF^0]^\top H[u]\, y[u, BF^0]}\) for iterates of the Newton-type method for the follower problem in the first upper level descent step and the initial material distribution

4.3 Numerical results

We applied the bilevel shape optimization method in a proof-of-concept study of discrete shells representing curved roofs. We fix an orientation so that the negative Z-axis is in the direction of gravity and the supporting ground is in the XY-plane. For each geometry, we fix a set of Dirichlet vertices near the ground plane, representing the points on which the structure is supported, and also fix the material thickness of the corresponding triangles. This removes these variables from the optimization.

The construction is exposed to two types of forces. First, there are forces emulating wind hitting the structure. For a given wind direction and strength, the force on each part of the roof depends on the local orientation. We assume that the magnitude of the force on a vertex is proportional to the absolute value of the scalar product between the vertex normal (given as the average of the normals of the triangles adjacent to the vertex) and the wind direction. For simplicity, we only consider a two-dimensional subset of possible forces, spanned by the basis vectors \(B_1\) and \(B_2\) which represent wind along the positive X- and Y-axis, respectively. The direction and magnitude of the wind are then controlled by the scale factors \(F_1\) and \(F_2\). We fix a maximal magnitude of wind-type force \(F_{\mathrm {max},xy}\) and use the constraint function \(\mathcal {Q}_1^F(F) \,{:}{=}\, F^2_{\mathrm {max},xy} - \left( F_1^2+ F_2^2\right) \) in (15). An example of these two basis vectors demonstrating the dependence on the orientation of the normal is shown in the second and third panels of Figure 2.

Fig. 2

The first panel shows the geometry of the roof structure, with the tracking set on the roof plateau marked with dots. The Dirichlet nodes are the vertices on the horizontal plane at the corners. The other three panels show the three basis force fields \(B_1\) (horizontal wind in the X direction), \(B_2\) (horizontal wind in the Y direction) and \(B_3\) (vertical gravitational force caused by an overlay on the roof). The scale of the force arrows is arbitrary

Second, we consider a vertical force, which could emulate the weight of snow or water overlay on the roof. The magnitude of the corresponding basis vector \(B_3\) on each vertex is the absolute value of the scalar product between the vertex normal and the Z-axis and is shown in Fig. 2 on the far right. The magnitude of the gravitational load is controlled by the scale factor \(F_3\); we ensure that it is pointing downward via \(\mathcal {Q}_2^F(F) \,{:}{=}\, F_3 \) and limit its magnitude via \(\mathcal {Q}_3^F(F) \,{:}{=}\, F_{\mathrm {max},z} - F_3\), where \(F_{\mathrm {max},z}\) is the maximal magnitude of the gravitational force. Therefore, the admissible set \(\mathcal {F}\) is a cylinder with radius \(F_{\mathrm {max},xy}\) and height \(F_{\mathrm {max},z}\).
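The three constraint functions describing the cylinder of admissible scale factors can be written down directly, as in the following Python sketch. The function names are hypothetical.

```python
def cylinder_constraints(F, f_max_xy, f_max_z):
    """Return (Q_1, Q_2, Q_3) for the scale factors F = (F_1, F_2, F_3)."""
    F1, F2, F3 = F
    q1 = f_max_xy**2 - (F1**2 + F2**2)   # bound on the wind-type force
    q2 = F3                              # vertical load points downward
    q3 = f_max_z - F3                    # bound on the vertical load
    return q1, q2, q3

def is_admissible(F, f_max_xy, f_max_z):
    """F is admissible if all constraint functions are nonnegative."""
    return all(q >= 0 for q in cylinder_constraints(F, f_max_xy, f_max_z))
```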

We performed most of our investigations on the simple roof geometry shown in Fig. 2. For this problem, the basic parameters, which are used in the examples if not indicated otherwise, are as follows. The roof geometry is almost filling a box of \(20 \times 20 \times 10\), the maximal horizontal load is \(F_{\mathrm {max},xy} = 0.0015\) and the vertical one \(F_{\mathrm {max},z} = 2 F_{\mathrm {max},xy}\). The elementwise bounds on the material thickness are \(u^- = 0.01\) and \(u^+ = 0.2\). The volume of the material is bounded by \(V^+ = 60\) and the strength of the stochastic variation is fixed by \(\sigma = 0.1\). The weights of the barrier terms were \(\alpha ^F = 10^{-4}\), \(\alpha ^u = 1\), and \(\alpha ^V = 10^{-5}\). For the leader, we consider a tracking set restricted to the central region of the roof plateau as shown in the first panel of Fig. 2.

Fig. 3

Comparison of results for full vertex tracking set (top) and plateau tracking set (bottom) on the simple roof-type geometry already shown in Fig. 2. On the left, we show the deformed configurations as gray surfaces, while the undeformed surfaces are shown as translucent surfaces overlaid with red edges. Next to the surfaces, we visualize the direction of the force \((F_1,F_2,F_3)\) chosen by the follower in the cylinder of admissible values. In the middle, we show the resulting material distributions using the color map displayed in the figure, where boundary triangles with all three vertices subject to Dirichlet boundary conditions are shown in gray. On the right, we show the magnitude of the deformation \(y\) using the color map displayed in the figure. Additionally, on the far right, we show the direction of the horizontal forces \((F_1,F_2)\)

In Fig. 3, we show the deformed configuration, the optimized distribution of the material thickness, and the magnitude of displacements in case of the leader minimizing a tracking functional once with global support (top row) using \(\chi \equiv 1\) and once restricted to the region of the roof plateau (bottom row). As for all examples presented here, in the follower problem, the maximal compliance is attained for a force F representing an extremal point of the cylinder of admissible forces. For the tracking cost domain centered on the roof plateau, one observes a concentration of mass in the central region accompanied by a significant reduction of the thickness close to the four corners where Dirichlet boundary conditions apply. The concentration and corresponding reduction break the symmetry of the configuration w.r.t. the diagonal from the upper left to the lower right. Due to the asymmetric reduction, the follower chooses a force pointing to the upper right and one observes a kink line connecting the two arcs in the front at approximately half of the total height. This is accompanied by large displacements, which are however outside of the tracking region on the plateau. In contrast, for the tracking with global support, no such kink with strong displacements occurs; however, the deformation exhibits a larger displacement in the central region. Finally, beyond the mass concentration in the middle, one also observes the onset of curved “beam” like structures connecting the middle region and the four arcs of the roof. In the example with localized tracking, and most of the following ones, the elementwise bounds \(u^+\) and \(u^-\) are nearly attained for at least some triangles.

Figure 4 shows for the same geometry the impact of the upper bound on the total material volume.

Fig. 4

A comparison of the material distribution when varying the maximal allowed material volume \(V^+\) while keeping the other parameters fixed. The allowed volume was \(V^+= 40,50,60,70,80\) from left to right. Material thickness is shown using the color map displayed in the figure. On the far right, we show the direction of the horizontal forces, which was the same for all parameters, while the vertical force was always chosen maximal

As the total permitted mass is increased, the elongated curved “beams” connecting the tracking region in the center with the four arcs become thicker. Once the maximal thickness is reached in the central region and along these “beams”, further mass is invested to reinforce the regions close to the Dirichlet boundaries. The curved carrier “beams” and the central region are again designed asymmetrically w.r.t. the diagonal from the upper left to the lower right leading the follower to push towards the upper right.

We next investigate the effect of the parameters characterizing the strength of the forces, \({F_{\mathrm {max},z}}\) and \({F_{\mathrm {max},xy}}\), while keeping the total amount of material constant. By scaling invariance, it is natural to focus on the ratio \(\frac{F_{\mathrm {max},z}}{F_{\mathrm {max},xy}}\). In Fig. 5, we show that with increasing strength of the vertical force, the “beams” become thinner and instead more material is concentrated in the central region.

Fig. 5 A comparison of the material distribution when varying the ratio of vertical to horizontal force \(\frac{F_{\mathrm {max},z}}{F_{\mathrm {max},xy}}\), i.e. the shape of the cylinder, while keeping the other parameters, especially the maximal magnitude of horizontal force, fixed. The ratio of vertical to horizontal force was \(\frac{F_{\mathrm {max},z}}{F_{\mathrm {max},xy}} = \frac{1}{2},1,2,4,8\) from left to right. The material thickness is shown using the color map. On the right of each material distribution, we show the force in the cylinder of admissible values

Interestingly, for small values of the ratio between the two forces the material distribution is nearly symmetrical w.r.t. the diagonal from the upper left to the lower right, while it is asymmetric for mid-range ratios and then becomes more symmetric again for large ratios.

Figure 6 shows the impact of the strength of the stochastic perturbation of the material thickness, as measured by the standard deviation, again for the tracking region on the roof plateau.

Fig. 6 Comparison of material distribution when varying the standard deviation \(\sigma \) of the material perturbation while keeping the other parameters fixed. The standard deviation was \(\sigma = \frac{5}{100}, \frac{1}{10}, \frac{2}{10}\) from left to right. Material thickness is shown using the color map

Fig. 7 Results for two geometrically more complex examples. In both cases, we used tracking on the entire domain. On the left, we show the deformed configuration as a gray surface with the undeformed surfaces as a translucent overlay. Furthermore, we visualize the direction of the force leading to the maximal deformation in the cylinder. In the middle, we see the resulting material distributions using the color map. Boundary triangles for which all vertices are Dirichlet nodes are shown in gray. On the right, the magnitude of the deformation \(y\) is displayed using the color map. Additionally, on the far right, we show the 2D direction of the horizontal forces

With increasing strength of the stochastic perturbation, the optimal structure becomes more diffuse. Indeed, in a deterministic setting, the leader could aim for a finely-structured design, but very imprecise manufacturing is likely to render it ineffective. In order to understand this effect, we describe an idealized situation: If the leader concentrates mass on a single row of k elements, then a large negative fluctuation in the thickness of any single one of them is sufficient to destroy the strength of the construction. If instead, the leader distributes the mass on \(k^2\) elements filling a square, then at least a number of order k of those (with a specific geometry, for example, a column) must have a large negative fluctuation before the structure loses significantly in strength.
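The idealized row-versus-square comparison above can be made quantitative with a small sketch. It rests on assumptions that are not part of the model in this paper: each element suffers a large negative fluctuation independently with the same probability `p`, the row fails as soon as one of its `k` elements fails, and the square fails only if at least one of its `k` columns fails entirely. The function names are hypothetical.

```python
def row_failure_probability(p, k):
    """Probability that at least one of k independent elements fails."""
    return 1.0 - (1.0 - p) ** k


def square_failure_probability(p, k):
    """Probability that at least one of k columns fails entirely,
    where a column consists of k independent elements."""
    column_fails = p ** k  # all k elements of one column fail
    return 1.0 - (1.0 - column_fails) ** k


if __name__ == "__main__":
    p = 0.1  # assumed probability of a large negative fluctuation
    for k in (2, 5, 10):
        print(
            f"k={k:2d}  row: {row_failure_probability(p, k):.3e}  "
            f"square: {square_failure_probability(p, k):.3e}"
        )
```

Under these assumptions, the failure probability of the row grows with `k`, whereas that of the square decays roughly like \(k\,p^k\), which is consistent with the more diffuse optimal structures observed for larger \(\sigma\).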

Lastly, in Fig. 7, we show two more complex examples of architectural designs of roof structures, inspired by [38]. In the top row, the reference geometry for our bilevel optimization problem is a closed hall that fills a box of approximately \(20 \times 20 \times 5\). We limit the horizontal load with \(F_{\mathrm {max},xy} = 0.005\) and the vertical load with \(F_{\mathrm {max},z} = 2 F_{\mathrm {max},xy}\). The elementwise bounds on the material thickness are \(u^- = 0.01\) and \(u^+ = 0.2\). The volume of the material is bounded by \(V^+ = 50\) and the stochastic variation is \(\sigma = 0.05\). The weights of the barrier terms are \(\alpha ^F = 10^{-4}\), \(\alpha ^u = 1\), and \(\alpha ^V = 10^{-3}\). In the bottom row, the reference geometry resembles a double torus cut in half and fills a box of approximately \(70 \times 50 \times 15\). Again, we limit the horizontal load with \(F_{\mathrm {max},xy} = 0.005\) and the vertical load with \(F_{\mathrm {max},z} = 2 F_{\mathrm {max},xy}\). The elementwise bounds on the material thickness are again \(u^- = 0.01\) and \(u^+ = 0.2\). The volume of the material is bounded by \(V^+ = 330\) and the stochastic variation is \(\sigma = 0.05\). The weights of the barrier terms are \(\alpha ^F = 10^{-4}\), \(\alpha ^u = 1\), and \(\alpha ^V = 10^{-1}\). In both cases, we use the full domain as the tracking set. The main weakness of both structures is the concavity in the central part, which can be easily deformed by the vertical force. Hence, in both optimized solutions, the material is redistributed to prevent this. In the first case, this is done by building a stabilized ledge around the center, while in the second case, beam-like structures emerge from the two “holes”, together with another beam from the curve at the bottom. Furthermore, in the second case, the “entrance” is also stabilized by adding material at the ends of its arch.

5 Discussion

The findings in this article draw a line from curved roof-type constructions via modeling and shape optimization of discrete thin shells to pessimistic formulations of bilevel stochastic programs. A key challenge is that, even in the deterministic case, standard compactness assumptions are well known to be insufficient to ensure the existence of optimal solutions.

Assuming that the support of the underlying probability measure is compact, we have considered stochastic parameters and assessed the random upper-level outcome based on some (law-invariant) convex risk measure. For the pessimistic model, we have shown continuity of the resulting risk functional if the random perturbation admits a Lebesgue density, the set of potential forces is a polyhedron and the lower level goal function is real analytic. Alternatively, we have investigated a regularized model where the leader also hedges against lower level solutions that are close to optimality. The risk functionals emerging from this regularized problem are automatically lower semicontinuous. In both situations, the existence of optimal solutions can be guaranteed under a compactness condition. We have developed a proof-of-concept numerical implementation that applies a pessimistic bilevel strategy to a mechanical optimal design problem, using a stochastic gradient descent approach to compute locally optimal solutions of the pessimistic model.

In closing, we would like to point out several possible directions for future research. In the numerical optimization, it would be interesting to consider interior point methods to solve the “original” leader’s and follower’s problems, which incorporate hard constraints instead of the regularization used here. From the point of view of elasticity, it would be interesting to study the nonlinear model (13) instead of its linearized counterpart and to investigate the associated nonuniqueness issue in the lower level problem. In fact, this would lead to a proper trilevel problem and bring new challenges for theoretical and numerical investigations. Furthermore, it would be worthwhile to investigate the infinite-dimensional variational problem of thin shell or volume elasticity with appropriate function spaces and from the perspective of optimization with continuous PDE constraints. In the present paper, “risky” decisions can be penalized in the objective function, but the leader ensures that the perturbed material parameters are feasible regardless of the realization, via a restriction of the design variable to the set \(\mathcal {U}_\Upsilon \). Models in which this robust constraint is replaced with a system of chance or stochastic dominance constraints can be expected to produce less conservative solutions, which improve the values of the leader’s cost functional at the cost of some residual risk.