Abstract
This paper presents a novel method for the solution of a particular class of structural optimization problems: the continuous stochastic gradient method (CSG). In the simplest case, we assume that the objective function is given as an integral of a desired property over a continuous parameter set. The application of a quadrature rule for the approximation of the integral can give rise to artificial and undesired local minima. However, the CSG method does not rely on an approximation of the integral, instead utilizing gradient approximations from previous iterations in an optimal way. Although the CSG method does not require more than the solution of one state problem (out of infinitely many) per optimization iteration, it is possible to prove in a mathematically rigorous way that the function value as well as the full gradient of the objective can be approximated with arbitrary precision in the course of the optimization process. Moreover, numerical experiments for a linear elastic problem with infinitely many load cases are described. For the chosen example, the CSG method proves to be clearly superior compared to the classic stochastic gradient (SG) and stochastic average gradient (SAG) methods.
Introduction and problem statement
In the following, we define the set of Lebesgue integrable functions mapping from the space X to space Y by L^{1}(X;Y ) and from the space X to the real numbers \(\mathbb {R}\) by L^{1}(X). The “dot” notation is used in the following way: g(⋅,y) denotes the mapping x↦g(x,y).
Our goal in this article is to develop a novel stochastic gradient method for the efficient solution of optimization problems of the general form:
Here, u is the design variable, which can be subject to constraints described by the set U_{ad}, and \(F:{U_{\text {ad}}} \mapsto \mathbb {R}\) is given as a composition of a functional \(J : L^{1}(V_{\text {ad}}) \mapsto \mathbb {R} \) and a function \(f : {U_{\text {ad}}} \times V_{\text {ad}} \mapsto \mathbb {R}\), where V_{ad} is a continuous parameter set. Throughout this paper, we further assume that the evaluation of the function f for any (u,v) ∈ U_{ad} × V_{ad} requires the solution of an underlying state problem, i.e., f is given in the following form:
where y(u;v) denotes the solution of the state problem parameterized by the design u and the additional continuous index variable v ∈ V_{ad}. As a consequence of this construction, an evaluation of the function F at a given design u theoretically requires the solution of infinitely many state problems.
In order to demonstrate that problem (1.1) has a broad range of applications, we give two particular examples for the choice of the functional J. In our first example, J is simply an integral over V_{ad}, resulting in the following problem:
The structure in (1.2) can arise in various settings. First, in an elastic setting, v could be a continuous load index and f(⋅,v) a compliance, displacement, or stress evaluation associated with the solution of the state problem with load index v ∈ V_{ad}. In this case, (1.2) would be a structural optimization problem with infinitely many load cases. For optimization problems with many (but finitely many) load cases, we refer, e.g., to Alvarez and Carrasco (2005) and Zhang et al. (2017), as well as the references therein. Furthermore, if, for applications in acoustics (see, e.g., Dilgen et al. 2019) or optics (see, e.g., Jensen and Sigmund 2011), a state problem in time-harmonic form is considered, the parameter v can play the role of a frequency or wavelength. A prominent example for f in this context would be an L^{2}-tracking function, such that (1.2) would describe the design of a device with a prescribed behavior over a continuous frequency range, such as the range of visible light, see Semmler et al. (2015). Another application in optics could be the optimization of an anisotropic object with respect to arbitrary illumination directions, again see Semmler et al. (2015). While we have so far looked at all these examples from a deterministic point of view, another important class of applications arises if we interpret F(u) as the expected value of a given property f associated with a design u. This immediately leads to the notion of reliability-based optimization problems (RBO), see, e.g., the overview article by Maute and Frangopol (2003), Conti et al. (2009), or De et al. (2019) and the references therein for a collection of more recent articles on this topic. In all these cases, the parameter v constitutes a realization of uncertainty (from a continuous uncertainty set V_{ad}), where the source of uncertainty can be, e.g., in loading, material properties, or stiffness.
Following this line of argumentation a little further, we come to the second instantiation of the generic problem (1.1), which is referred to as a robust structural optimization problem according to De et al. (2019) and takes the following form:
Here, the expected value \(\mathbb {E}[f(u,\cdot )]\) is computed by the integral in (1.2), and λ is a positive parameter that controls the importance of variations.
Having said this, we would like to emphasize that our paper is not the first one suggesting the application of stochastic-gradient-type methods for the solution of structural optimization problems of the aforementioned type. Zhang et al. (2017) use the classical stochastic gradient (SG) method for the efficient solution of structural optimization with many (but finitely many) load cases. In De et al. (2019), robust optimization problems in the form of (1.3) are solved by the SG method and variants, as well as a stochastic version of the well-known GCMMA framework, see Svanberg (2002). However, in this paper, a discrete set of scenarios is also assumed from the very beginning. More applications of the SG method can be found in the closely related area of inverse problems, see Haber et al. (2012) and Roosta-Khorasani et al. (2014). These are structurally similar to the problems considered in this article, in the sense that each evaluation of the function f for a given scenario v also requires the potentially expensive solution of a discretized partial differential equation (PDE).
However, there are at least two substantial differences between all these references and the approach we suggest in this paper. Firstly, even though in some cases in the aforementioned articles a continuous index set V_{ad} is considered, an a priori selection of scenarios (or discretization of V_{ad}) is used. Secondly, even though in many applications the property function f depends on the index variable v with a certain regularity, this structure is ignored. In sharp contrast to this, we avoid an a priori discretization of integrals in this paper. This is principally due to the fact that an overly coarse discretization can lead to artificial minima, as will be demonstrated by means of a simple example in Section 2. Moreover, we would like to exploit the natural regularity of the property function f with respect to the index parameter v in order to design an efficient optimization algorithm in which the objective function F and its gradient are increasingly better approximated.
Beyond stochastic descent methods, robust problems of type (1.3) can also be approached by stochastic collocation methods combined with deterministic optimization solvers, see, e.g., Lazarov et al. (2012). However, in this case as well, the collocation fixes a priori an approximation of the objective functional based on finitely many scenarios.
To the best knowledge of the authors, the SG method itself is the only method which can, in principle, be applied to structural optimization problems with infinitely many state problems without relying on an a priori approximation of the objective functional. However, as will be shown in this article, only the CSG method is able to successfully solve these problems.
Despite this, there are substantial structural similarities between our CSG method and the classic SG method and its relatives. Therefore, in the following, we briefly describe the basic SG idea.
The original SG method, see Robbins and Monro (1951), is a method frequently used to minimize functions of the form \(F:{\mathbb {R}}^{p} \to {\mathbb {R}},\) with
and \(f_{i}: {\mathbb {R}}^{p} \mapsto {\mathbb {R}}\) for all i ∈{1,…,n}. Whereas conventional gradient methods calculate all n gradients ∇f_{i} in each iteration, the stochastic gradient method uses only a small random selection thereof. Hereby, for large n (as, e.g., in machine learning applications, see Bottou and Cun 2004), the computational effort can be drastically reduced in comparison to the classical gradient method. Bottou et al. (2018) and Schmidt et al. (2017) describe an improved version of the SG algorithm, the so-called stochastic average gradient (SAG) method. It benefits from previously calculated gradients, leading to a better approximation of the exact gradient; thus, a better convergence behavior is typically observed. In this paper, the basic properties of the SG and the SAG method will be combined:

- Low computational effort per iteration
- Reuse of previously obtained information
We will integrate these two properties in our CSG algorithm and compare it with the SG and the SAG method by means of examples.
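For readers unfamiliar with the two classic schemes, their updates can be sketched on a toy finite-sum objective (a minimal illustration of our own, not taken from the paper; the quadratic \(f_i\) and all step-size choices are assumptions made for this sketch):

```python
import random

def sg_step(u, grads, tau, i):
    # classic SG: step along the gradient of one randomly drawn summand only
    return u - tau * grads[i](u)

def sag_step(u, table, grads, tau, i):
    # SAG: refresh the stored gradient of summand i, then step along the
    # average of ALL stored gradients, reusing previously computed (stale) ones
    table[i] = grads[i](u)
    return u - tau * sum(table) / len(table)

random.seed(0)
a = [0.0, 1.0, 2.0, 3.0]                        # f_i(u) = (u - a_i)^2 / 2
grads = [lambda u, ai=ai: u - ai for ai in a]   # f_i'(u) = u - a_i
# the minimizer of F(u) = (1/n) sum_i f_i(u) is mean(a) = 1.5

u_sg, u_sag, table = 0.0, 0.0, [0.0] * len(a)
for n in range(1, 2001):
    i = random.randrange(len(a))
    u_sg = sg_step(u_sg, grads, 1.0 / n, i)         # decreasing steps for SG
    u_sag = sag_step(u_sag, table, grads, 0.25, i)  # constant step for SAG
```

With decreasing steps, SG converges slowly due to gradient noise, while SAG's averaged gradient table tolerates a constant step; CSG transfers this reuse of old gradients to the continuous setting.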
The remainder of this paper is structured as follows: We will first close this section with the formal statement of the problem. In Section 2, we introduce the proposed CSG algorithm by which we aim to solve the types of problems discussed. In Section 3, we then analytically study the convergence of the algorithm and state assumptions necessary for convergence. The main theoretical results are given in the two Theorems 19 and 20. Section 4 provides first numerical results of the CSG algorithm, as well as a comparison with the SG and SAG methods. Finally, in Section 5, we provide a summary and a brief outlook for further scientific work.
Formal statement of the problem
Generally, we look at the objective functionals as defined in (1.1), and we further assume the following:
Assumption 1 (Objective functional)
The reduced objective functional \(F:{U_{\text {ad}}} \mapsto {\mathbb {R}}\) is given by the composition of a mapping \( J : L^{1}(V_{\text {ad}} ; {\mathbb {R}}) \mapsto {\mathbb {R}} \) and \(f : {U_{\text {ad}}} \times V_{\text {ad}} \mapsto {\mathbb {R}}\). The Fréchet derivative of J will be denoted by its associated function \(D_{J} : L^{\infty }(V_{\text {ad}} ; {\mathbb {R}}) \times V_{\text {ad}} \mapsto {\mathbb {R}}\). Both D_{J} and ∇_{u}f have to be Lipschitz continuous w.r.t. both their arguments in the respective topology.
For a definition of the Fréchet differential, see for instance (Jahn 2007, Def. 3.11.). With F chosen as in (1.1) and the latter assumption, we can state its derivative as follows:
Remark 2 (Derivative of F)
The derivative of F is then given by the following:
for all u ∈ U_{ad}, with D_{J} being the Fréchet differential as defined in Assumption 1.
The Fréchet derivative mentioned is exemplified in two relevant cases:
Remark 3 (Examples for Fréchet derivatives)
If J(f(u,⋅)) is the expected value of f w.r.t. the second argument, we obtain the following:
and for J(f(u,⋅)) being the variance of f with respect to the second argument, we obtain, for all v ∈ V_{ad}, the following:
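For concreteness, the two cases can be written out as follows (our own sketch, with J normalized by the uniform distribution on \(V_{\text{ad}}\); the paper's normalization may differ). For the expected value \(J(g) = \frac{1}{|V_{\text{ad}}|}\int_{V_{\text{ad}}} g(v)\,\mathrm{d}v\), the Fréchet derivative is the constant function

\[ D_{J}(g, v) = \frac{1}{|V_{\text{ad}}|}, \]

while for the variance \(J(g) = \mathbb{E}[g^{2}] - \mathbb{E}[g]^{2}\), the chain rule yields

\[ D_{J}(g, v) = \frac{2}{|V_{\text{ad}}|}\left(g(v) - \mathbb{E}[g]\right) \qquad \forall v \in V_{\text{ad}}. \]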
As already mentioned, in the case of structural optimization, for each parameter v ∈ V_{ad}, the evaluation of f(⋅,v) requires the solution of an associated state problem. Consequently, the (approximate) evaluation of F and its gradient given in (1.4) is computationally very expensive. In general, it can be stated that the algorithm introduced in Section 2 is especially attractive for problems in which F is numerically expensive to evaluate due to its functional dependency.
One advantage of our algorithm in comparison to the SG and the SAG method is that no (a priori) quadrature rule is used to approximate the integral in (1.4). In this way, we later gain convergence of a subsequence to a stationary point of the continuous problem (1.1). Moreover, this helps to avoid artificial local minima, as will be outlined in the next section.
Continuous stochastic gradient method
Before presenting the new optimization algorithm, we will briefly discuss how discretization of the objective functional can introduce local minima, considering the following function:
By choosing \({U_{\text {ad}}} = [-\frac {1}{2},\frac {1}{2}]\) and V_{ad} = (− 1,1), this corresponds to problem (1.1).
In Fig. 1, the graph of the function F in (2.1) is shown along with numerical approximations based on equidistant discretizations of the integral. Although the original function is convex, local minima are introduced by the discretization of the integral. Note that, without much information on the function f, it is hard to choose a suitable discretization that would avoid this effect. Nevertheless, a good algorithm should be able to prevent convergence to such an artificial local minimum. In fact, this is one of the key features of the CSG algorithm introduced in detail in the following section.
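This effect can be reproduced numerically. The integrand below is a hypothetical stand-in (the paper's f from (2.1) is not restated here): a narrow well centered at v, for which the exact F has a unique interior minimum on \(U_{\text{ad}} = [-\frac12,\frac12]\), whereas a coarse equidistant midpoint rule produces several artificial local minima near its quadrature nodes:

```python
import math

def f(u, v):
    # hypothetical integrand: a narrow negative well centered at v
    return -1.0 / (1.0 + 50.0 * (u - v) ** 2)

def F_exact(u):
    # closed form of int_{-1}^{1} f(u, v) dv via the arctan antiderivative
    s = math.sqrt(50.0)
    return -(math.atan(s * (1.0 + u)) + math.atan(s * (1.0 - u))) / s

def F_quadrature(u, m):
    # equidistant midpoint rule with m nodes on V_ad = (-1, 1)
    h = 2.0 / m
    return h * sum(f(u, -1.0 + (k + 0.5) * h) for k in range(m))

def count_interior_minima(func, lo=-0.5, hi=0.5, n=2001):
    # count strict interior local minima of func on a fine u-grid over U_ad
    us = [lo + (hi - lo) * k / (n - 1) for k in range(n)]
    vals = [func(u) for u in us]
    return sum(1 for k in range(1, n - 1)
               if vals[k] < vals[k - 1] and vals[k] < vals[k + 1])
```

Counting strict interior minima of the exact F yields one, while a 5-node quadrature approximation yields several, which is precisely the pathology sketched in Fig. 1.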
The CSG algorithm
With \(\mathcal {P}_{U_{\text {ad}}}\) being the orthogonal projection onto U_{ad}, \(\lambda ^{d_{v}}\) being the Lebesgue measure in the d_{v}-dimensional space, and
being the set of points that are closer to (u_{i},ω_{i}) than they are to (u_{j},ω_{j}) in the ∥⋅∥_{∗}norm given in Definition 10, we can state the proposed Algorithm 1:
Note that \({({{V}_{i}^{n}})}_{i = 0}^{n}\) defined in Algorithm 1 is a partition of V_{ad} for all \(n\in \mathbb {N}\), i.e., \({{V}_{i}^{n}} \cap {{V}_{j}^{n}} = \emptyset \) for all i,j ∈{0,…,n} with i≠j and \(V_{\text {ad}} = {\cup }_{i=0}^{n} \bar {{V}_{i}^{n}}\).
The CSG method as defined in Algorithm 1 is suitable for problems of the form (1.1) and is structured as most gradient descent methods. In each iteration n, a search direction \(\hat G_{n}\) (an approximation of the gradient ∇F_{n} := ∇F(u_{n})) is calculated, a step length τ_{n} > 0 is chosen and a sequence \((u_{n})_{n\in \mathbb {N}}\) is generated by the following:
Note that the existence and uniqueness of \(\mathcal {P}_{U_{\text {ad}}}\) is guaranteed by the projection theorem (see, e.g., Aubin (2000)) and for all \(w \in \mathbb {R}^{d_{u}}\) defined by the following:
In this contribution, we use the following abbreviations: F_{n} := F(u_{n}), and ∥⋅∥ denotes the Euclidean norm in the respective dimension. The distinctive feature of the algorithm lies in the calculation of the search direction \(\hat G_{n}\). In each iteration n, the gradient ∇_{u}f(u_{n},⋅) is evaluated at a random position ω_{n} ∈ V_{ad} and stored for later iterations. The search direction \(\hat G_{n}\) is in principle a linear combination of the former gradients \(g_{i}:=\nabla _{u} f(u_{i},\omega _{i}), i= 0,\dots ,n\) with weights \({\alpha _{i}^{n}}, i=0,\dots ,n\). To provide an idea of how the weights are calculated, we refer to the sketch in Fig. 2. There, \(\omega _{0},\dots ,\omega _{10} \in V_{\text {ad}}\) are randomly sampled points and \(g_{0},\dots ,g_{10}\) the corresponding gradients. Then, the approximate gradient is given as \(\hat G_{10} = {\sum }_{k=0}^{10} \alpha ^{10}_{k}g_{k}\), where \(\alpha ^{10}_{0},\ldots ,\alpha ^{10}_{10}\) are the lengths of the line segments associated with the points (ω_{0},u_{0}),…,(ω_{10},u_{10}). Here, the assignment of segments to points is indicated by the same color. The underlying structure is the Voronoi diagram of the points (ω_{k},u_{k})_{k∈{0,…,10}} (see, e.g., Fortune (1995)). More formally, the weights \(\alpha ^{10}_{k}\) can be defined for all k ∈{0,…,10} as the d_{v}-dimensional measure of Ω_{k} ∩ ({u_{10}}× V_{ad}), where Ω_{k} ⊂ V_{ad} × U_{ad} is the Voronoi face associated with the point (ω_{k},u_{k}).
We note that the computational complexity per iteration is given by the evaluation of the gradient of the function f and the calculation of the weights \({{\alpha }_{0}^{n}},\dots ,{{\alpha }_{n}^{n}}\). Up to the calculation of the weights, this is analogous to the SG and SAG method. It should also be noted that all the gradients \(g_{0},\dots ,g_{n-1}\) of the previous iterations are included in the current iteration. In Section 3, we will show that the error \(\|\hat G_{n} - \nabla F_{n}\|\) almost surely converges to zero. Hence, the algorithm does not become trapped in a local minimum of the discretized function. Therefore, the problem described in Fig. 1 will not arise, since the approximation of ∇F_{n} by \(\hat G_{n}\) becomes increasingly accurate.
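A simplified one-dimensional CSG iteration may look as follows. This is a sketch under our own assumptions, not the paper's implementation: f(u,v) = (u − v)² with J the plain integral over V_ad = (0,1), U_ad = [0,1], a constant step size (cf. Theorem 21), and the exact Voronoi measures replaced by a fine-grid approximation of the nearest-neighbor weights in the |Δu| + |Δv| product norm:

```python
import random

random.seed(1)
tau, u = 0.2, 0.0                    # constant step size, initial design
us, oms, grads = [], [], []          # histories of u_k, omega_k, g_k
v_grid = [(j + 0.5) / 120 for j in range(120)]  # fine grid approximating V_ad

def grad_f(u, v):
    # f(u, v) = (u - v)^2, hence grad_u f(u, v) = 2 (u - v);
    # the exact gradient is grad F(u) = 2 u - 1, with minimizer u* = 1/2
    return 2.0 * (u - v)

for n in range(120):
    om = random.random()             # sample omega_n uniformly from V_ad
    us.append(u); oms.append(om); grads.append(grad_f(u, om))
    # weight alpha_k^n: measure of the v-set whose nearest sample point
    # (omega_k, u_k) in the |du| + |dv| product norm is the k-th one
    alpha = [0.0] * len(oms)
    for v in v_grid:
        k = min(range(len(oms)), key=lambda k: abs(u - us[k]) + abs(v - oms[k]))
        alpha[k] += 1.0 / len(v_grid)
    G_hat = sum(a * g for a, g in zip(alpha, grads))  # approximates grad F(u_n)
    u = min(1.0, max(0.0, u - tau * G_hat))           # projection onto U_ad
```

Despite evaluating only one gradient per iteration, the weighted combination drives u close to the exact minimizer 1/2, whereas plain SG with the same constant step would keep a noticeable noise floor.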
Convergence analysis
In this section, we will study the convergence of the proposed algorithm. Due to the randomly chosen evaluation point within the algorithm, we will have to study probabilistic convergence behavior in terms of “almost sure convergence” and convergence in expectation. This notion of convergence as well as further assumptions and definitions are given in Section 3.1. In Section 3.2, we prove that the error in the gradient approximation goes to zero and finally apply this result in Section 3.3 to prove convergence of the CSG method.
Assumptions, definitions, and preliminary results
For the convergence analysis of Algorithm 1, the following three assumptions on the objective functional, the step length, and the sets U_{ad}, V_{ad} are important ingredients. In the following, we will assume that these assumptions are always satisfied without mentioning this explicitly.
Definition 4 (Lipschitz constants and maxima)
We will denote the Lipschitz constants of D_{J} and ∇_{u}f with respect to both their arguments by \(L_{D_{J}}^{(1)}, L_{D_{J}}^{(2)}\) and by \(L_{\nabla _{u} f}^{(1)}, L_{\nabla _{u} f}^{(2)}\), respectively. Their maximal absolute function values will be denoted by \(C_{D_{J}}\) and \(C_{\nabla _{u} f}\).
For U_{ad} and V_{ad}, we assume the following:
Assumption 5 (Regularity of U_{ad} and V_{ad})
The set \({U_{\text {ad}}} \subset \mathbb {R}^{d_{u}}\) is compact and convex. \(V_{\text {ad}} \subset \mathbb {R}^{d_{v}}\) is open and bounded. In addition, there exists \(c\in \mathbb {R}\) s.t. \(\left | V_{\text {ad}} \setminus {V}_{\text {ad}}^{r} \right | \leq c\,r \ \forall r\in (0,1)\), with \({V}_{\text {ad}}^{r} := \{x\in V_{\text {ad}} : B_{r}(x) \subset V_{\text {ad}}\}\) and where \(B_{r}(x) \subset \mathbb {R}^{d_{v}} \) is an open ball centered in \(x\in \mathbb {R}^{d_{v}}\) with radius r.
The latter assumption is fulfilled for non-pathological open sets with finite perimeter.
Assumption 6 (Step length)
The step length \((\tau _{n})_{n \in \mathbb {N}}\) satisfies the following: \(\exists N \in \mathbb {N}\), \(c_{1},c_{2}\in \mathbb {R}_{>0}\), and \(\delta \in \left (0,\frac {1}{\max \limits \{d_{v},2\}}\right )\) s.t.
In the one-dimensional case, these conditions on the step length satisfy the conditions stated in Robbins and Monro (1951, Eqns. (6) and (26)), and equivalently in Bottou et al. (2018, Eqn. (4.19)); they can be seen as a higher-dimensional equivalent thereof.
Remark 7 (Step length for d_{v} = 1 and d_{v} = 2)
In case of a one or twodimensional set V_{ad}, Assumption 6 is satisfied iff
with the Big Oh and Little Oh notation as defined in Bürgisser and Cucker (2013). In other words, the null sequence \((\tau _{n})_{n\in \mathbb {N}}\) must not tend to zero faster than \((n^{-1})_{n\in \mathbb {N}}\), but not as slowly as \((n^{-\frac {1}{2}})_{n\in \mathbb {N}}\).
The lower bound for the step sizes ensures that the accumulated step sizes reach infinity and the algorithm does not get stuck due to their reduction. The upper bound ensures that the approximation of the search direction is appropriate. This corresponds to the rate of convergence for empirical measures, see, e.g., Dudley (1969, Prop. 3.4.).
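For instance (an illustration of our own, not from the paper), for \(d_v \le 2\) the choice \(\tau_n = n^{-3/4}\) respects both bounds of Remark 7, since by the p-series test

\[ \sum_{n=1}^{\infty} \tau_n = \sum_{n=1}^{\infty} n^{-3/4} = \infty \quad\text{(accumulated steps diverge, as } \tfrac34 \le 1\text{)}, \]

\[ \sum_{n=1}^{\infty} \tau_n^{2} = \sum_{n=1}^{\infty} n^{-3/2} < \infty \quad\text{(squared steps are summable, as } \tfrac32 > 1\text{)}, \]

so the sequence lies strictly between the two critical rates \(n^{-1}\) and \(n^{-1/2}\).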
Notwithstanding these assumptions on the step size, a result for the case of a step length bounded away from zero will be stated in Theorem 21.
To show convergence of the algorithm, we must first state the probability space setting.
Definition 8 (Probability space setup)
The probability space \(({\varOmega },\mathcal {A},\mathbb {P})\) is given by the following:
where \(\mu ^{\otimes \mathbb {N}} (A_{1}\times {\ldots } \times A_{n}) = {\prod }_{i = 1}^{n} \frac {\mu (A_{i})}{\mu (V_{\text {ad}})} \) is the product measure and \(\mu = \lambda ^{d_{v}}\) the Lebesgue measure in \(\mathbb {R}^{d_{v}}\).
All the following random variables are defined in this setting. For the convergence of random variables, we use the following commonly used notation:
Definition 9 (Stochastic convergence)
A sequence of random variables \((Z_{n})_{n\in \mathbb {N}}\) converges almost surely to some random variable Z iff
which we denote by \(Z_{n} \stackrel {\text {a.s.}}{\longrightarrow } Z\).
In this document, we define the following norm on the product space U_{ad} × V_{ad}.
Definition 10 (Norm in U_{ad} × V_{ad})
For better readability, we define the following ℓ^{1}/ℓ^{2}norm on the product space U_{ad} × V_{ad}:
Due to norm equivalence in finite dimensional spaces, the results presented later hold for all chosen norms in U_{ad} and V_{ad} and combinations thereof.
The orthogonal projection used in Algorithm 1 has some important properties:
Lemma 11 (Orthogonal projection)
Let \(S\subset \mathbb {R}^{n}\) for \(n\in \mathbb {N}_{>0}\) be closed and convex and let \(\mathcal {P}_{S}\) be the orthogonal projection, then the following holds for all \(x,y \in \mathbb {R}^{n}\) and z ∈ S:

a) \((\mathcal {P}_{S}(x) - x)^{T} (\mathcal {P}_{S}(x) - z) \leq 0\),
b) \((\mathcal {P}_{S}(y) - \mathcal {P}_{S}(x))^{T}(y-x) \geq \|\mathcal {P}_{S}(y) - \mathcal {P}_{S}(x)\|^{2} \geq 0\),
c) \(\|\mathcal {P}_{S}(y) - \mathcal {P}_{S}(x)\| \leq \|y-x\|\).
Proof
(a) is (ii) in Aubin (2000, Thm. 1.4.1), and (b) and (c) are (iii) and (ii) in Aubin (2000, Prop. 1.4.1). □
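The three properties of Lemma 11 are easy to verify numerically for a box-shaped admissible set, where the orthogonal projection reduces to a coordinate-wise clip (a small self-contained check of our own; the box bounds are arbitrary):

```python
import random

def project_box(x, lo, hi):
    # orthogonal projection onto S = prod_i [lo_i, hi_i]: clip per coordinate
    return [min(h, max(l, xi)) for xi, l, h in zip(x, lo, hi)]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def norm(a):
    return dot(a, a) ** 0.5

def diff(a, b):
    return [ai - bi for ai, bi in zip(a, b)]

random.seed(2)
lo, hi = [0.0, -1.0], [1.0, 1.0]
for _ in range(1000):
    x = [random.uniform(-3, 3) for _ in range(2)]
    y = [random.uniform(-3, 3) for _ in range(2)]
    z = [random.uniform(l, h) for l, h in zip(lo, hi)]   # a point inside S
    px, py = project_box(x, lo, hi), project_box(y, lo, hi)
    # a) variational inequality, b) firm non-expansiveness, c) 1-Lipschitz
    assert dot(diff(px, x), diff(px, z)) <= 1e-12
    assert dot(diff(py, px), diff(y, x)) >= norm(diff(py, px)) ** 2 - 1e-12
    assert norm(diff(py, px)) <= norm(diff(y, x)) + 1e-12
```

For separable convex sets such as boxes, the projection in Algorithm 1 is therefore a trivial component-wise operation.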
For h ∈ C^{1}(U_{ad}) and U_{ad} convex, the following first-order optimality conditions are equivalent:
Corollary 12 (Optimality conditions)
For all u^{∗}∈ U_{ad}, the following items are equivalent:

i) −∇h(u^{∗})^{T}(u − u^{∗}) ≤ 0 ∀u ∈ U_{ad},
ii) \(\mathcal {P}(u^{*} - t\nabla h(u^{*})) = u^{*} \quad \forall t \geq 0\).
We call u^{∗} satisfying one of the above conditions a stationary point.
Proof
Define for u ∈ U_{ad} the cone
Using Lemma 11 ((a) for “⇒” and (b) for “⇐”), it is straightforward to see that for \(x \in \mathbb {R}^{d_{u}}\) and u ∈ U_{ad},
Since condition i) is equivalent to \(-\nabla h(u^{*}) \in N_{{U_{\text {ad}}}}(u^{*})\), the result follows. □
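Condition ii) gives a practical stationarity test: u^{∗} is stationary exactly when it is a fixed point of the projected gradient step for every t ≥ 0. A minimal sketch (the quadratic h and the unit box are our own hypothetical choices, not from the paper):

```python
def project_box(x, lo=(0.0, 0.0), hi=(1.0, 1.0)):
    # orthogonal projection onto the unit box: clip per coordinate
    return tuple(min(h, max(l, xi)) for xi, l, h in zip(x, lo, hi))

def grad_h(u):
    # hypothetical smooth objective h(u) = (u_1 - 2)^2 + (u_2 - 0.3)^2,
    # whose unconstrained minimizer (2, 0.3) lies outside the box
    return (2.0 * (u[0] - 2.0), 2.0 * (u[1] - 0.3))

def is_stationary(u, t_values=(0.5, 1.0, 5.0), tol=1e-12):
    # check the fixed-point characterization P(u - t * grad h(u)) = u
    g = grad_h(u)
    return all(
        max(abs(p - ui) for p, ui in zip(
            project_box(tuple(ui - t * gi for ui, gi in zip(u, g))), u)) <= tol
        for t in t_values)
```

Here the constrained minimizer is (1, 0.3): the gradient points out of the box, so the projection maps every gradient step back onto it.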
Error in the approximate gradient
In this subsection, we analyze the error between the approximate gradient \(\hat G_{n}\) and the gradient of the objective functional ∇F_{n} in the n-th iteration. To do this, we define for v ∈ V_{ad}, ω ∈ Ω the sequence of random variables \(\left (X_{n}\right )_{n\in \mathbb {N}}\) by the following:
Lemma 13 (Convergence result)
For v ∈ V_{ad},
where \(\varepsilon _{n} := 2C_{\nabla _{u} f}\, c_{2}\, |V_{\text {ad}}|\, n^{-\frac {\delta }{2}} + \tilde {\varepsilon }_{n}\) and \(\tilde {\varepsilon }_{n} := n^{\frac {\delta }{2}-\frac {1}{\max (2,d_{v})}}\), with c_{2} and δ defined in Assumption 6 and \(C_{\nabla _{u} f}\) in Definition 4. Moreover,
with \(V_{ad}^{\varepsilon _{n}}\) defined in Assumption 5.
Proof
By item (c) in Lemma 11, we have
where i_{0} := ⌈n − a_{n} + 1⌉ and κ ∈ (0,1) is given by \(\kappa := 1-\frac {1}{\max \limits \{d_v,2\}}+\delta \). For d_{v} = 1, we choose \(a_{n} = \sqrt {n}\), and if d_{v} ≥ 2, we choose \(a_{n} = n^{1-\frac {\delta }{2}}\). Observe that for n > 2 we obtain
As \(\frac {a_{n}}{n} = n^{\frac {\delta }{2}-\frac {1}{\max \limits \{d_{v},2\}}} \leq n^{-\frac {1}{2\max \limits \{d_{v},2\}}}\), we obtain
For all v ∈ V_{ad} there exists n large enough such that \(B_{\tilde {\varepsilon }_{n}}(v) \subset V_{ad}\). Hence,
with \(\nu := \frac {\pi ^{\frac {d_{v}}{2}}}{\Gamma (\frac {d_{v}}{2}+1)\,|V_{\text {ad}}|}\). Thus
By the limit comparison test, the corresponding series converge. Finally, note that Assumption 5 gives that \(v \in V_{\text {ad}}^{\varepsilon _{n}}\) implies \(B_{\varepsilon _{n}}(v) \subset V_{\text {ad}}\) and therefore \(\displaystyle \sup \limits _{v \in V_{\text {ad}}^{\varepsilon _{n}}} \mathbb {P}(X_{n}(\cdot ;v) \geq \varepsilon _{n}) \le \left (1 - \nu \cdot \frac {n^{\frac {\delta }{2}}}{a_{n}}\right )^{a_{n}}\).
□
As a direct consequence of the latter result, we obtain almost sure convergence:
Corollary 14 (Density ofωinV_{ad})
For all v ∈ V_{ad}
Proof
The result follows by Lemma 13 and the Borel–Cantelli Lemma, see, e.g., Klenke (2013, Thm. 6.12). □
Thus, due to the Lipschitz continuity of ∇_{u}f and D_{J}, the integral in ∇F(u_{n}) is increasingly better approximated by \(\hat G_{n}\) for \(n \rightarrow \infty \):
Corollary 15 (Error in gradient approximation)
The norm of the difference between the approximate gradient \(\hat G_{n}\) in the n-th iteration (defined in Algorithm 1) and the gradient of the exact objective functional ∇F in u_{n} goes to zero, i.e.,
and
Proof
For v ∈ V_{ad}, define
Then,
where \(\hat f(u_{n},v) := f(u_{k^{n}(v)},\omega _{k^{n}(v)})\). By the Lipschitz continuity assumed in Assumption 1, we therefore obtain the following:
with constants defined in Definition 4 and X_{n}(⋅;v) as defined in Lemma 13. Recall that U_{ad}, V_{ad} are bounded. Now, the almost sure convergence, as well as the convergence of the expectations, follows by Lebesgue’s dominated convergence theorem.
Finally, let C be a generic constant and ε > 0. Since \(\sup _{v \in V_{\text {ad}}} X_{n}(\cdot ;v) \le D < \infty \), where D := diam(V_{ad}) + diam(U_{ad}) denotes the diameter of V_{ad} plus the diameter of U_{ad}, and by Fubini’s theorem, we have the following:
where \({V}_{\text {ad}}^{r}\) is given in Assumption 5. If we choose \(\varepsilon = \varepsilon _{n} = 2 C_{\nabla _{u} f} c_{2}\cdot n^{-\frac {1}{2}} + n^{-\frac {1}{\max (2,d_{v})}+\frac {\delta }{2}}\) as in Lemma 13, we obtain the following:
which concludes the proof. □
Convergence result
As we have seen in Corollary 15, the error \(\|\hat {G}_{n} - \nabla F_{n}\|\) converges almost surely and in expectation to zero for \(n\rightarrow \infty \). It remains to provide sufficient conditions under which the algorithm converges to a stationary point.
Lemma 16 (Objective functional values)
The difference of the objective functional values in iteration \(n\in \mathbb {N}\) can be approximated as follows:
with \(\phi _{n} := \tau _{n} \|\nabla F_{n}-\hat {G}_{n}\| \cdot \|\hat {G}_{n}\| + {\tau _{n}^{2}} C\|\hat {G}_{n}\|^{2}\), where \(C \in \mathbb {R}_{\geq 0}\) is a constant depending on the Lipschitz constants and suprema of the involved functions.
Proof
By the mean value theorem, there is a c ∈ (0,1) such that (we set \(\nabla {{F}_{n}^{c}} := \nabla F((1-c) u_{n} + c u_{n+1})\))
using the Cauchy–Schwarz inequality and Definition 4. Recall that \(u_{n+1} = \mathcal {P}_{U_{\text {ad}}}(u_{n} - \tau _{n} \hat {G}_{n})\). With Lemma 11 (b), (c), and the Cauchy–Schwarz inequality for the first term of the right-hand side of the latter equation, we obtain the following:
Applying Lemma 11 (c) to the second term yields \( \|\mathcal {P}_{U_{\text {ad}}}(u_{n} - \tau _{n} \hat {G}_{n}) - u_{n}\|^{2}\leq {\tau _{n}^{2}} \|\hat G_{n}\|^{2}. \)□
Since the first term on the right-hand side of the estimate in the above lemma is strictly negative while the second term is strictly positive, we may expect a descent, provided \(\mathbb {E}\left [\phi _{n}\right ]\) is small enough.
Corollary 17 (Convergence result)
We have the following:
where \(\phi _{n} := \tau _{n} \|\nabla F_{n}-\hat {G}_{n}\| \,\|\hat {G}_{n}\| + {\tau _{n}^{2}}\frac {C_{\nabla _{u} f}}{2}\|\hat {G}_{n}\|^{2}.\)
Proof
Since \(\|\hat G_{n}\|\) is bounded and \(\sum \limits _{n = 1}^{\infty } {\tau _{n}^{2}} < \infty \) by Assumption 6, the result follows by (3.2). □
Before we present our main results, we need the following auxiliary result:
Lemma 18 (Projection of gradient steps)
Proof
Define \(x:=u_{n}, y:=\hat G_{n}\), and τ := τ_{n}. We assume that x − τy ∉ U_{ad} (otherwise the result follows by Lemma 11) and set \(n_{\tau }:= x - \tau y - \mathcal {P}_{U_{\text {ad}}}(x-\tau y)\) and
Since U_{ad} is convex, we have U_{ad} ⊂ H, and therefore \(\forall u\in \mathbb {R}^{d_{u}}\) by Lemma 11,
where \(\mathcal {P}_{H}\) is the orthogonal projection onto H (compare Fig. 3). With \(B:= \frac {t}{\tau }\left (\mathcal {P}_{U_{\text {ad}}}(x-\tau y)-x\right ) + x\) and (3.3), we have the following:
□
Recalling the characterization of stationary points from Corollary 12, we obtain our first main result:
Theorem 19 (Convergence result)
For all t ≥ 0,
Proof
First, note that by the compactness of U_{ad} and the regularity of F, \(F_{\inf } := \inf _{u \in {U_{\text {ad}}}} F(u) > -\infty \). Summing both sides of the inequality in Lemma 16 up to an \(N\in \mathbb {N}\) gives the following:
Hence, by Corollary 17,
Using Lemma 11 (b) together with Lemma 18 gives for all \(n \in \mathbb {N}\) sufficiently large,
Since \(\|u_{n+1} - u_{n}\| \le \tau _{n} \|\hat G_{n}\|\), this combined with Corollary 15 gives the result. □
As a direct consequence, we have the following:
Theorem 20 (Main theorem)
Let \((u_{n})_{n\in \mathbb {N}}\) be generated by Algorithm 1. Then, there exists a subsequence \((u_{n_{k}})_{k\in \mathbb {N}}\) converging to a stationary point, i.e.,
Proof
Direct consequence of Theorem 19. □
For applications, the condition on the step length (Assumption 6) is inconvenient, since the step length becomes very small and the algorithm thus progresses only slowly. If the algorithm is performed with a constant stepsize, and if the sequence \((u_{n})_{n\in \mathbb {N}}\) converges to some u^{∗}∈ U_{ad}, then the following theorem demonstrates that u^{∗} is a stationary point.
Theorem 21 (Convergent series)
Assume the step-size sequence \((\tau _{n})_{n\in \mathbb {N}}\) satisfies \(\tau _{n} \geq \tau \quad \forall n \in \mathbb {N} \) for some τ > 0. Let further \((v_{n})_{n\in \mathbb {N}}\) be dense in V_{ad} and assume \((u_{n})_{n\in \mathbb {N}}\) converges to u^{∗}∈ U_{ad}. Then, u^{∗} is a stationary point of F, i.e.,
Proof
Similar to Corollary 15, \(\|\hat G_{n} - \nabla F_{n}\| \rightarrow 0\). Thus, by convergence of \((u_{n})_{n\in \mathbb {N}}\), we obtain the following:
by Lemma 18 and Lemma 11 (c). □
Note that almost all sequences \((v_{n})_{n\in \mathbb {N}}\) that are given by the random number generator in Algorithm 1 are dense in V_{ad}. In addition to the convergence properties shown in the latter theorems, the algorithm also approximates the objective functional value with arbitrary accuracy:
Corollary 22 (Approximation ofF)
Let the sequence \((u_{n})_{n\in \mathbb {N}}\) be generated by Algorithm 1. Then, for all convergent subsequences \((u_{n_{k}})_{k\in \mathbb {N}}\) with \(u_{n_{k}} \rightarrow u^{*}\), we obtain for \(\hat F\) as defined in Algorithm 1: \( \hat F_{n_{k}} - F(u^{*}) \stackrel {\text {a.s.}}{\rightarrow } 0 \). Assuming further that \((v_{n_{k}})_{k\in \mathbb {N}}\) is dense in V_{ad}, we obtain \( \lim _{k\rightarrow \infty } \hat F_{n_{k}} = F(u^{*}). \)
Proof
The proof is similar to the proof of Lemma 13, relying on the Lipschitz continuity of F (a direct consequence of Assumption 1) and Corollary 14. □
Remark 23 (Termination condition)
By the regularity assumption on the objective functional in Assumption 1, the termination condition in Algorithm 1 can be posed as follows:
for 𝜖 > 0. This is obviously not possible for SG. To satisfy such a condition by the SAG method, the discretization of the objective functional has to be sufficiently fine in order to approximate the gradient with sufficient accuracy. However, depending on the particular example, an a priori discretization satisfying this property can be hard to determine.
Numerical results
In this section, we will compare the following stochastic optimization methods mentioned in the introductory section:

CSG (continuous stochastic gradient method): as introduced in Section 2, this scheme relies on the computation of a single gradient in each iteration and the interpolation with previously computed information.

SG (stochastic gradient method): the classical stochastic optimization scheme as outlined in Bottou et al. (2018). The convergence of the method is based on decreasing stepsizes.

SAG (stochastic average gradient method): an improved stochastic gradient scheme as introduced in Schmidt et al. (2017), restricted to the case of a finite sum as objective rather than an integral. Its main advantage, the possibility of larger stepsizes, can be seen in the examples and carries over to CSG. We will write SAG_{N} (\(N\in \mathbb {N}\)) for an SAG method relying on an N-step quadrature rule to discretize the integral in the original objective.
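To make the algorithmic differences concrete, the following sketch contrasts the SG and SAG update rules on a toy integral objective; the quadratic integrand, stepsizes, and iteration counts are illustrative assumptions and do not correspond to the experiments reported below:

```python
import numpy as np

# Toy objective F(u) = ∫_{-1}^{1} f(u, v) dv with an illustrative quadratic
# integrand whose minimizer is u* = 0. All values are assumptions chosen
# for this sketch, not the paper's settings.
def df(u, v):                                    # ∂f/∂u
    return (u - v) * (1.0 + v * v)

rng = np.random.default_rng(0)
n_iter = 200
tau_sg, tau_sag = 0.05, 0.02                     # stepsizes tuned for the toy
u_sg = u_sag = 1.0

# SG: one uniformly sampled v per iteration, decreasing stepsize;
# |V_ad| = 2 makes 2*df(u, v) an unbiased estimator of ∇F(u).
for n in range(1, n_iter + 1):
    v = rng.uniform(-1.0, 1.0)
    u_sg -= (tau_sg / n**0.6) * 2.0 * df(u_sg, v)

# SAG_N: fixed (N+1)-point grid, remember the last gradient per grid point.
N = 8
grid = np.linspace(-1.0, 1.0, N + 1)
g_mem = np.zeros(N + 1)
for n in range(n_iter):
    i = rng.integers(0, N + 1)                   # refresh one random entry
    g_mem[i] = df(u_sag, grid[i])
    u_sag -= tau_sag * 2.0 * g_mem.mean()        # average of stored gradients
```

CSG differs from both in that the stored gradient samples are combined with measure-based weights over the continuous set V_{ad} rather than over a fixed grid.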
The performance of these methods strongly depends on the chosen stepsizes. In the following examples, the stepsizes are chosen such that all schemes perform well within the range of their possibilities. However, adaptive stepsize control (the stepsize is also known as the learning rate in the field of machine learning) is itself a subject of research for stochastic optimization schemes and is not the focus of this contribution. For example, in Kingma and Ba (2015), the stepsizes are derived from estimates of first and second moments of the gradients, and in Tan et al. (2016), a Barzilai-Borwein-type stepsize adaption is discussed.
To compare the methods, we display the results as follows. For a large number of optimization runs, we compute the quantile curves α_{p}(n) defined by \(\mathbb {P}(u_{n} > \alpha _{p}(n)) = p\) for p ∈ (0,1). With this, we define the quantile sets P_{p,q}, which lie between the p- and the q-quantile, i.e.,
These areas will be colored in various degrees of opacity in order to show the behavior of the optimization procedure in its probabilistic nature.
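A minimal sketch of how such quantile curves can be computed from stored runs is given below; the synthetic trajectories are an assumption standing in for actual optimization runs:

```python
import numpy as np

# Empirical quantile curves alpha_p(n) over many optimization runs:
# runs[r, n] holds the iterate (or objective value) of run r at step n.
# The synthetic decaying trajectories below stand in for real runs.
rng = np.random.default_rng(1)
n_runs, n_steps = 1000, 50
decay = 0.9 ** np.arange(n_steps)
runs = decay[None, :] * (1.0 + 0.3 * rng.standard_normal((n_runs, n_steps)))

def quantile_curve(runs, p):
    # alpha_p(n) such that a fraction p of the runs lies above it
    return np.quantile(runs, 1.0 - p, axis=0)

# Quantile band P_{p,q}: the region between the p- and q-quantile curves.
band_lo = quantile_curve(runs, 0.9)   # 90% of runs lie above this curve
band_hi = quantile_curve(runs, 0.1)   # 10% of runs lie above this curve
```

Shading the regions between several such curve pairs with increasing opacity produces the quantile plots used in the figures.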
First, we will compare the algorithms by optimizing the function defined in (2.1) and equivalently in (4.2), as well as an additional academic objective functional which will be defined in (4.3).
Academic examples
We will study the behavior of the algorithms in the following two cases:
as introduced in (2.1) (see Fig. 1) and
(see Fig. 4).
To be able to apply the SAG algorithm, we approximate the integral by the trapezoidal rule in both cases. For this, we use an equidistant grid, i.e., for \(N\in \mathbb {N}\) and v_{0} := − 1, we define \(v_{k} := v_{0} + kh\), \(k=0,\dots ,N\), with \(h:= \frac {|V_{\text {ad}}|}{N}=\frac {2}{N}\). In this way, F is approximated as follows:
The optimization problem considered in the SAG case thus reads as follows:
The approximation error depends directly on the second derivative of f w.r.t. the second argument and on the grid spacing h, that is, the number of intervals. A finer grid thus leads to a better approximation, but, for a deterministic gradient descent method, also to a higher number of problems to be solved in each iteration.
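The discretization described above can be sketched as follows; the integrand is a hypothetical stand-in for f, chosen so that the integral has a closed form against which the quadrature error is visible:

```python
import numpy as np

# Trapezoidal approximation of F(u) = ∫_{-1}^{1} f(u, v) dv on an
# equidistant grid, as used to set up the SAG_N objective. The quadratic
# integrand below is an illustrative stand-in for the paper's f.
def f(u, v):
    return 0.5 * (u - v) ** 2

def F_trapz(u, N):
    h = 2.0 / N                        # h = |V_ad| / N with V_ad = [-1, 1]
    v = -1.0 + h * np.arange(N + 1)    # v_k = v_0 + k h, k = 0, ..., N
    w = np.full(N + 1, h)              # composite trapezoidal weights
    w[0] = w[-1] = h / 2.0
    return float(np.sum(w * f(u, v)))

def F_exact(u):                        # closed form of the toy integral
    return u * u + 1.0 / 3.0

# The error decreases with the grid spacing h (here like h^2), while a
# deterministic method would have to solve N + 1 problems per iteration.
err_coarse = abs(F_trapz(0.3, 4) - F_exact(0.3))
err_fine = abs(F_trapz(0.3, 64) - F_exact(0.3))
```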
The comparison of the algorithms is based on the number of function evaluations, as these constitute the time-consuming steps for complex optimization tasks. We compare two different settings, one with a stepsize proportional to n^{− 0.6} (see left column in Figs. 5 and 6) and one with an appropriately chosen constant stepsize (see right column in Figs. 5 and 6). The shaded areas in Figs. 5 and 6 denote the quantile sets as defined in (4.1) for the 10 and 90% quantiles (light), the 20 and 80% as well as the 30 and 70% quantiles (medium dark), and the 40 and 60% quantiles (dark). The quantiles are based on 10^{5} optimization runs, each with a randomized initial datum. The black lines in Figs. 5 and 6 identify the median for CSG and SG. For SAG_{4} and SAG_{8}, they identify the median of the series converging to one of the local minima. In addition, the line thickness and the patch opacity are proportional to the probability of converging to the respective local minimum.
Results of numerical experiments
In the case of the objective functional (4.2) optimized by SAG_{4} and SAG_{8}, the algorithm converges to one of the three or five local minima of the discretized objective functional, respectively (see Fig. 1 with N = 4 and N = 8). In contrast to SAG, SG (in the case of decreasing stepsizes, see Fig. 5, left) and CSG (in both considered cases, see Figs. 5 and 6) converge to the optimal value of F. The “failure” of SAG is due to the fact that SAG only approximates the original objective functional. On the other hand, it is clear that for a sufficiently regular function (see Assumption 1), the optimal values of SAG_{N} converge to the optimal solution of the original problem for \(N \rightarrow \infty \). However, a sufficiently large number N is in general not known a priori. Thus, even for a large N, it is not clear how (local) optimal solutions u^{∗} as well as their objective functional values F(u^{∗}) are affected by the discretization of the integral. Moreover, the convergence of the SAG method becomes slower with larger N, as can be clearly seen from the more diffuse quantile sets in Figs. 5 and 6.
Figures 5 and 6 clearly show the advantage of CSG: it converges considerably faster than SG and does not converge to an artificial local minimum as SAG does. While SG also approaches the optimal value, at least in the case with decreasing stepsizes, its speed of convergence is considerably lower compared to CSG. The true potential of CSG comes into play whenever constant stepsizes τ_{n} are chosen. This can be seen in the right columns of Figs. 5 and 6. It should be noted that the stepsize could be adapted individually for each method in order to achieve a slightly better convergence behavior. In particular, the constant stepsize we have chosen uniformly for all methods seems to be too large for SG. CSG, on the other hand, can easily handle this stepsize.
Finally, it is observed that CSG combines the advantages of SAG and SG. For instance, in the left column in Fig. 6, SAG_{4} converges quickly but unfortunately to the “wrong” result, while SG converges to the correct limit point, but convergence is very slow. In contrast, CSG converges quickly to the correct limit point.
Another advantage of CSG is that, according to Theorem 20, Corollary 21, and the discussion in Remark 23, the CSG algorithm can be stopped whenever the objective function is approximated with a defined accuracy and the first-order optimality conditions are satisfied within a given error tolerance. This is, in general, possible for neither SG nor SAG.
Structural optimization example
As a design optimization test case, we have chosen the optimization of a 2D tire, fixed at the midpoint and loaded from an arbitrary direction. A weighted sum of the expected compliance, a volume penalization term, and a regularization term forms the objective functional.
In detail, we consider the design domain \({\varOmega } := \{ x\in {\mathbb {R}}^{2} \mid 0.1 < \|x\| < 1 \}\), an annulus, which is subject to a load described by the function
on the outer boundary Γ_{N} ⊂ ∂Ω, and fixed with a homogeneous Dirichlet condition on the inner boundary Γ_{D} ⊂ ∂Ω (see Fig. 7). Here, n(x) denotes the outer normal vector in x ∈ ∂Ω and \(h_{\alpha } : {\mathbb {R}} \to {\mathbb {R}}\) denotes for \(\alpha \in {\mathbb {R}}\) the scalar function:
The angle α describes the position where the boundary force takes its maximum value and the function can be seen as a smoothed Dirac force on the boundary (see Fig. 8).
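As an illustration, a smooth bump of this type can be realized, e.g., by a von-Mises-type profile; the specific form below is an assumption for this sketch and not the definition of h_α used in the experiments:

```python
import numpy as np

# Hypothetical smoothed Dirac load profile peaked at angle alpha. The
# paper defines h_alpha explicitly; a von-Mises-type bump serves here
# as an illustrative stand-in with the same qualitative behavior.
def h(theta, alpha, kappa=50.0):
    # kappa controls the concentration: larger kappa, sharper peak
    return np.exp(kappa * (np.cos(theta - alpha) - 1.0))

theta = np.linspace(0.0, 2.0 * np.pi, 721)
vals = h(theta, alpha=np.pi / 3)
# The profile attains its maximum 1 at theta = alpha and decays smoothly
# away from the peak, approximating a point load for large kappa.
```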
In Ω, material properties are defined using a pseudo density function ρ, which is used to scale a given isotropic material characterized by the Lamé parameters λ and μ. Denoting this material by E, the resulting material function is given by the SIMP law ρ^{p}E with a penalty parameter p > 1, see Bendsøe (1989). We assume that the material properties are fixed close to the outer boundary and thus set the pseudo density ρ to 1 in this part of the domain, i.e., \(\rho _{\hat {\varOmega }} \equiv 1\) with \(\hat {\varOmega } := \{ x\in {\mathbb {R}}^{2} \mid 0.9 < \|x\| < 1 \}\) (see Fig. 7, area marked in yellow). In the rest of the domain, ρ serves as the design variable and is allowed to vary between a small positive value ε and 1.
Now, for each admissible design ρ and each α ∈ [0,2π], a linear elasticity problem, the so-called state problem, is defined on Ω by applying the boundary conditions described above. The corresponding state solution is denoted by u(ρ,α).
The optimization goal is to minimize the expected compliance for angles α ∈ [0,2π]. In addition, we introduce a term to penalize the total material consumption and a filter regularization term as proposed in Semmler et al. (2018, Section 3.2). This leads to the objective functional as follows:
where
γ_{0},γ_{v},γ_{φ} > 0 are given scaling parameters, “ ∗ ” denotes a convolution operator in \({\mathbb {R}}^{2}\) and the filter kernel \(\varphi :{\mathbb {R}}^{2} \to {\mathbb {R}}\) is defined by the following:
with radius r_{0} > 0. By finite element discretization, this leads to the following optimization problem:
Here, M is the number of design variables, \(K(\rho _{h}) \in {\mathbb {R}}^{N\times N}\) denotes the global stiffness matrix with N degrees of freedom, and \( g_{h}(\alpha )\in {\mathbb {R}}^{N}\) the righthand side of the linear elastic state problem with the load centered in angle α in finite element notation. Moreover, j_{h} is the discretized equivalent of J. In the following example, the parameters are chosen as follows: γ_{0} = 1,γ_{v} = 0.1,γ_{φ} = 1,r_{0} = 0.05, and SIMP parameter p = 3 (see, e.g., Bendsøe (1989)). As material parameters, we have chosen λ = μ = 1 and, choosing ε = 10^{− 2}, the void stiffness was defined as 10^{− 6}.
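To illustrate the structure of the discretized problem, the following sketch computes the compliance and its SIMP sensitivity for a strongly simplified model, a clamped 1D bar; the bar model and all values are assumptions for this sketch, not the 2D annulus setting of the experiments:

```python
import numpy as np

# Minimal SIMP compliance sketch for a clamped 1D bar with M elements.
# This is an illustrative reduction of the discretized problem above;
# the paper treats a 2D annulus with triangular finite elements.
M, p, load = 8, 3, 1.0

def compliance_and_grad(rho):
    k = rho ** p                               # SIMP-scaled element stiffnesses
    K = np.zeros((M + 1, M + 1))               # assemble tridiagonal K(rho)
    for e in range(M):
        K[e:e + 2, e:e + 2] += k[e] * np.array([[1.0, -1.0], [-1.0, 1.0]])
    g = np.zeros(M + 1)
    g[-1] = load                               # point load at the free end
    u = np.zeros(M + 1)
    u[1:] = np.linalg.solve(K[1:, 1:], g[1:])  # u[0] = 0: Dirichlet node
    J = g @ u                                  # compliance g^T u
    # Self-adjoint sensitivity: dJ/drho_e = -p rho_e^(p-1) (u_{e+1}-u_e)^2
    dJ = np.array([-p * rho[e] ** (p - 1) * (u[e + 1] - u[e]) ** 2
                   for e in range(M)])
    return J, dJ

J, dJ = compliance_and_grad(np.full(M, 0.5))
```

The negative sign of every sensitivity entry reflects that adding material always decreases the compliance, which is why the volume penalization term is needed.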
Computational optimization results
As in the academic examples presented in Section 4.1, we show and compare results for SG, SAG, and CSG. All optimization runs have been started with \(\rho \equiv \frac {1}{2}\) in the design domain \({\varOmega } \setminus \hat {\varOmega }\). The linear elasticity problem is discretized using ≈ 40 ⋅ 10^{3} unstructured triangular elements generated by Triangle (see Shewchuk (1996)). Using first-order Lagrange basis functions, this discretization results in approximately the same number of degrees of freedom for u_{h}. The design domain comprises roughly 20 ⋅ 10^{3} degrees of freedom in terms of ρ_{h}. The implementation is done in Matlab, and the linear systems are solved using the direct solver available through the backslash operator.
Analogous to the previous experiments, we have chosen a suitable initial stepsize and have discretized the integral in (4.8) for SAG using a trapezoidal rule. Again, the SG method is applied to the undiscretized version to avoid dependencies on the choice and accuracy of the quadrature rule. For validation and comparison purposes, the objective function is further approximated with the trapezoidal rule using a total of 180 equidistant discretization points to ensure a good approximation of the expected compliance.
In Fig. 9, the distribution of obtained objective functional values for 480 optimization runs with 2048 steps each is compared for the different methods with appropriately chosen stepsize rules. The presented results clearly show the fast convergence of CSG in comparison to the other schemes.
Moreover, in Fig. 10, a rapid convergence of the increasingly better approximated objective function values is observed when applying the CSG method to the structural optimization problem. It is noted that, by construction, this type of convergence can be expected neither for the SG nor for the SAG method.
In Fig. 11, the so-called physical density ρ^{p} is shown for SAG, SG, and CSG with constant stepsize. While SG does not converge within 2048 iterations, SAG converges, though the resulting density distribution is strongly influenced by the discretization of the objective functional; note the 8 struts corresponding to the 8 discrete load cases applied (see the middle column of Fig. 11). No such effect is observed for the CSG result (right column of Fig. 11). Moreover, the CSG result appears much cleaner than the SG result, which is due to the faster convergence.
It is finally noted that, in the interest of comparability, we have not applied any continuation scheme (e.g., SIMP parameter \(p\rightarrow \infty \)), which would result in a true “black and white” solution, i.e., \(\rho (x) \in \{0,1\}\) for all \(x\in {\varOmega } \setminus \hat {\varOmega }\). This is of course necessary to achieve a physically interpretable and manufacturable solution. We leave the question of defining a continuation scheme suitable for CSG as a subject for further research.
Conclusion and outlook
In this work, we introduced the continuous stochastic gradient (CSG) method, which is applicable to a broad class of structural optimization problems. Preliminary experiments with well-known academic examples as well as an application from mechanics, in which an elastic structure was optimized with respect to infinitely many load cases, revealed that the CSG method performed better than both the traditional SG method and its relative SAG, in the sense that a significantly lower function value could be obtained in a given number of iterations. This is particularly interesting as the CSG algorithm requires roughly the same computational effort per optimization iteration as the SG and SAG methods. Moreover, like the SAG method, it benefits from gradient information obtained in previous iterations. Importantly, the CSG method does not require an a priori discretization of integrals entering the objective function, and the function value and gradient can be approximated with arbitrary precision throughout the optimization iterations. As a result, the CSG method approaches a full gradient method in the course of the iterations.
While the CSG method appears promising, to obtain a full picture of its behavior when applied to practical applications, more examples, e.g., from robust optimization, acoustics, or optics should be tested in future.
Furthermore, from a theoretical point of view, an analysis covering the convergence rate for convex functions would be helpful to provide a deeper understanding of the algorithm. It should also be noted that the computational effort for computing the gradient weights \({{\alpha }_{0}^{n}},\dots ,{{\alpha }_{n}^{n}}\) grows with each iteration. To compensate, an efficient implementation of this computation is crucial. While this can easily be done in the case of a one-dimensional index set V_{ad}, the question remains how it can be achieved in higher dimensions. Other interesting questions center around how Lipschitz constants can be estimated throughout the optimization process and how the stepsize can be adapted automatically.
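For the one-dimensional case mentioned above, the weight computation can be sketched via nearest-neighbor cell lengths; the concrete rule below is a simplified illustration of this idea, not the exact weight definition from Algorithm 1:

```python
import numpy as np

# Sketch of CSG-type integration weights for a one-dimensional
# V_ad = [a, b]: each sample v_k receives the length of its nearest-
# neighbor (Voronoi) cell. This illustrates why the 1D case is easy;
# the exact weight definition of Algorithm 1 is given in Section 2.
def nn_weights(v, a=-1.0, b=1.0):
    order = np.argsort(v)
    vs = v[order]
    mids = 0.5 * (vs[:-1] + vs[1:])        # boundaries between neighbors
    edges = np.concatenate(([a], mids, [b]))
    w = np.empty_like(v)
    w[order] = np.diff(edges)              # cell lengths in sample order
    return w

v = np.array([0.3, -0.7, 0.9, -0.1])
w = nn_weights(v)                          # -> [0.5, 0.6, 0.4, 0.5]
# The weights sum to the measure of V_ad (here 2), so weighted sums of
# stored gradients remain consistent quadrature approximations.
```

In higher dimensions, the analogous cell volumes would require a Voronoi decomposition of V_{ad}, which is computationally far more demanding.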
Finally, similarly as suggested in De et al. (2019), a combination with established structural optimization algorithms such as, for instance, GCMMA is conceivable.
References
Alvarez F, Carrasco M (2005) Minimization of the expected compliance as an alternative approach to multiload truss optimization. Struct Multidiscip Optim 29(6):470–476
Aubin JP (2000) Applied functional analysis, 2nd edn. Pure and Applied Mathematics (New York), Wiley-Interscience, New York. With exercises by Bernard Cornet and Jean-Michel Lasry, translated from the French by Carole Labrousse
Bendsøe MP (1989) Optimal shape design as a material distribution problem. Struct Optim 1(4):193–202
Bottou L, Cun YL (2004) Large scale online learning. In: Advances in neural information processing systems, pp 217–224
Bottou L, Curtis FE, Nocedal J (2018) Optimization methods for large-scale machine learning. SIAM Rev 60(2):223–311
Bürgisser P, Cucker F (2013) Condition: the geometry of numerical algorithms, vol 349. Springer Science & Business Media, Berlin
Conti S, Held H, Pach M, Rumpf M, Schultz R (2009) Shape optimization under uncertainty—a stochastic programming perspective. SIAM J Optim 19(4):1610–1632
De S, Hampton J, Maute K, Doostan A (2019) Topology optimization under uncertainty using a stochastic gradient-based approach. arXiv:1902.04562
Dilgen CB, Dilgen SB, Aage N, Jensen JS (2019) Topology optimization of acoustic mechanical interaction problems: a comparative review. Struct Multidiscip Optim 60(2):779–801
Dudley RM (1969) The speed of mean Glivenko-Cantelli convergence. Annals Math Stat 40(1):40–50
Fortune S (1995) Voronoi diagrams and Delaunay triangulations. In: Computing in euclidean geometry. World Scientific, Singapore, pp 225–265
Haber E, Chung M, Herrmann F (2012) An effective method for parameter estimation with PDE constraints with multiple right-hand sides. SIAM J Optim 22(3):739–757
Jahn J (2007) Introduction to the theory of nonlinear optimization. Springer Science & Business Media, New York
Jensen J, Sigmund O (2011) Topology optimization for nanophotonics. Laser Photonics Rev 5(2):308–321
Kingma DP, Ba J (2015) Adam: a method for stochastic optimization. CoRR abs/1412.6980
Klenke A (2013) Probability theory: a comprehensive course. Springer Science & Business Media, New York
Lazarov BS, Schevenels M, Sigmund O (2012) Topology optimization considering material and geometric uncertainties using stochastic collocation methods. Struct Multidiscip Optim 46(4):597–612
Maute K, Frangopol DM (2003) Reliability-based design of MEMS mechanisms by topology optimization. Comput Struct 81(8):813–824. K.J. Bathe 60th Anniversary Issue
Robbins H, Monro S (1951) A stochastic approximation method. The Annals of Mathematical Statistics 22(3):400–407
Roosta-Khorasani F, van den Doel K, Ascher U (2014) Stochastic algorithms for inverse problems involving PDEs and many measurements. SIAM J Sci Comput 36(5):S3–S22
Schmidt M, Le Roux N, Bach F (2017) Minimizing finite sums with the stochastic average gradient. Math Program 162(1):83–112
Semmler J, Pflug L, Stingl M, Leugering G (2015) Shape optimization in electromagnetic applications. In: New trends in shape optimization. Springer International Publishing, pp 251–269
Semmler J, Pflug L, Stingl M (2018) Material optimization in transverse electromagnetic scattering applications. SIAM J Sci Comput 40(1):B85–B109
Shewchuk JR (1996) Triangle: engineering a 2D quality mesh generator and Delaunay triangulator. In: Workshop on applied computational geometry. Springer, pp 203–222
Svanberg K (2002) A class of globally convergent optimization methods based on conservative convex separable approximations. SIAM J Optim 12(2):555–573
Tan C, Ma S, Dai YH, Qian Y (2016) Barzilai-Borwein step size for stochastic gradient descent. In: Lee DD, Sugiyama M, Luxburg UV, Guyon I, Garnett R (eds) Advances in neural information processing systems, vol 29. Curran Associates Inc., pp 685–693
Zhang XS, de Sturler E, Paulino GH (2017) Stochastic sampling for deterministic structural topology optimization with many load cases: density-based and ground structure approaches. Comput Methods Appl Mech Eng 325:463–487
Acknowledgments
Open Access funding provided by Projekt DEAL. In addition, we thank J. Rodestock for proofreading this paper.
Funding
This work has been supported by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation)  ProjectID 416229255  SFB 1411 and by DFG through the Cluster of Excellence “Engineering of Advanced Materials” at FAU ErlangenNürnberg.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Replication of results
All parameters and data required for the application of the CSG, SG, and SAG algorithms are given in the numerical section of this article. Moreover, the CSG algorithm is outlined in full length in Section 2. Finally, standard implementations of the SG and the SAG algorithm have been used.
Responsible Editor: Mehmet Polat Saka
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Pflug, L., Bernhardt, N., Grieshammer, M. et al. CSG: A new stochastic gradient method for the efficient solution of structural optimization problems with infinitely many states. Struct Multidisc Optim 61, 2595–2611 (2020). https://doi.org/10.1007/s0015802002571x
Keywords
 Stochastic gradient method
 Infinitely many state problems
 Robust structural optimization
 Proof of convergence