Abstract
In this paper, we present a stochastic augmented Lagrangian approach on (possibly infinite-dimensional) Riemannian manifolds to solve stochastic optimization problems with a finite number of deterministic constraints. We investigate the convergence of the method, which is based on a stochastic approximation approach with random stopping combined with an iterative procedure for updating the Lagrange multipliers. The algorithm is applied to a multishape optimization problem with geometric constraints and demonstrated numerically.
1 Introduction
In this paper, we concentrate on stochastic optimization problems of the form
$$\begin{aligned} \min _{u \in \mathcal {U}} \, \big \{ j(u) {:}{=}{\mathbb {E}}\left[ J(u,\varvec{\xi }) \right] \big \} \quad \text {s.t.} \quad h_i(u) = 0 \,\, (i \in \mathcal {E}), \quad h_i(u) \le 0 \,\, (i \in \mathcal {I}). \end{aligned}$$(P)
Here, \(\mathcal {U}\) is a Riemannian manifold and \(\varvec{\xi } :\varOmega \rightarrow \varXi \subset {\mathbb {R}}^m\) is a random vector defined on a given probability space. We assume that we have deterministic constraints of the form \(\varvec{h}:\mathcal {U} \rightarrow {\mathbb {R}}^n\), \(u \mapsto \varvec{h}(u) = (h_1(u), \dots , h_n(u))^\top \), where we distinguish between the index set \(\mathcal {E}\) of equality constraints and the index set \(\mathcal {I}\) of inequality constraints.
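To fix ideas, the following toy instantiation of (P) may be helpful: the manifold is the unit circle parametrized by an angle, the randomness enters through a scalar factor \(\xi \), and there is a single equality constraint. All concrete choices below (the functions `J` and `h`, the distribution of \(\xi \)) are our illustrative assumptions, not part of the paper's model.

```python
import math, random

# Toy instance of problem (P), purely illustrative:
#   manifold U = unit circle S^1, parametrized by the angle theta,
#   J(theta, xi) = xi * (1 - cos(theta)) with xi ~ Uniform(0.5, 1.5),
#   one equality constraint h_1(theta) = sin(theta), i.e., E = {1}, I empty.

def J(theta, xi):
    """Parametrized objective J(u, xi)."""
    return xi * (1.0 - math.cos(theta))

def j(theta, n_samples=20000, seed=0):
    """Monte Carlo estimate of j(theta) = E[J(theta, xi)]."""
    rng = random.Random(seed)
    return sum(J(theta, rng.uniform(0.5, 1.5)) for _ in range(n_samples)) / n_samples

def h(theta):
    """Deterministic constraint vector h(theta) = (sin(theta),)."""
    return (math.sin(theta),)

# Since E[xi] = 1, we expect j(theta) to be close to 1 - cos(theta):
print(abs(j(1.0) - (1.0 - math.cos(1.0))) < 0.01)  # True (seeded sampler)
```

In this toy problem the expectation is approximated by sampling; the algorithm analyzed below never forms `j` exactly but works with single- or mini-batch evaluations of `J`.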
Our investigations are motivated by applications in shape optimization, where an objective functional is to be minimized with respect to a shape, i.e., a subset of \({\mathbb {R}}^{d}\). Finding a correct model to describe the set of shapes is one of the main challenges in shape optimization. From a theoretical and computational point of view, it is attractive to optimize in Riemannian manifolds because algorithmic ideas from [1] can be combined with approaches from differential geometry as outlined in [15]. This is one of the main reasons why we focus on Riemannian manifolds in this paper. One needs to take into account that these Riemannian manifolds may also be infinite-dimensional, e.g., the space of plane curves [37,38,39, 51], the space of piecewise-smooth curves [40], and the space of surfaces in higher dimensions [3, 4, 26, 30, 36]. Often, more than one shape needs to be considered, which leads to so-called multishape optimization problems. As applications, we can mention electrical impedance tomography, where the distribution of electrical properties such as electric conductivity and permittivity inside the body is examined [11, 31, 33], and the optimization of biological cell composites in the human skin [45, 46].
If one focuses on one-dimensional shapes, the above-mentioned space of plane unparametrized curves is a prominent example of an infinite-dimensional manifold. In our numerical application (cf. Sect. 3), we also focus on this shape space. Our choice of this space comes from the fact that in shape optimization, the set of permissible shapes generally does not allow a vector space structure. One should note that there is no obvious distance measure without a vector space structure, which is a central difficulty in the formulation of efficient optimization methods. If one cannot work in vector spaces, Riemannian shape manifolds are the next best option, but they come with additional difficulties; see Sect. 2.3.
A central difficulty in (P) is that the constraints lead to a stochastic optimization problem that cannot be handled using standard techniques such as gradient descent or Newton’s method; additionally, the numerical solution of the problem may be intractable on account of the expectation. In this work, we propose a stochastic augmented Lagrangian method to solve problems of the form (P). The proposed method combines the smoothing properties of the augmented Lagrangian method with a reduction in complexity granted by stochastic approximation.
The augmented Lagrangian method has been extensively studied; see [7, 8] for an introduction to the method when \(\mathcal {U} = {\mathbb {R}}^n\). Substantial theory can be found in the literature for PDE-constrained optimization, which is related to our setting in PDE-constrained shape optimization and where convergence has been studied in function spaces; see [21,22,23,24, 47]. This theory does not apply even for deterministic counterparts of (P) since our control variable u belongs to a Riemannian manifold, not a Banach space. The study of constrained optimization on Riemannian manifolds is still nascent. There are relatively recent advances in first-order optimality conditions in KKT form, including the development of constraint qualifications analogous to the finite-dimensional setting [6, 50]. The augmented Lagrangian method has recently been developed for Riemannian manifolds [27, 35, 49]. These methods have been developed for deterministic problems, however, and therefore cannot be applied to problems of the form (P).
Stochastic approximation is a class of algorithms that originated from the paper [41] and has developed in recent decades due to its applicability to high-dimensional stochastic optimization problems. Thanks in part to applications in machine learning, these algorithms are increasingly being developed in the setting of Riemannian optimization; see, e.g., [10, 25, 42, 52, 53]. The most basic algorithm is the stochastic gradient method, which can be used to solve an unconstrained version of (P), i.e., the problem of minimizing the expectation. Recently, the stochastic gradient method was proposed to handle PDE-constrained shape optimization problems [14, 15]. In [14], asymptotic convergence was proven for optimization variables belonging to a Riemannian manifold and the connection was made to shape optimization following the ideas in [48]. However, the stochastic gradient method cannot solve problems of the form (P).
While both augmented Lagrangian and stochastic approximation methods are well-developed, the combined method, which we call the stochastic augmented Lagrangian method, is not. In the context of training neural networks, a combined stochastic gradient/augmented Lagrangian approach in the same spirit as ours can be found in the paper [13]. Our method, however, involves a novel use of the randomized multi-batch stochastic gradient method from [18, 19], where a random number of stochastic gradient steps is chosen. We use this strategy to solve the inner loop optimization problem for fixed Lagrange multipliers and penalty parameters. A central consequence of the random stopping rule from [18, 19] is that convergence rates for the expected value of the norm of the gradient can be obtained, even in the nonconvex case. The random stopping rule in combination with an outer loop procedure can be used to adaptively adjust step sizes and batch sizes, yielding a tractable algorithm for which asymptotic convergence to stationary points of the original problem is guaranteed.
The paper is structured as follows. In Sect. 2, we present the stochastic augmented Lagrangian method for optimization on Riemannian manifolds and analyze its convergence. Then, an application for our method is introduced and results of numerical tests are presented in Sect. 3. To conclude, we summarize our results in Sect. 4.
2 Optimization Approach
In this section, we introduce the stochastic augmented Lagrangian method for Riemannian manifolds. In view of our later application to shape optimization, where convexity of the objective functional j cannot be expected, we focus on providing results for the nonconvex case. First, in Sect. 2.1, we will provide background material that will be of use in our analysis. In particular, definitions and theorems from differential topology and geometry that are required in this paper will be provided. For background details, we refer to, e.g., [28, 29, 32, 34] for differential geometry and [20] for probability theory. The algorithm is presented in Sect. 2.2. Convergence of the method is proven in two parts: in Sect. 2.3, we provide an efficiency estimate for the inner loop procedure, corresponding to a randomized multi-batch stochastic gradient method. Then, in Sect. 2.4, convergence rates with respect to the outer loop procedure, which corresponds to a stochastic augmented Lagrangian method, are given.
2.1 Background and Notation
We consider the Euclidean norm \(\left\Vert \cdot \right\Vert _2\) on \({\mathbb {R}}^n\) throughout the paper. For a differentiable Riemannian manifold \((\mathcal {U},\mathcal {G})\), \(\mathcal {G}=(\mathcal {G}_u)_{u\in \mathcal {U}}\) denotes the Riemannian metric. The induced norm is denoted by \(\left\Vert \cdot \right\Vert _{\mathcal {G}} {:}{=}\sqrt{\mathcal {G}(\cdot ,\cdot )}\). Here and throughout the manuscript, we frequently omit the subscript u from the metric when the context is clear. The tangent space of \(\mathcal {U}\) at a point \(u \in \mathcal {U}\) is defined in its geometric version as
$$\begin{aligned} T_u \mathcal {U} {:}{=}\{ c:{\mathbb {R}}\rightarrow \mathcal {U} :c \text { differentiable},\, c(0) = u \} /\!\sim , \end{aligned}$$
where the equivalence relation for two differentiable curves \(c,\tilde{c}:{\mathbb {R}}\rightarrow \mathcal {U}\) with \(c(0) = \tilde{c}(0) =u\) is defined as follows: \(c \sim \tilde{c} \Leftrightarrow \tfrac{\,\textrm{d}}{\,\textrm{d}t}\phi _{\alpha }(c(t))\vert _{t=0} =\tfrac{\,\textrm{d}}{\,\textrm{d}t} \phi _{\alpha }(\tilde{c}(t))\vert _{t=0}\) for all \(\alpha \) with \(u \in U_\alpha \), where \(\{(U_\alpha , \phi _\alpha )\}_\alpha \text { is an atlas of }\mathcal {U}.\) The derivative of a smooth mapping \(f:\mathcal {U}\rightarrow \widetilde{\mathcal {U}}\) between two differentiable manifolds \(\mathcal {U}\) and \(\widetilde{\mathcal {U}}\) is defined using the pushforward. At a point \(u\in \mathcal {U}\), it is defined by \((f_*)_u :T_u \mathcal {U} \rightarrow T_{f(u)} \widetilde{\mathcal {U}}\) with \((f_*)_u(c){:}{=}\frac{\textrm{d}}{\textrm{d}t} f(c(t))\vert _{t=0} = (f \circ c)'(0),\) where \(c:I\subset {\mathbb {R}}\rightarrow \mathcal {U}\) is a differentiable curve with \(c(0)=u\) and \(c'(0)\in T_u \mathcal {U}\). In particular, \(f:\mathcal {U} \rightarrow \widetilde{\mathcal {U}}\) is called \(\mathcal {C}^k\) if \(\psi _\beta \circ f\circ \phi _\alpha ^{-1}\) is k-times continuously differentiable for all charts \((U_\alpha ,\phi _\alpha )\) of \(\mathcal {U}\) and \((V_\beta ,\psi _\beta )\) of \(\widetilde{\mathcal {U}}\) with \(f(U_\alpha )\subset V_\beta \). In the case \(\widetilde{\mathcal {U}}={\mathbb {R}}\), a Riemannian gradient \(\nabla f(u) \in T_u \mathcal {U}\) is defined by the relation
$$\begin{aligned} \mathcal {G}_u(\nabla f(u),w) = (f_*)_u w \quad \forall w \in T_u \mathcal {U}. \end{aligned}$$(1)
We define \(V_u\;{:}{=}\;\{v\in T_u\mathcal {U}:1\in I_{u,v}^\mathcal {U}\}\) with \( I_{u,v}^\mathcal {U}\;{:}{=}\; \bigcup \limits _{I\in \tilde{I}_{u,v}^\mathcal {U}} I \), where
Then, we denote the exponential mapping by
$$\begin{aligned} \exp _u :V_u \rightarrow \mathcal {U}, \quad v \mapsto \exp _u(v), \end{aligned}$$
where \(\exp _u(v)\) is the exponential map of \(\mathcal {U}\) at u, which assigns to every tangent vector \(v\in V_u \) the point c(1) and \(c:I_{u,v}^\mathcal {U}\rightarrow \mathcal {U}\) is the unique geodesic satisfying \(c(0)=u\) and \(c'(0)=v\).
Let the length of a \(\mathcal {C}^1\)-curve \(c:[0,1]\rightarrow \mathcal {U}\) be denoted by \(L (c) = \int _0^1 \left\Vert c'(t) \right\Vert _{\mathcal {G}}\,\textrm{d}t\). Then the distance \(\textrm{d}:\mathcal {U} \times \mathcal {U} \rightarrow {\mathbb {R}}\) between points \(u,q\in \mathcal {U}\) is given by
$$\begin{aligned} \textrm{d}(u,q) {:}{=}\inf \{ L(c) :c:[0,1]\rightarrow \mathcal {U} \text { is a } \mathcal {C}^1\text {-curve with } c(0)=u,\, c(1)=q \}. \end{aligned}$$
The injectivity radius \(i_u\) at a point \(u \in \mathcal {U}\) is defined as \(i_u \ {:}{=}\ \sup \{r > 0 :\exp _u \vert _{B_r(0_u)} \text { is a diffeomorphism}\}\), where \(0_u\) denotes the zero element of \(T_u \mathcal {U}\) and \(B_r(0_u) \subset T_u \mathcal {U}\) is a ball centered at \(0_u\in T_u \mathcal {U}\) with radius r. The injectivity radius of the manifold \(\mathcal {U}\) is the number \(i(\mathcal {U}) \;{:}{=}\; \inf _{u \in \mathcal {U}} i_u.\)
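As a concrete finite-dimensional illustration (ours, not the paper's): on the unit sphere \(S^2 \subset {\mathbb {R}}^3\) with the induced metric, geodesics are great circles, the exponential map has the closed form \(\exp _u(v) = \cos (\left\Vert v\right\Vert )\, u + \sin (\left\Vert v\right\Vert )\, v/\left\Vert v\right\Vert \), and the injectivity radius is \(i(S^2)=\pi \).

```python
import math

def exp_sphere(u, v):
    """Exponential map on the unit sphere S^2 embedded in R^3.

    u: point on the sphere (unit vector); v: tangent vector at u
    (orthogonal to u). Geodesics are great circles, so the map has
    the closed form cos(|v|) u + sin(|v|) v/|v|."""
    nv = math.sqrt(sum(vi * vi for vi in v))
    if nv == 0.0:
        return tuple(u)  # exp_u(0_u) = u
    return tuple(math.cos(nv) * ui + math.sin(nv) * (vi / nv)
                 for ui, vi in zip(u, v))

u = (1.0, 0.0, 0.0)
v = (0.0, math.pi / 2, 0.0)   # tangent vector at u of length pi/2
p = exp_sphere(u, v)          # quarter of a great circle
print([round(x, 10) for x in p])  # → [0.0, 1.0, 0.0]
```

A tangent vector of length \(\pi /2\) is carried a quarter of the way around a great circle; for \(\left\Vert v\right\Vert < \pi \) the map is a diffeomorphism onto its image, matching the injectivity radius.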
The triple \((\varOmega , \mathcal {F}, {\mathbb {P}})\) denotes a (complete) probability space, where \(\mathcal {F} \subset 2^{\varOmega }\) is the \(\sigma \)-algebra of events and \({\mathbb {P}}:\mathcal {F}\rightarrow [0,1]\) is a probability measure. The expectation of a random variable \(X:\varOmega \rightarrow {\mathbb {R}}\) is defined by \({\mathbb {E}}\left[ X \right] = \int _\varOmega X(\omega )\,\textrm{d}{\mathbb {P}}(\omega )\). A filtration is a sequence \(\{ \mathcal {F}_n\}\) of sub-\(\sigma \)-algebras of \(\mathcal {F}\) such that \(\mathcal {F}_1 \subset \mathcal {F}_2 \subset \cdots \subset \mathcal {F}\). If for an event \(F \in \mathcal {F}\) it holds that \({\mathbb {P}}(F) = 1\), then we say F occurs almost surely (a.s.). Given an integrable random variable \(X :\varOmega \rightarrow {\mathbb {R}}\) and a sub-\(\sigma \)-algebra \(\mathcal {F}_n\), the conditional expectation is denoted by \({\mathbb {E}}\left[ X \mid \mathcal {F}_n \right] \), which is a random variable that is \(\mathcal {F}_n\)-measurable and satisfies \(\int _A {\mathbb {E}}\left[ X\mid \mathcal {F}_n \right] (\omega ) \,\textrm{d}{\mathbb {P}}(\omega ) = \int _A X(\omega ) \,\textrm{d}{\mathbb {P}}(\omega )\) for all \(A \in \mathcal {F}_n\).
We will frequently use the convention \(\varvec{\xi } \in \varXi \) to denote a realization (i.e., the deterministic value \(\varvec{\xi }(\omega ) \in \varXi \) for some \(\omega \)) of the vector \(\varvec{\xi }:\varOmega \rightarrow \varXi \subset {\mathbb {R}}^m\); based on the context, there should be no confusion as to whether a realization or a random vector is meant. Let \(J:\mathcal {U} \times {\mathbb {R}}^m \rightarrow {\mathbb {R}}\) be a parametrized objective as in problem (P) and define \(J_{\varvec{\xi }}\;{:}{=}\; J(\cdot ,\varvec{\xi })\). The gradient \(\nabla _{u} J(u,\varvec{\xi })\;{:}{=}\; \nabla J_{\varvec{\xi }}(u)\) of J with respect to u is defined by the relation
$$\begin{aligned} \mathcal {G}_u(\nabla _{u} J(u,\varvec{\xi }),w) = ((J_{\varvec{\xi }})_*)_u w \quad \forall w \in T_u \mathcal {U}. \end{aligned}$$(2)
Following [14], if \(\nabla _{u} J:\mathcal {U} \times {\mathbb {R}}^m \rightarrow T_{u} \mathcal {U}\) is \({\mathbb {P}}\)integrable, Eq. (2) is fulfilled for all u almost surely, and \({\mathbb {E}}\left[ \nabla _{u} J(u,\varvec{\xi })\right] = \nabla j(u)\), we call \(\nabla _{u} J\) a stochastic gradient.
Let the Lagrangian for problem (P) be the mapping \(\mathcal {L}:\mathcal {U} \times {\mathbb {R}}^n \rightarrow {\mathbb {R}}\) defined by
$$\begin{aligned} \mathcal {L}(u,\varvec{\lambda }) {:}{=}j(u) + \varvec{\lambda }^\top \varvec{h}(u). \end{aligned}$$
The gradient \(\nabla h_i(u) \in T_{u} \mathcal {U}\) of \(h_i :\mathcal {U} \rightarrow {\mathbb {R}}\) is defined by the relation \(((h_i)_{*})_{u} w = \mathcal {G}_u(\nabla h_i(u),w)\) for all \(w \in T_{u} \mathcal {U}.\) The gradient of the corresponding vector \(\varvec{h}:\mathcal {U} \rightarrow {\mathbb {R}}^n\) is the vector \(\nabla \varvec{h} (u)= (\nabla h_1(u), \dots , \nabla h_n(u))^\top \).
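When the manifold is embedded in a Euclidean space and carries the induced metric, the defining relation above amounts to orthogonally projecting the Euclidean gradient onto the tangent space. The sketch below does this for the unit sphere; it is our illustration of the definition, not the paper's setting.

```python
def sphere_grad(euclid_grad, u):
    """Riemannian gradient on the unit sphere with the induced metric:
    the orthogonal projection of the Euclidean gradient onto the
    tangent space T_u S^2 = {w : <w, u> = 0}."""
    dot = sum(g * ui for g, ui in zip(euclid_grad, u))
    return tuple(g - dot * ui for g, ui in zip(euclid_grad, u))

# Example: f(u) = u[2] (the height function); its Euclidean gradient
# is (0, 0, 1) everywhere.
print(sphere_grad((0.0, 0.0, 1.0), (1.0, 0.0, 0.0)))  # → (0.0, 0.0, 1.0)
print(sphere_grad((0.0, 0.0, 1.0), (0.0, 0.0, 1.0)))  # → (0.0, 0.0, 0.0)
```

At the equator the Euclidean gradient is already tangent and is returned unchanged; at the north pole, a critical point of the height function, the Riemannian gradient vanishes.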
In the following, we define a Karush–Kuhn–Tucker (KKT) point.
Definition 2.1
The pair \((\hat{u},\hat{\varvec{\lambda }})\in \mathcal {U} \times {\mathbb {R}}^n\) is called a KKT point for problem (P) if it satisfies the following conditions:
$$\begin{aligned} \nabla _{u} \mathcal {L}(\hat{u},\hat{\varvec{\lambda }})&= 0_{\hat{u}}, \end{aligned}$$(3a)
$$\begin{aligned} h_i(\hat{u})&= 0 \,\, (i \in \mathcal {E}), \quad h_i(\hat{u}) \le 0 \,\, (i \in \mathcal {I}), \end{aligned}$$(3b)
$$\begin{aligned} \hat{\lambda }_i&\ge 0 \text { and } \hat{\lambda }_i h_i(\hat{u}) = 0 \,\, (i \in \mathcal {I}). \end{aligned}$$(3c)
Remark 2.1
In order for the above-formulated KKT conditions to be necessary optimality conditions for problem (P), certain constraint qualifications are required. Analogues of the linear independence (LICQ), Mangasarian–Fromovitz, Abadie, and Guignard constraint qualifications have only recently been treated on finite-dimensional manifolds; see [6, 50]. The investigation of suitable constraint qualifications for the infinite-dimensional setting is an open area of research and is not further pursued in this paper. In Theorem 2.2, we will see that in certain cases, our method produces KKT points in the limit. In the absence of constraint qualifications, however, it can in general only be shown that certain asymptotic KKT (AKKT) conditions are satisfied. This is discussed in more detail in Sect. 2.4.
The closed cone corresponding to the constraints, the distance to the cone, and the projection onto the cone are defined, respectively, by
$$\begin{aligned} \varvec{K}&{:}{=}\{ \varvec{y} \in {\mathbb {R}}^n :y_i = 0 \,\, (i \in \mathcal {E}), \,\, y_i \le 0 \,\, (i \in \mathcal {I}) \}, \\ {\text {dist}}_{\varvec{K}}(\varvec{y})&{:}{=}\min _{\varvec{s} \in \varvec{K}} \left\Vert \varvec{y} - \varvec{s}\right\Vert _2, \qquad \pi _{\varvec{K}}(\varvec{y}) {:}{=}\mathop {\mathrm {arg\,min}}\limits _{\varvec{s} \in \varvec{K}} \left\Vert \varvec{y} - \varvec{s}\right\Vert _2. \end{aligned}$$
For \(y \in {\mathbb {R}}\), the projection onto the ith component of the closed cone \(\varvec{K}\) has the formula \(\pi _{K_i}(y) = 0\) if \(i \in \mathcal {E}\), and \(\pi _{K_i}(y) = \min (0,y)\) if \(i \in \mathcal {I}\). We have \(\pi _{\varvec{K}}(\varvec{y}) = (\pi _{K_1}(y_1), \dots , \pi _{K_n}(y_n))^\top .\) The normal cone of \(\varvec{K}\) at a point \(\varvec{s} \in \varvec{K}\) is defined by \(N_{\varvec{K}}(\varvec{s})= \{ \varvec{v} \in {\mathbb {R}}^n:\varvec{v}^\top (\varvec{s} - \varvec{y}) \ge 0 \,\, \forall \varvec{y} \in \varvec{K}\}\); the normal cone is the empty set if \(\varvec{s}\) is not contained in \(\varvec{K}\). To define the augmented Lagrangian, we first introduce a slack variable \(\varvec{s} \in {\varvec{K}}\) to obtain the equivalent, equality-constrained problem
$$\begin{aligned} \min _{u \in \mathcal {U},\, \varvec{s} \in \varvec{K}} \, j(u) \quad \text {s.t.} \quad \varvec{h}(u) - \varvec{s} = 0. \end{aligned}$$
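The componentwise projection and the induced distance translate directly into code (a small sketch; `eq_idx` lists the indices of \(\mathcal {E}\), zero-based):

```python
def proj_K(y, eq_idx):
    """Componentwise projection onto K: pi_{K_i}(y_i) = 0 for i in E
    and min(0, y_i) for i in I.  eq_idx: set of equality indices."""
    return tuple(0.0 if i in eq_idx else min(0.0, yi)
                 for i, yi in enumerate(y))

def dist_K(y, eq_idx):
    """Euclidean distance of y to the cone K."""
    p = proj_K(y, eq_idx)
    return sum((yi - pi) ** 2 for yi, pi in zip(y, p)) ** 0.5

y = (0.5, -1.0, 2.0)
print(proj_K(y, eq_idx={0}))  # → (0.0, -1.0, 0.0)
print(dist_K(y, eq_idx={0}))  # sqrt(0.25 + 0 + 4) ≈ 2.0616
```

Note that the already-feasible inequality component \(y_2 = -1\) is left untouched, while the equality component and the violated inequality component are projected to zero.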
The corresponding augmented Lagrangian for a fixed parameter \(\mu > 0\) is the mapping \(\mathcal {L}_A^{\varvec{s}}:\mathcal {U} \times {\mathbb {R}}^n \times {\mathbb {R}}^n \rightarrow {\mathbb {R}}\) defined by
$$\begin{aligned} \mathcal {L}_A^{\varvec{s}}(u,\varvec{\lambda },\varvec{s};\mu ) {:}{=}j(u) + \frac{\mu }{2} \left\Vert \varvec{h}(u) + \frac{\varvec{\lambda }}{\mu } - \varvec{s}\right\Vert _2^2 - \frac{\left\Vert \varvec{\lambda }\right\Vert _2^2}{2\mu }. \end{aligned}$$
Notice that \(\min _{\varvec{s} \in \varvec{K}} \left\Vert \varvec{h}(u) + \tfrac{\varvec{\lambda }}{\mu } - \varvec{s}\right\Vert _2^2 = {\text {dist}}_{\varvec{K}}(\varvec{h}(u)+\tfrac{\varvec{\lambda }}{\mu })^2.\) Hence, it is possible to eliminate the slack variable to obtain, again for fixed \(\mu \), the augmented Lagrangian \(\mathcal {L}_A:\mathcal {U} \times {\mathbb {R}}^n \rightarrow {\mathbb {R}}\) defined by
$$\begin{aligned} \mathcal {L}_A(u,\varvec{\lambda };\mu ) {:}{=}j(u) + \frac{\mu }{2} {\text {dist}}_{\varvec{K}}\Big (\varvec{h}(u)+\frac{\varvec{\lambda }}{\mu }\Big )^2 - \frac{\left\Vert \varvec{\lambda }\right\Vert _2^2}{2\mu }. \end{aligned}$$
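Assuming the standard slack-eliminated form \(\mathcal {L}_A(u,\varvec{\lambda };\mu ) = j(u) + \tfrac{\mu }{2}{\text {dist}}_{\varvec{K}}(\varvec{h}(u)+\varvec{\lambda }/\mu )^2 - \left\Vert \varvec{\lambda }\right\Vert _2^2/(2\mu )\) (our reading of the elimination described above), its evaluation is straightforward; for equality constraints only, it reduces to the classical \(j(u) + \varvec{\lambda }^\top \varvec{h}(u) + \tfrac{\mu }{2}\left\Vert \varvec{h}(u)\right\Vert _2^2\), which the usage line checks.

```python
def aug_lagrangian(j_u, h_u, lam, mu, eq_idx):
    """Evaluate the slack-eliminated augmented Lagrangian (hedged: the
    standard safeguarded form).  j_u = j(u) and h_u = h(u) are assumed
    precomputed; eq_idx holds the zero-based equality indices."""
    shifted = [hi + li / mu for hi, li in zip(h_u, lam)]
    proj = [0.0 if i in eq_idx else min(0.0, s) for i, s in enumerate(shifted)]
    dist_sq = sum((s - p) ** 2 for s, p in zip(shifted, proj))
    return j_u + 0.5 * mu * dist_sq - sum(li * li for li in lam) / (2.0 * mu)

# Equality-only sanity check: must equal j + lam*h + (mu/2)*h^2.
val = aug_lagrangian(1.0, (0.5,), (2.0,), 4.0, eq_idx={0})
print(abs(val - (1.0 + 2.0 * 0.5 + 2.0 * 0.25)) < 1e-12)  # True
```

For an inactive inequality constraint with zero multiplier, the penalty and multiplier terms vanish and the value reduces to \(j(u)\), mirroring the smoothing behavior of the projection.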
2.2 Augmented Lagrangian Method on Riemannian Manifolds
In this section, we present Algorithm 1, which relies on stochastic approximation. For this, we need the function \(L_A :\mathcal {U} \times {\mathbb {R}}^n \times \varXi \rightarrow {\mathbb {R}}\) defined by
$$\begin{aligned} L_A(u,\varvec{\lambda },\varvec{\xi };\mu ) {:}{=}J(u,\varvec{\xi }) + \frac{\mu }{2} {\text {dist}}_{\varvec{K}}\Big (\varvec{h}(u)+\frac{\varvec{\lambda }}{\mu }\Big )^2 - \frac{\left\Vert \varvec{\lambda }\right\Vert _2^2}{2\mu }. \end{aligned}$$
The stochastic augmented Lagrangian (AL) method is shown in Algorithm 1. The inner loop is an adaptation of the randomized multi-batch stochastic gradient (RSG) method from [19]. In deterministic AL methods, the inner loop is in practice only solved up to a given error tolerance, leading to an inexact augmented Lagrangian method. Deterministic termination conditions for the inner loop typically rely on conditions of the following type: \(u^{k+1}\) is chosen as the first point of the corresponding iterative procedure satisfying
$$\begin{aligned} \left\Vert \nabla _{u} \mathcal {L}_A(u^{k+1},\varvec{w}^k;\mu _k)\right\Vert _{\mathcal {G}} \le \varepsilon _k, \end{aligned}$$
with the error vanishing asymptotically, i.e., \(\varepsilon _k \rightarrow 0\) as \(k \rightarrow \infty \). Stochastic methods of the kind used here can only provide probabilistic error bounds; termination conditions are based on a priori estimates and result in stochastic errors. The outer loop corresponds to the augmented Lagrangian (AL) method with a safeguarding procedure as described in [21]; see also [47]. A feature of this procedure is that instead of using the Lagrange multiplier \(\varvec{\lambda }\) in the subproblem in line 4, one chooses \(\varvec{w}\) from a bounded set B, which is essential for achieving global convergence. In practice, B should be chosen in such a way that the projection is easy to compute, e.g., box constraints are appropriate. A natural choice is \(\varvec{w}^k\;{:}{=}\; \pi _B(\varvec{\lambda }^k)\) for a closed and convex set B. For the algorithm, we define an infeasibility measure and its induced sequence by
$$\begin{aligned} H_{k+1} {:}{=}\left\Vert \varvec{h}(u^{k+1}) - \pi _{\varvec{K}}\Big (\varvec{h}(u^{k+1}) + \frac{\varvec{w}^k}{\mu _k}\Big )\right\Vert _2. \end{aligned}$$
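A schematic of the safeguarded outer loop may help fix the structure. This is our sketch, not the paper's algorithm verbatim: the inner RSG solver is abstracted as a callback `inner_solve`, the multiplier update and penalty test follow the safeguarded scheme described above, and all parameter values are illustrative.

```python
def proj_box(lam, radius):
    """Safeguard w = pi_B(lambda) for the box B = [-radius, radius]^n."""
    return tuple(max(-radius, min(radius, li)) for li in lam)

def proj_K(y, eq_idx):
    """Componentwise projection onto the cone K."""
    return tuple(0.0 if i in eq_idx else min(0.0, yi) for i, yi in enumerate(y))

def safeguarded_al(u0, h, inner_solve, eq_idx, mu0=1.0, tau=0.5,
                   gamma=10.0, radius=1e6, iters=20):
    """Schematic safeguarded AL outer loop (illustrative parameters).

    inner_solve(u, w, mu) approximately minimizes u -> L_A(u, w; mu);
    the penalty mu is increased whenever the infeasibility measure H
    fails to decrease by the factor tau."""
    u = u0
    lam = tuple(0.0 for _ in h(u0))
    mu, H_prev = mu0, float("inf")
    for _ in range(iters):
        w = proj_box(lam, radius)            # safeguarded multiplier
        u = inner_solve(u, w, mu)            # inner (stochastic) solver
        v = tuple(hi + wi / mu for hi, wi in zip(h(u), w))
        y = proj_K(v, eq_idx)
        lam = tuple(mu * (vi - yi) for vi, yi in zip(v, y))   # update lambda
        H = sum((hi - yi) ** 2 for hi, yi in zip(h(u), y)) ** 0.5
        if H > tau * H_prev:
            mu *= gamma                      # insufficient feasibility gain
        H_prev = H
    return u, lam

# Toy test problem: minimize (u - 2)^2 subject to u <= 1, i.e.,
# h(u) = u - 1 as an inequality; the inner problem is solved exactly.
def inner_solve(u, w, mu):
    cand = (4.0 + mu - w[0]) / (2.0 + mu)    # stationary point, active case
    return cand if cand - 1.0 + w[0] / mu > 0.0 else 2.0

u_star, lam_star = safeguarded_al(0.0, lambda u: (u - 1.0,), inner_solve,
                                  eq_idx=set())
print(round(u_star, 3), round(lam_star[0], 2))  # → 1.0 2.0
```

On this toy problem the iterates approach the KKT pair \((\hat{u},\hat{\lambda }) = (1, 2)\); the closed-form `inner_solve` stands in for the randomized inner loop analyzed in Sect. 2.3.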
2.3 Convergence of Inner Loop
To prove convergence of the RSG procedure in Algorithm 1, we make the following assumptions about the manifold, which are adapted from [14].
Assumption 1
We assume that (i) the distance \(\textrm{d}(\cdot ,\cdot )\) is nondegenerate,
(ii) the manifold \((\mathcal {U},\mathcal {G})\) has a positive injectivity radius \(i(\mathcal {U})\), and
(iii) for all \(u\in \mathcal {U}\) and all \(\tilde{u} \in \exp _{u}(B_{i_{u}}(0_{u}))\), the minimizing geodesic between u and \(\tilde{u}\) is completely contained in \(B_{i_{u}}(0_{u})\).
As pointed out in [14], the conditions in Assumption 1, while mild for finite-dimensional manifolds, are strong for infinite-dimensional manifolds. Distances on an infinite-dimensional Riemannian manifold can be degenerate. For example, [37] shows that the reparametrization-invariant \(L^2\)-metric on the infinite-dimensional manifold of smooth planar curves induces a geodesic distance equal to zero. Any assumption regarding the injectivity radius is challenging to prove in practice. In infinite dimensions, Riemannian metrics are generally weak, such that gradients may not exist. For certain metrics, the exponential map may not be well-defined; it may even fail to be a diffeomorphism on any neighborhood, cf., e.g., [12].
In the following, a function \(g :\mathcal {U} \rightarrow {\mathbb {R}}\) is called \(L_g\)-Lipschitz continuously differentiable if the function is \(\mathcal {C}^1\) and there exists a constant \(L_g>0\) such that for all \(u, \tilde{u} \in \mathcal {U}\) with \(\textrm{d}(u,\tilde{u})< i(\mathcal {U})\) we have
$$\begin{aligned} \left\Vert \nabla g(u) - P_{1,0} \nabla g(\tilde{u})\right\Vert _{\mathcal {G}} \le L_g \, \textrm{d}(u,\tilde{u}), \end{aligned}$$
where \(P_{1,0}:T_{\gamma (1)}\mathcal {U} \rightarrow T_{\gamma (0)}\mathcal {U}\) is the parallel transport along the unique geodesic such that \(\gamma (0) = u\) and \(\gamma (1) = \tilde{u}.\)
Assumption 2

(i)
The functions j and \({h}_i\) (\(i=1, \dots , n\)) are \(L_j\)-Lipschitz and \(L_{{h}_i}\)-Lipschitz continuously differentiable and the gradients \(\nabla j\) and \(\nabla {h}_i\) (\(i=1, \dots , n\)) exist for all \(u \in \mathcal {U}\).

(ii)
The function J is continuously differentiable with respect to the first argument for every \(\varvec{\xi } \in \varXi \), the stochastic gradient \(\nabla _{u} J\) defined by (2) exists, and there exists \(M>0\) such that:
$$\begin{aligned} {\mathbb {E}}\left[ \left\Vert \nabla _{u} J(u,\varvec{\xi } ) - \nabla j(u)\right\Vert _{\mathcal {G}}^2 \right] \le M^2 \quad \forall u \in \mathcal {U}. \end{aligned}$$(4)
We begin our investigations with the following useful property.
Lemma 2.1
Under Assumption 1 and assuming the gradients \(\nabla j\) and \(\nabla {h}_i\) (\(i=1, \dots , n\)) exist, the iterates of Algorithm 1 satisfy
$$\begin{aligned} \nabla _{u} \mathcal {L}_A(u^{k+1},\varvec{w}^k;\mu _k) = \nabla _{u} \mathcal {L}(u^{k+1},\varvec{\lambda }^{k+1}). \end{aligned}$$
Proof
We have \(\nabla {\text {dist}}_{\varvec{K}}^2 = 2(\textrm{Id}_{{\mathbb {R}}^n} - \pi _{\varvec{K}})\); see [5, Corollary 12.31]. Let \(f(u) {:}{=}\mathcal {L}_A(u,\varvec{w}; \mu )\). Then, the chain rule yields
From this, thanks to the identity (1), it follows that
and using the definition of \(\varvec{\lambda }^{k+1}\) from Algorithm 1, we obtain
Using the fact that \(\nabla _{u} \mathcal {L}(u,\varvec{\lambda }) = \nabla j(u) + \nabla \varvec{h}({u})^\top \varvec{\lambda }\), we have proven the claim. \(\square \)
Now, we turn to an efficiency estimate for the inner loop. First, we define the functions
$$\begin{aligned} F_k(u,\varvec{\xi }) {:}{=}L_A(u,\varvec{w}^k,\varvec{\xi };\mu _k) \quad \text {and} \quad f_k(u) {:}{=}{\mathbb {E}}\left[ F_k(u,\varvec{\xi }) \right] = \mathcal {L}_A(u,\varvec{w}^k;\mu _k). \end{aligned}$$
Recall the convention \(\varvec{\xi } \in \varXi \) being used in the definition of \(F_k\) and \(\varvec{\xi } :\varOmega \rightarrow \varXi \) being used in the definition of \(f_k\).
Lemma 2.2
Suppose that Assumptions 1 and 2 are satisfied and let \(\hat{B}_k \subset \mathcal {U}\) be a bounded set such that \(\textrm{d}(\tilde{u},u) \le i(\mathcal {U})\) for all \(\tilde{u}, {u} \in \hat{B}_k.\) Then, \(f_k\) is \(L_k\)-Lipschitz continuously differentiable with \(L_k\) depending on \(L_j, L_{{h}_1}, \dots , L_{{h}_n}\), and \(\hat{B}_k\). Moreover, for all \(\tilde{u},u \in \hat{B}_k\) with \(v\;{:}{=}\; \exp _{u}^{-1}(\tilde{u})\), we have
$$\begin{aligned} f_k(\tilde{u}) \le f_k(u) + \mathcal {G}_u(\nabla f_k(u),v) + \frac{L_k}{2} \left\Vert v\right\Vert _{\mathcal {G}}^2. \end{aligned}$$(5)
Proof
Let \(P_{1,0}\) denote the parallel transport as defined directly before Assumption 2 and set \(g_i({u})\;{:}{=}\; h_i({u}) + \frac{w^k_i}{\mu _k}-\pi _{K_i}(h_i({u}) + \frac{w^k_i}{\mu _k}).\) Since \(h_i\) is \(L_{h_i}\)-Lipschitz continuously differentiable and \(\hat{B}_k\) is bounded, there exists \(C_{i,k}>0\) such that \(\left\Vert \nabla h_i(u)\right\Vert _{\mathcal {G}} \le C_{i,k}.\) Now, we have
where in the last step, we used the contraction property of the projection operator. Notice that
for some \(C'_i>0\) since \(h_i\) is \(\mathcal {C}^1\). Additionally, we have
and
Since \(\hat{B}_k\) is bounded, (8) and (9) together imply that there exists \(C_{i,k}'' >0\) such that \(\left| g_i(u) \right| \le C_{i,k}''\). As a consequence of (6) and (7), we have
Setting \(\tilde{L}_{\varvec{h},k}\;{:}{=}\; \sum _{i=1}^n \big ( L_{{h}_i} C_{i,k}''+ 2 C_{i,k}C_i' \big )\), we have
Therefore, \(f_k\) is \(L_k\)-Lipschitz continuously differentiable with \(L_k\;{:}{=}\; L_j +\mu _k \tilde{L}_{\varvec{h},k}\). Applying [14, Theorem 2.6], we obtain (5). \(\square \)
Remark 2.2
In the previous lemma, we introduced a bounded set \(\hat{B}_k\). For the following results, we will need the existence of these sets containing the iterates almost surely within each k. Conditions ensuring boundedness can, e.g., be guaranteed by including constraints of the form \(u \in C\subset \mathcal {U} \) for some bounded set C, or growth conditions on the gradient in combination with a regularizer; see [16].
Our first result concerning the convergence of Algorithm 1 handles the efficiency of the inner loop process, which corresponds to a stochastic gradient method that is randomly stopped after \(R_k\) iterations. We follow the arguments in [19, Corollary 3]. It is possible to choose nonconstant step sizes \(t_{k_j}\); see [19, Theorem 2], but for the sake of clarity we consider step sizes that are constant in the inner loop here.
To handle the analysis, we interpret \(R_k\) as a realization of a stopping time \(\tau _k :\varOmega \rightarrow \{ 1, \dots , N_k\}\). Let \(\varvec{\xi }^{k,j}\;{:}{=}\; (\varvec{\xi }^{k,j,1}, \dots ,\varvec{ \xi }^{k,j,m_k})\) be the batch associated with iteration j for a given outer loop k and let \(\mathcal {F}_{k,n} = \sigma (\varvec{\xi }^{\ell ,i}:\ell \in \{1, \dots , k\}, i \in \{ 1, \dots , n \}) \) define the corresponding natural filtration. We define the filtration associated with the randomly stopped stochastic process by \(\mathcal {F}^{\tau _k} =\{ \mathcal {F}_{\ell ,n \wedge \tau _k}:\ell \in \{ 1, \dots , k\}, n \in \{ 1, \dots , N_k\} \}\).
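The random stopping can be sketched as follows (our illustration, in the flat model case \(\mathcal {U} = {\mathbb {R}}\) where \(\exp _u(v) = u + v\); with constant step sizes, drawing the stopping index uniformly from \(\{1,\dots ,N_k\}\) is the randomized scheme of [19]):

```python
import random

def rsg_inner_loop(u0, stoch_grad, t, N, m, rng):
    """Randomized multi-batch SG inner loop in the flat model case
    exp_u(v) = u + v.  Performs up to N steps with constant step size t
    and batch size m, then returns the iterate with index R drawn
    uniformly from {1, ..., N} (uniform because the steps are constant)."""
    R = rng.randint(1, N)            # realization of the stopping time tau_k
    u = u0
    iterates = [u]
    for _ in range(N):
        batch = [stoch_grad(u, rng) for _ in range(m)]
        u = u - t * sum(batch) / m   # exp_u(-t * averaged batch gradient)
        iterates.append(u)
    return iterates[R]               # randomly stopped iterate

# Toy problem (illustrative): f(u) = E[(u - xi)^2] with xi ~ N(1, 0.1),
# minimized at u = 1; a stochastic gradient is 2 (u - xi).
rng = random.Random(42)
u_out = rsg_inner_loop(5.0, lambda u, r: 2.0 * (u - r.gauss(1.0, 0.1)),
                       t=0.25, N=200, m=8, rng=rng)
```

Here \(\nabla f\) has Lipschitz constant \(L = 2\), so `t = 0.25` corresponds to \(\alpha _k = 1/2 \in (0,2)\) in the notation of the theorem below.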
Theorem 2.1
Suppose Assumptions 1 and 2 are satisfied. Consider a fixed iteration k from Algorithm 1. Suppose the iterates \(\{ \varvec{z}^{k,j}\}\) are a.s. contained in a bounded set \(\hat{B}_k \subset \mathcal {U}\), where \(\textrm{d}(u,\tilde{u})\le i(\mathcal {U})\) for all \(u,\tilde{u}\in \hat{B}_k\). Then, if the step size \(t_{k}\) satisfies \(t_{k} =\alpha _k/{L_k}\) for \(\alpha _k \in (0,2)\) and all k, we have
$$\begin{aligned} {\mathbb {E}}\left[ \left\Vert \nabla f_k(u^{k+1})\right\Vert _{\mathcal {G}}^2 \right] \le \frac{2 L_k \, {\mathbb {E}}\left[ f_k(u^k) - f_k^*\right] }{(2\alpha _k - \alpha _k^2) N_k} + \frac{\alpha _k M^2}{(2-\alpha _k) m_k}, \end{aligned}$$(10)
where \(f_k^*\;{:}{=}\; \inf _{u \in \hat{B}_k} f_k(u)\). Moreover, if \(\hat{B}_\infty \;{:}{=}\; \cup _{k=1}^\infty \hat{B}_k\) is bounded, \(\textrm{d}(u, \tilde{u}) \le i(\mathcal {U})\) for all \(u, \tilde{u} \in \hat{B}_\infty \), the maximum iterations \(\{ N_k\}\) are chosen such that \(N_k = \beta _k L_k\) for \(\beta _k >0\), and
$$\begin{aligned} \sum _{k=1}^\infty \Big ( \frac{1}{\beta _k} + \frac{1}{m_k} \Big ) < \infty , \end{aligned}$$(11)
then we have \(\left\ \nabla f_k(u^{k+1})\right\ _{\mathcal {G}} \rightarrow 0\) a.s. as \(k\rightarrow \infty \).
Proof
Let k be fixed. We define \(\delta ^{j}\;{:}{=}\; \frac{1}{m_k}\sum _{i=1}^{m_k} \nabla _{u}F_k(z^{k,j},\varvec{\xi }^{k,j,i}) - \nabla f_k(z^{k,j}).\) With \(v^j\;{:}{=}\; \exp _{z^{k,j}}^{-1}(z^{k,j+1}) = -\frac{1}{L_k m_k}\sum _{i=1}^{m_k} \nabla _{u}F_k(z^{k,j},\varvec{\xi }^{k,j,i})\), Lemma 2.2 yields
Taking the sum with respect to j on both sides and rearranging, we obtain
since \(f_k^* \le f_k(z^{k,N_k+1})\) and \(0< \alpha _k < 2\). Since \(\nabla _{u}F_k\) is a stochastic gradient, we have \({\mathbb {E}}\left[ \mathcal {G}(\nabla f_k(z^{k,j}),\delta ^j)\mid \mathcal {F}_{k,j} \right] =\mathcal {G} \left( \nabla f_k(z^{k,j}),{\mathbb {E}}\left[ \delta ^j\mid \mathcal {F}_{k,j} \right] \right) =0.\) Notice that due to (4), we have
With (13), we obtain
where we used Jensen’s inequality, the linearity of the expectation, and (13). Taking the expectation on both sides of (14), using (12), and using the tower rule, cf. [20, Proposition 1.1 (a), p. 471], we get the inequality
Due to the law of total expectation, we have
Note that \(f_k(z^{k,R_k}) = f_k(u^{k+1})\) and \(f_k(z^{k,1})=f_k(u^k)\). Returning to (15), we obtain
so we have shown (10).
Now, to prove almost sure convergence, we first observe that if all iterates are contained in \(\hat{B}_\infty \), we have
$$\begin{aligned} f_k(u^k) - f_k^* \le C \end{aligned}$$(16)
for some \(C>0\) due to the assumed smoothness of \(f_k\) on \(\mathcal {U}\). Taking the total expectation of (10), Markov’s inequality in combination with Jensen’s inequality gives
Since \(N_k=\beta _k L_k\) and (11) holds, the infinite sum of the right-hand side is finite for every \(\varepsilon >0\), implying the almost sure convergence of \(\big \{ \left\Vert \nabla f_k(u^{k+1})\right\Vert _{\mathcal {G}} \big \}\) to zero. \(\square \)
For the choice \(t_k = 1/L_k\) and (16), the efficiency estimate (10) evidently simplifies to \({{\mathbb {E}}} \left[ \left\Vert \nabla f_k(u^{k+1}) \right\Vert _{\mathcal {G}}^2 \right] \le \frac{2L_k C}{N_k}+ \frac{M^2}{m_k}.\) In the next section, we will investigate optimality of the solution in the limit as k is taken to infinity. Since the Lipschitz constant \(L_k\) can become unbounded due to the penalty parameter \(\mu _k\), the maximal number of iterations \(N_k\) needs to be balanced appropriately in this case. To obtain almost sure convergence, we required \(N_k = \beta _k L_k\) for \(\beta _k >0\). Alternatively, if it can be guaranteed that \(L_k\) is bounded for all k (for instance by bounding \(\mu _k\)), then one could (asymptotically) choose \(t_k = \alpha _k/L\) with \(L=\sup _{k} L_k\). Regarding complexity, it is possible to establish the inner loop's complexity as argued in [19, Section 4.2]. We define an \((\varepsilon _k, \eta _k)\)-solution to the problem \(\min _{u \in \mathcal {U}} \, \{ f_k(u)= {\mathbb {E}}\left[ F_k(u,\varvec{\xi }) \right] \} \) as a point \(\hat{u}\) that satisfies \({{\mathbb {P}}} \big \{ \left\Vert \nabla f_k(\hat{u}) \right\Vert _{\mathcal {G}}^2 \le \varepsilon _k \big \} \ge 1 - \eta _k.\) Ignoring some constants, for the choice \(t_k=1/L_k\), the complexity can be bounded by \(\mathcal {O} \left( (\eta _k \varepsilon _k)^{-1} + M^2 \eta _k^{-2} \varepsilon _k^{-2} \right) .\)
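To make the balancing concrete, here is a hypothetical parameter schedule. The choices \(\beta _k = k^2\) and \(m_k = k^2\) are our illustration, not prescribed by the paper; they give the summability used in the Borel–Cantelli-type argument of Theorem 2.1, while \(N_k = \lceil \beta _k L_k \rceil \) compensates a growing Lipschitz constant.

```python
import math

def schedule(k, L_k):
    """Hypothetical schedule for outer iteration k (illustration only):
    beta_k = k**2 and m_k = k**2 make sum_k (1/beta_k + 1/m_k) finite,
    while N_k = ceil(beta_k * L_k) grows with the Lipschitz constant."""
    beta_k = k ** 2
    m_k = k ** 2
    N_k = math.ceil(beta_k * L_k)
    return N_k, m_k

# The series sum_k (1/beta_k + 1/m_k) = 2 * sum_k 1/k^2 converges (to pi^2/3):
partial = sum(2.0 / k ** 2 for k in range(1, 100000))
print(partial < math.pi ** 2 / 3)  # True
```

Any schedule with summable \(1/\beta _k\) and \(1/m_k\) would do; quadratic growth keeps the per-iteration sampling cost moderate while still forcing the stochastic error to vanish.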
2.4 Convergence of Outer Loop
In the final part of this section, we analyze the behavior of the outer loop of Algorithm 1 adapting arguments from [23, 47]. We define an optimality measure and its induced sequence by
and make the following assumptions on iterates induced by Algorithm 1.
Assumption 3
We assume that

(i)
the sequence \(\{ u^k\}\) is a.s. contained in a bounded set \(\hat{B}_\infty \) such that \(\textrm{d}(u,\tilde{u}) \le i(\mathcal {U})\) for all \(u,\tilde{u} \in \hat{B}_\infty \),

(ii)
\(\left\Vert \nabla _{u}\mathcal {L}_A(u^{k+1},\varvec{w}^k;\mu _k)\right\Vert _{\mathcal {G}}\rightarrow 0\) a.s. as \(k\rightarrow \infty \),

(iii)
\(\{(u^k,\varvec{\lambda }^k)\}\) converges a.s. to the set of KKT points and

(iv)
for k sufficiently large, we have \(\varvec{w}^k = \varvec{\lambda }^k\).
Note that Theorem 2.1 implies Assumption 3(ii). Assumption 3(iii) requires that every limit point of every realization of the sequence \(\{u^k, \varvec{\lambda }^k\}\) is a KKT point. In the absence of constraint qualifications, one can still work with asymptotic KKT (AKKT) conditions; under certain conditions, it can even be shown that they are necessary conditions (see, e.g., [23, Theorem 5.3]). We will say that a feasible point \(\hat{u}\) satisfies the AKKT conditions if there exists a sequence \(\{ u^k\}\) such that \(\textrm{d}(u^k,\hat{u})\rightarrow 0\) and there exists a sequence \(\{\varvec{\lambda }^k \}\) contained in the dual cone \(\varvec{K}^{\oplus }\;{:}{=}\; \{ \varvec{y} \in {\mathbb {R}}^n:\varvec{y}^\top \varvec{k} \ge 0 \, \forall \varvec{k} \in \varvec{K}\}\) such that
as \(k \rightarrow \infty .\)
A fundamental difference in the stochastic variant of the augmented Lagrangian method is that limit points, as limits of the stochastic process \((u^k, \varvec{\lambda }^k)\), are random. In the following, we will consider a fixed limit point \((\hat{u},\hat{\varvec{\lambda }})\) and the corresponding set of paths converging to it. This motivates the definition of the set
$$\begin{aligned} E_{\hat{u},\hat{\varvec{\lambda }}} {:}{=}\big \{ \omega \in \varOmega :\textrm{d}(u^k(\omega ),\hat{u}) \rightarrow 0 \text { and } \Vert \varvec{\lambda }^k(\omega ) - \hat{\varvec{\lambda }}\Vert _2 \rightarrow 0 \text { as } k \rightarrow \infty \big \}. \end{aligned}$$
Note that here, and in the following analysis, \(\omega \) represents an outcome of the random process \((\varvec{\xi }^{1,1}, \dots , \varvec{\xi }^{1,R_1}, \varvec{\xi }^{2,1}, \dots , \varvec{\xi }^{2,R_2}, \dots )\) induced by sampling and random stopping.
Theorem 2.2
Suppose Assumptions 1–3(i)(ii) are satisfied. Let \(E\;{:}{=}\; \{ \omega \in \varOmega :\{\mu _k(\omega )\} \text { is bounded}\}\). Then, \(\{ \varvec{\lambda }^k(\omega )\}\) is a.s. bounded on E and any limit point \((\hat{u}, \hat{\varvec{\lambda }})\) of \(\{(u^k(\omega ),\varvec{\lambda }^k(\omega )):\omega \in E, k \in {\mathbb {N}}\}\) is a KKT point. On the set \(\varOmega \backslash E\), if a limit point \(\hat{u}\) is feasible, then it is an AKKT point.
Proof
We will make arguments in two parts, where we distinguish between the case of bounded and unbounded \(\mu _k\).
Case 1: Bounded \(\mu _k\). We first show that the sequence \(\{ \varvec{\lambda }^k\}\) is a.s. bounded. Let \(\varvec{v}^{k+1}\;{:}{=}\; \varvec{h}(u^{k+1}) +\frac{\varvec{w}^k}{\mu _k}\) and \(\varvec{y}^{k+1} \;{:}{=}\; \pi _{\varvec{K}}(\varvec{v}^{k+1})\). By definition of \(\varvec{\lambda }^{k}\), we have
Now, observe that the boundedness of \(\{ \mu _k\}\) on E implies that there exists a maximal iterate \(\bar{k}\) in Algorithm 1 such that \(H_{k+1}\le \tau H_k \le \tau \tilde{M}\) is satisfied for every \(k \ge \bar{k}\) and some \(\tilde{M}>0\). This \(\tilde{M}\) exists since \(\varvec{h}\) is \(\mathcal {C}^1\) and \(u^k\), \(\varvec{w}^k\), and \(\mu _k\) are all bounded by assumption. In particular, \(H_k \rightarrow 0\) as \(k\rightarrow \infty \) on E. Hence, (19) combined with the definition of \(H_k\) implies the a.s. convergence of \(\Vert \varvec{\lambda }^{k+1}-\varvec{w}^k\Vert _2/\mu _k\) to zero, which in turn implies \(\Vert \varvec{\lambda }^{k+1}-\varvec{w}^k\Vert _2 \rightarrow 0\) as \(k\rightarrow \infty \). The boundedness of \(\varvec{w}^k\) guaranteed by Algorithm 1 therefore yields that \(\{ \varvec{\lambda }^k\}\) is bounded on E.
Now, we prove that for any \(\varvec{y} \in \varvec{K}\), there exists a nonnegative sequence \(\gamma _k\) converging to zero such that
With [5, Theorem 3.14], the projection formula
holds for all \(\varvec{y} \in {\varvec{K}}\), implying that \(\varvec{\lambda }^{k+1}=\mu _{k}(\varvec{v}^{k+1}-\varvec{y}^{k+1}) \in N_{\varvec{K}}(\varvec{y}^{k+1}).\) Now, using \(\varvec{\lambda }^{k+1} \in N_{\varvec{K}}(\varvec{y}^{k+1})\) and (19), we have
We have shown (20). That \(\{\gamma _k\}\) is a.s. a null sequence follows from the fact that \(\Vert \varvec{\lambda }^{k+1}-\varvec{w}^k\Vert _2/\mu _k\) a.s. converges to zero.
Consider a subsequence of \(\{ (u^k(\omega ),\varvec{\lambda }^k(\omega ))\}\) that converges to a limit point \((\hat{u},\hat{\varvec{\lambda }})\) for a fixed \(\omega \in E_{\hat{u}, \hat{\varvec{\lambda }}}\). We will prove that the limit point satisfies the KKT conditions (3). Continuity of \(\nabla _{u} \mathcal {L}\) gives \(\lim _{k \rightarrow \infty } \nabla _{u} \mathcal {L}(u^k(\omega ),\varvec{\lambda }^k(\omega )) = \nabla _{u}\mathcal {L}(\hat{u}, \hat{\varvec{\lambda }})\) and \(\big \Vert \nabla _{u} \mathcal {L}(\hat{u}, \hat{\varvec{\lambda }}) \big \Vert _{\mathcal {G}} = 0\) due to Assumption 3(ii). By definition, \(\nabla _{u} \mathcal {L}(\hat{u},\hat{\varvec{\lambda }}) \in T_{\hat{u}}\mathcal {U}\), and the only element in \(T_{\hat{u}}\mathcal {U}\) having norm zero is \(0_{\hat{u}}\), thus (3a) is fulfilled. Since \(\gamma _k \rightarrow 0\) a.s., we have that \((\varvec{y}-\varvec{h}(\hat{u}))^\top \hat{\varvec{\lambda }} \le 0\) for all \(\varvec{y} \in \varvec{K},\) implying that \(\hat{\varvec{\lambda }} \in N_{\varvec{K}}(\varvec{h}(\hat{u}))\). This immediately implies (3b)–(3c).
Case 2: Unbounded \(\mu _k\). Consider a fixed \(\omega \in \varOmega \backslash E\) and a sequence \(\{ u^k(\omega )\}\) such that (possibly on a subsequence that we do not relabel) \(\textrm{d}(u^k(\omega ), \hat{u}) \rightarrow 0\) as \(k\rightarrow \infty \). Assumption 3(ii) gives the first AKKT condition in (17). It remains to prove that \(\pi _{\varvec{K}} (\varvec{h}(u^k(\omega )))^\top \varvec{\lambda }^k(\omega ) \rightarrow 0\). Now, we define
For readability, we will suppress the dependence on \(\omega \). Since
it suffices to prove \(\varvec{p}^k \rightarrow 0\), since by the contraction property of the projection, \(\pi _{\varvec{K}}(\varvec{a}^k)^\top \varvec{b}^k \rightarrow 0\) implies \(\pi _{\varvec{K}}(\varvec{a}^k)^\top \pi _{\varvec{K}}(\varvec{b}^k) \rightarrow 0\) for any sequences \(\varvec{a}^k, \varvec{b}^k \in {\mathbb {R}}^n.\) Note that at least on a subsequence, we have \(\varvec{h}(u^{k+1}) \rightarrow \varvec{h}(\hat{u})\) and \(\{\varvec{h}(u^{k})\}\) is bounded.
Consider first the case that \(h_i(\hat{u})<0\). Then \(\varvec{h}(u^{k+1})\rightarrow \varvec{h}(\hat{u})\) implies that \(w_i^k+\mu _k h_i(u^{k+1})<0\) for k sufficiently large, implying \(p_i^k \rightarrow 0\).
Consider now the case that \(h_i(\hat{u})=0\). For a fixed k, if \(h_i(u^{k+1}) \ge 0\), then \(p_i^k=0\). If instead \(h_i(u^{k+1})<0\), then \(p_i^k = (\mu _k h_i(u^{k+1}) +w_i^k)\, \pi _{\varvec{K}}(h_i(u^{k+1})) \le w_i^k \left| h_i(u^{k+1}) \right| \). If \(h_i(u^{k+1})<0\) infinitely many times, then \(w_i^k \left| h_i(u^{k+1}) \right| \rightarrow 0\), meaning \(p_i^k \rightarrow 0\).
Since \(\varvec{p}^k\) in both cases converges to zero and \(\omega \in \varOmega \backslash E\) was arbitrary, we have proven the claim. \(\square \)
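The safeguarded multiplier update and the infeasibility measure \(H_k\) used throughout this proof can be sketched in code (a minimal NumPy sketch, assuming inequality constraints only so that \(\varvec{K}\) is the nonpositive orthant; the function names and default box bound are illustrative, not taken from the paper's code):

```python
import numpy as np

def project_K(v):
    # Projection onto K = {y : y <= 0} (inequality constraints only).
    return np.minimum(v, 0.0)

def multiplier_update(h_val, w, mu, box=100.0):
    # v^{k+1} = h(u^{k+1}) + w^k / mu_k,   y^{k+1} = pi_K(v^{k+1}),
    # lambda^{k+1} = mu_k * (v^{k+1} - y^{k+1}).
    # The safeguarded multiplier w^{k+1} projects lambda^{k+1}
    # onto the box B = [-box, box]^n.
    v = h_val + w / mu
    y = project_K(v)
    lam = mu * (v - y)
    w_new = np.clip(lam, -box, box)
    # Infeasibility measure H_{k+1} = || h(u^{k+1}) - y^{k+1} ||_2.
    H = np.linalg.norm(h_val - y)
    return lam, w_new, H
```

For instance, a satisfied constraint (negative component of \(\varvec{h}\)) with zero multiplier yields a zero updated multiplier, while a violated one produces \(\lambda_i^{k+1} = \mu_k h_i(u^{k+1})\) and contributes to \(H_{k+1}\).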
We now turn to local convergence statements. In the spirit of a local argument, we restrict our investigations to the study around a limit point for only those realizations converging to it. Again, we consider the set \(E_{\hat{u},\hat{\varvec{\lambda }}}\) defined in (18).
Lemma 2.3
Suppose Assumptions 1–3 hold. Let \((\hat{u}, \hat{\varvec{\lambda }})\) be a limit point satisfying for some \(c_1, c_2 >0\)
for all \((u, \varvec{\lambda })\) with u near \(\hat{u}\) and \(r(u,\varvec{\lambda })\) sufficiently small. Then we have for sufficiently large k
Proof
We have using Lemma 2.1 and \(\varvec{w}^k = \varvec{\lambda }^k\) that
Let \(\varvec{v}^{k+1}\;{:}{=}\; \varvec{h}(u^{k+1}) +\frac{\varvec{w}^k}{\mu _k}\) and \(\varvec{y}^{k+1} \;{:}{=}\; \pi _{\varvec{K}}(\varvec{v}^{k+1})\). Then it follows that
since \(\varvec{\lambda }^{k+1} \in N_{\varvec{K}}(\varvec{y}^{k+1})\) as argued in Case 1 of the proof of Theorem 2.2. Note that \(\textrm{Id}_{{\mathbb {R}}^n} - \pi _{\varvec{K}}\) is (firmly) nonexpansive (cf. [5, Prop. 12.27]). It is straightforward to deduce that the mapping \(\varvec{y} \mapsto \varvec{y} - \pi _{\varvec{K}}(\varvec{y}+\varvec{\lambda }^{k+1})\) is nonexpansive as well, from which we can conclude
Using the definition of \(\varvec{y}^{k+1}\) and \(\varvec{w}^k=\varvec{\lambda }^k\), notice that
Returning to (22), we obtain
Since \(\lim _{k\rightarrow \infty } \textrm{d}(u^k, \hat{u}) = 0\) a.s. on \(E_{\hat{u}, \hat{\varvec{\lambda }}}\), then for any \(\varepsilon >0\) there exists \(\bar{k}\) such that \(\textrm{d}(u^k, \hat{u}) < \varepsilon \) for all \(k\ge \bar{k}\) a.s. Possibly choosing \(\bar{k}\) even larger, Assumption 3 combined with the positive injectivity radius further implies \(\big \Vert \varvec{\lambda }^k(\omega ) - \hat{\varvec{\lambda }}\big \Vert _2 \le c_2 r_k\) for almost all \(\omega \in E_{\hat{u}, \hat{\varvec{\lambda }}}\). Using (23), we conclude that for almost all \(\omega \in E_{\hat{u}, \hat{\varvec{\lambda }}}\),
for k large enough. Rearranging terms proves the claim. \(\square \)
We are now ready to show the local rate of convergence. We recall the relevant definitions for the convenience of the reader: a sequence \(\{r_k\}\) that converges to \(r^*\) is said to have order of convergence \(s \ge 1\) and rate of convergence q if
Linear convergence occurs in the case \(s=1\) and \(q\in (0,1)\). Moreover, superlinear convergence occurs in all cases where \(s>1\) and in the case where \( s=1\) and \(q=0\).
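For illustration, the empirical rate can be estimated from a computed error sequence (a small sketch; the helper name is ours and not part of the paper):

```python
def convergence_quotients(errors, s=1.0):
    # q_k = e_{k+1} / e_k^s; for s = 1, a limit q in (0, 1) indicates
    # linear convergence, while q_k -> 0 indicates superlinear convergence.
    return [e_next / e**s for e, e_next in zip(errors, errors[1:])]
```

Applied to a halving error sequence, every quotient equals 1/2, i.e., linear convergence with rate \(q = 1/2\).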
Theorem 2.3
Under the same assumptions as Lemma 2.3, assume further that \(\big \Vert \nabla _{\varvec{u}} \mathcal {L}_A(\varvec{u}^{k+1},\varvec{\lambda }^{k};\mu _k)\big \Vert _{\mathcal {G}} = o(r_k).\) Then

1)
There exists \(\hat{\mu }_q >0\) such that if \(\mu _k \ge \hat{\mu }_q\) for k sufficiently large, then \(\{(\varvec{u}^k, \varvec{\lambda }^k)\}\) converges linearly to \((\hat{\varvec{u}}, \hat{\varvec{\lambda }})\) a.s. on \(E_{\hat{\varvec{u}},\hat{\varvec{\lambda }}}\) with convergence rate \(q\in (0,1)\).

2)
If \(\mu _k \rightarrow \infty \), then \((\varvec{u}^k, \varvec{\lambda }^k) \rightarrow (\hat{\varvec{u}}, \hat{\varvec{\lambda }})\) a.s. on \(E_{\hat{\varvec{u}},\hat{\varvec{\lambda }}}\) at a superlinear rate.
Proof
Note that for k large enough, we have \(\varvec{w}^k = \varvec{\lambda }^k\) and Lemma 2.3 gives
Taking \(\mu _k\) such that \(\mu _k-c_2>0\) gives \(r_{k+1} \le \frac{\mu _k}{\mu _k - c_2} \left( o(r_k) + \frac{c_2}{\mu _k} r_k \right) .\) This implies
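Unpacking this limit (a reconstruction consistent with the preceding inequality, not verbatim from the paper):

```latex
\limsup_{k\to\infty} \frac{r_{k+1}}{r_k}
  \;\le\; \limsup_{k\to\infty} \frac{\mu_k}{\mu_k - c_2}
          \left( \frac{o(r_k)}{r_k} + \frac{c_2}{\mu_k} \right)
  \;=\; \limsup_{k\to\infty} \frac{c_2}{\mu_k - c_2},
```

since \(o(r_k)/r_k \rightarrow 0\) and the prefactor \(\mu_k/(\mu_k - c_2)\) is bounded. Choosing \(\hat{\mu}_q\) with \(c_2/(\hat{\mu}_q - c_2) \le q\) therefore yields the linear rate \(q \in (0,1)\), while \(\mu_k \rightarrow \infty\) drives the quotient to zero, giving superlinear convergence.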
Thanks to the error bound (21), we get the corresponding rates for \(\{(\varvec{u}^k, \varvec{\lambda }^k)\}\). \(\square \)
In practice, the assumption \(\big \Vert \nabla _{\varvec{u}} \mathcal {L}_A(\varvec{u}^{k+1},\varvec{\lambda }^{k};\mu _k)\big \Vert _{{\mathcal {G}}} = o(r_k)\) is difficult to implement since one can only work with estimates \(\hat{f}_k \approx {\mathbb {E}}\big [L_A(\varvec{u}^{k+1}, \varvec{\lambda }^k, \varvec{\xi }; \mu _k) \big ] = \mathcal {L}_A(\varvec{u}^{k+1}, \varvec{\lambda }^k; \mu _k).\) However, we have a convergence rate guaranteed in expectation by (10), which can be used to choose appropriate sequences for \(N_k\) and \(m_k\). A possible heuristic is shown in the following section.
3 Application and Numerical Results
In this section, we present an application to a two-dimensional fluid-mechanical problem to demonstrate the algorithm. We denote the hold-all domain as \(D=D(\varvec{u})\), which is partitioned into \(N+1\) disjoint subdomains \(D_1, \ldots , D_{N+1}\), where \(D_{N+1}\) represents the subdomain in which fluid is allowed to flow, and the other sets are obstacles around which the fluid is supposed to flow. The subdomain boundaries are defined as \(\partial D_1 = u_1\), \(\ldots \), \(\partial D_N = u_N\), and \(\partial D_{N+1} = \varGamma \cup u_1 \cup \cdots \cup u_N \), where \(\varGamma \) is the outer boundary that is fixed and split into two disjoint parts \( \varGamma _D\) and \(\varGamma _N\) representing the Dirichlet and Neumann boundary, respectively.
In [15], a shape is seen as a point on an abstract manifold so that a collection of shapes can be viewed as a vector of points \(\varvec{u}=(u_1, \dots , u_N)\) in a product manifold \(\mathcal {U}^N = \mathcal {U}_1 \times \cdots \times \mathcal {U}_N\), where \(\mathcal {U}_i\) are Riemannian manifolds for all \(i=1,\dots ,N\). In the following, our shapes are the above-mentioned obstacles, leading to a multishape optimization problem. Note that a product manifold is itself a manifold and, thus, all theoretical findings from Sect. 2 also apply to product manifolds. We will work with a (possibly infinite-dimensional) connected Riemannian product manifold \((\mathcal {U},\mathcal {G})=(\mathcal {U}^N, \mathcal {G}^N)\). As described in [15], the tangent space \(T\mathcal {U}^N\) can be identified with the product of tangent spaces \(T\mathcal {U}_1\times \cdots \times T\mathcal {U}_N\) via \( T_{\varvec{u}} \mathcal {U}^N \cong T_{u_1} \mathcal {U}_1 \times \cdots \times T_{u_N} \mathcal {U}_N. \) Additionally, the product metric \(\mathcal {G}^N\) on the corresponding product shape space \(\mathcal {U}^N\) can be defined via \(\mathcal {G}^N=(\mathcal {G}^N_{\varvec{u}})_{\varvec{u}\in \mathcal {U}^N}\), where
and \(\pi _i:\mathcal {U}^N\rightarrow \mathcal {U}_i\), \(i=1, \dots , N\), correspond to the canonical projections. If we work with multiple shapes \(\varvec{u}\), the exponential map in Algorithm 1 needs to be replaced by the so-called multiexponential map. Let \(V_{\varvec{u}}^N\;{:}{=}\; V_{u_1}\times \cdots \times V_{u_N}\), where \(V_{u_i}\;{:}{=}\; \{v_i\in T_{u_i}\mathcal {U}_i:1\in I_{u_i,v_i}^{\mathcal {U}_i}\}\) for all \(i=1,\dots ,N\). Then, we define the multiexponential map by \(\exp _{\varvec{u}}^N:V_{\varvec{u}}^N\rightarrow \mathcal {U}^N,\, \varvec{v}=(v_1,\dots ,v_N)\mapsto (\exp _{u_1}v_1,\dots ,\exp _{u_N}v_N)\) for the vector \(\varvec{u}=(u_1,\dots ,u_N)\), where \(\exp _{u_i}:V_{u_i}\rightarrow \mathcal {U}_i,\,v_i\mapsto \exp _{u_i}(v_i)\) for all \(i=1,\dots ,N\).
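The componentwise structure of the multiexponential map can be sketched as follows (a schematic illustration only; we use flat factors \(\mathcal{U}_i = \mathbb{R}^2\), where \(\exp_{u_i} v_i = u_i + v_i\), purely as a stand-in for the factor exponential maps):

```python
import numpy as np

def multi_exp(exp_maps, u, v):
    # exp^N_u(v) = (exp_{u_1} v_1, ..., exp_{u_N} v_N): each factor's
    # exponential map acts on its own component of the tangent vector.
    return tuple(exp_i(u_i, v_i) for exp_i, u_i, v_i in zip(exp_maps, u, v))

# For flat factors, the exponential map reduces to addition.
flat_exp = lambda u_i, v_i: u_i + v_i
```

Replacing `flat_exp` by the exponential map of each \(\mathcal{U}_i\) recovers the definition above.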
The shape space we consider in the numerical experiments is the product space of plane unparametrized curves, i.e., \(\mathcal {U}^N=B_e^N(S^1,{\mathbb {R}}^2)\). The shape space \(B_e(S^1,{\mathbb {R}}^2)\) is defined as the orbit space of \(\textrm{Emb}(S^1,\mathbb {R}^2)\) under the action by composition from the right by the Lie group \(\textrm{Diff}(S^1)\), i.e., \(B_e(S^1,{\mathbb {R}}^2) \;{:}{=}\; \text {Emb}(S^1,\mathbb {R}^2) / \text {Diff}(S^1)\) (cf., e.g., [37]). Here, \(\textrm{Emb}(S^1,{\mathbb {R}}^2)\) denotes the set of all embeddings from the unit circle \(S^1\) into \({\mathbb {R}}^2\), and \(\textrm{Diff}(S^1)\) is the set of all diffeomorphisms from \(S^1\) into itself. In [28], it is proven that the shape space \(B_e(S^1,{\mathbb {R}}^2)\) is a smooth manifold; together with appropriate inner products, it is even a Riemannian manifold. In our numerical experiments, we choose the Steklov–Poincaré metric defined in [43]. Originally, it is defined as a mapping from Sobolev spaces. To define a metric on \(B_e(S^1,{\mathbb {R}}^2)\), the Steklov–Poincaré metric is restricted to a mapping from the tangent spaces, i.e., \( T_u B_e(S^1,{\mathbb {R}}^2) \times T_u B_e(S^1,{\mathbb {R}}^2) \rightarrow {\mathbb {R}}\), where \(T_uB_e(S^1,\mathbb {R}^{2})\cong \left\{ h:h=\alpha \varvec{n},\, \alpha \in \mathcal {C}^\infty (S^{1})\right\} \). Of course, one can choose a different metric on the shape space to represent the shape gradient. We focus on the Steklov–Poincaré metric due to its advantages in combination with the computational mesh (cf. [43, 46]).
The physical system on D is described by the Stokes equations under uncertainty. Note that here, flow is modeled on the domain D instead of \(D_{N+1}\). This is done (in view of the tracking-type functional) to produce a shape derivative on the entire domain. Let \(V(D) = \left\{ \varvec{q}\in H^1( D, {\mathbb {R}}^2):\varvec{q}\vert _{\varGamma _D\cup \varvec{u}} = \varvec{0} \right\} \) denote the function space associated with the velocity for a fixed domain D. We neglect volume forces and consider a deterministic viscosity of the fluid. Inflow \(\varvec{g}\) on parts of the Dirichlet boundary is assumed to be uncertain and is modeled as a random field \(\varvec{g}:D \times \varXi \rightarrow {\mathbb {R}}^2\) with regularity \(\varvec{g} \in L_{{\mathbb {P}}}^2(\varXi , H^1( D, {\mathbb {R}}^2))\) and depending on \(\varvec{\xi } :\varOmega \rightarrow \varXi \subset {\mathbb {R}}^m\). We will use the abbreviation \(\varvec{g}_{\varvec{\xi }} = \varvec{g}(\cdot ,\varvec{\xi })\). For each realization \(\varvec{\xi }\), consider Stokes flow in weak form: find \(\varvec{q}_{\varvec{\xi }} \in H^1( D, {\mathbb {R}}^2)\) and \(p_{\varvec{\xi }} \in L^2( D)\) such that \(\varvec{q}_{\varvec{\xi }}-\varvec{g}_{\varvec{\xi }} \in V(D)\) and
Here, \(\varvec{A}: \varvec{B} = \sum _{j=1}^{d} \sum _{k=1}^{d} A_{j k} B_{j k}\) for two matrices \(\varvec{A},\varvec{B} \in {\mathbb {R}}^{d\times d}\). The gradient and divergence operators \(\nabla \) and \({{\,\textrm{div}\,}}\) act with respect to the spatial variable only with \(\varvec{\xi }\) acting as a parameter.
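The matrix inner product \(\varvec{A}: \varvec{B}\) is simply the Frobenius inner product and can be evaluated elementwise (a trivial sketch; the function name is ours):

```python
import numpy as np

def frobenius_product(A, B):
    # A : B = sum_{j,k} A_{jk} B_{jk}, the Frobenius inner product
    # of two d x d matrices.
    return float(np.sum(A * B))
```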
For each shape \(u_i\), \(i=1,\ldots ,N\), we introduce one inequality constraint for a constrained volume, see Eq. (27a), and one inequality constraint for a constrained perimeter, see Eq. (27b). The volume of the domain \(D_{i}\) is given by \({{\,\textrm{vol}\,}}(D_i) = \int _{D_i} 1 \,\textrm{d} \varvec{x}\) and the perimeter of \(u_i\) is given by \( {{\,\textrm{peri}\,}}(u_i) = \int _{u_i} 1 \,\textrm{d} \varvec{s}.\) Now, we suppose there is a deterministic target velocity \(\bar{\varvec{q}}\) to be reached on the domain D. We would like to determine the optimal placement of shapes that come closest on average to this velocity field. More precisely, we solve the problem
subject to (25) and
We note that a deterministic model using a trackingtype functional in combination with Stokes flow has been studied in [9].
3.1 Shape Derivative
In the following, we compute the shape derivative of the parametrized augmented Lagrangian corresponding to the model problem defined by (25)–(27). We define \(\varvec{h}:B_e^N(S^1,{\mathbb {R}}^2) \rightarrow {\mathbb {R}}^{2 N}\) by
as well as the set \(\varvec{K} \;{:}{=}\; \{ \varvec{h} \in {\mathbb {R}}^{2 N}:h_i \le 0 \,\, \forall i=1, \ldots , 2 N\}\) and the objective \(J(\varvec{u},\varvec{\xi })\;{:}{=}\; \int _{D} \left\Vert \varvec{q}_{\varvec{\xi }}(\varvec{x}) + \varvec{g}_{\varvec{\xi }}(\varvec{x}) - \bar{\varvec{q}}(\varvec{x}) \right\Vert _2^2 \,\textrm{d} \varvec{x}.\) The parametrized augmented Lagrangian is defined by
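For closed curves discretized as polygons, the constraint map \(\varvec{h}\) can be evaluated as below (a sketch under the assumptions that the components are ordered volume-then-perimeter per shape, that the enclosed volume is computed by the shoelace formula, and that each curve is a counterclockwise-oriented vertex array; the names and thresholds are illustrative):

```python
import numpy as np

def constraint_map(curves, vol_min, peri_max):
    # h(u) in R^{2N}: for shape i,
    #   h_{2i-1} = vol_min_i - vol(D_i) <= 0   (volume at or above target),
    #   h_{2i}   = peri(u_i) - peri_max_i <= 0 (perimeter at or below target).
    # Each curve is an (M, 2) array of polygon vertices (closed, CCW).
    h = []
    for c, vmin, pmax in zip(curves, vol_min, peri_max):
        x, y = c[:, 0], c[:, 1]
        # Shoelace formula for the enclosed area.
        area = 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))
        # Sum of edge lengths, closing the polygon.
        edges = np.diff(np.vstack([c, c[:1]]), axis=0)
        peri = np.sum(np.linalg.norm(edges, axis=1))
        h += [vmin - area, peri - pmax]
    return np.array(h)
```

A curve is feasible precisely when both of its components are nonpositive, i.e., \(\varvec{h}(\varvec{u}) \in \varvec{K}\).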
Differentiating the Lagrangian (28) with respect to \(\left( \varvec{q},p\right) \) and setting it to zero gives the weak form of the adjoint equation: find \(\varvec{\varphi }_{\varvec{\xi }} \in V(D)\) and \(\psi _{\varvec{\xi }} \in L^2(D)\) such that
We define the space \(\mathcal {W}(D) = \{ \varvec{W} \in H^1(D, {\mathbb {R}}^2):\varvec{W}\vert _{\varGamma }=0 \}\). We have the shape derivative
where \(\left( \varvec{q}_{\varvec{\xi }},p_{\varvec{\xi }}\right) \) and \(\left( \varvec{\varphi }_{\varvec{\xi }},\psi _{\varvec{\xi }}\right) \) solve the state Eq. (25) and adjoint Eq. (29), respectively. The shape derivative is needed to represent the gradient with respect to the metric under consideration (cf., e.g., [15]). As described in [15], we can use the multishape derivative in an "all-at-once" approach to compute the multishape gradient with respect to the Steklov–Poincaré metric and the mesh deformation \(\varvec{V}= \varvec{V}_{\varvec{\xi }}\) all at once by solving
where a is a coercive and symmetric bilinear form. The mesh deformation \(\varvec{V}\) calculated from (30) can be viewed as an extension of the multishape gradient \(\varvec{v}\) with respect to the Steklov–Poincaré metric to the holdall domain D (for details we refer the reader to [15]).
The bilinear form that describes linear elasticity is a common choice for a due to the advantageous effect on the computational mesh (cf. [46, 48]), and is selected for the following numerical studies. The Lamé parameters are chosen as \(\hat{\lambda }=0\) and \(\hat{\mu }\) smoothly decreasing from 33 on \(\varvec{u}\) to 10 on \(\varGamma \), as obtained by the solution of Poisson’s equation on D.
To update the shapes according to Algorithm 1, we need to compute the multiexponential map. This computation is prohibitively expensive in most applications because a calculus-of-variations problem must be solved or the Christoffel symbols need to be known. Therefore, we approximate it using a multiretraction
to update the shape vector \(\varvec{z}^{k,j}=(z^{k,j}_1,\dots ,z^{k,j}_N)\) in each pair (j, k). For each shape \(z_i^{k,j}\) we use the retraction in [14, 15, 44]: \(\mathcal {R}_{z_i^{k,j}}:T_{z_i^{k,j}}\mathcal {U}^i \rightarrow \mathcal {U}^i,\, v_i \mapsto z_i^{k,j}+v_i \) for all \(i=1,\dots ,N\).
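With shapes represented by arrays of boundary-node coordinates, the multiretraction reduces to a componentwise nodal update (a minimal sketch; we assume each tangent vector is stored as a nodal displacement field, and the optional step scaling is ours):

```python
import numpy as np

def multi_retraction(z, v, step=1.0):
    # R^N_z(t v) = (z_1 + t v_1, ..., z_N + t v_N): a first-order
    # approximation of the multiexponential map that displaces every
    # boundary node of every shape by the (scaled) tangent field.
    return [z_i + step * v_i for z_i, v_i in zip(z, v)]
```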
3.2 Numerical Results
All numerical simulations were performed on the HPC cluster HSUper^{Footnote 1} using the FEniCS toolbox, version 2019.1.0 [2], and Python 3.10.10. The hold-all domain is chosen as \(D=(0,1)^2\). We choose \(N=3\) shapes inside the hold-all domain, which can be seen on the left-hand side of Fig. 1. The computational mesh is generated with Gmsh 4.11.1 [17], which yields 265 line elements for the outer boundary and the interfaces, and 3803 triangular elements as the discretization of D. Additionally, a new mesh was automatically generated if the mesh quality^{Footnote 2} fell below a threshold of \(40\%\). A reevaluation of all relevant quantities within the optimization (e.g., the objective functional and the geometrical constraints) after remeshing ensures that the optimization can continue. It has already been observed that this increases the number of optimization iterations (cf., e.g., [40]), but it is difficult to avoid due to quickly deteriorating meshes. The target velocity is shown in Fig. 1 on the right, together with the shapes used to obtain the target velocity in white. Standard Taylor–Hood elements are used.
The values of the geometrical constraints were chosen in accordance with the shapes of the target velocity. The volumes of \(D_1\), \(D_2\) and \(D_3\) were constrained to be at or above 0.035295, 0.025397 and 0.036967, and the perimeters of \(u_1\), \(u_2\) and \(u_3\) to be at or below 0.72630, 0.56521 and 0.69796, respectively. The augmented Lagrangian parameters in Algorithm 1 were initialized to \(\varvec{\lambda }^1=\varvec{0}\), \(\mu _1=10\), \(\gamma =10\), and \(\tau =0.9\). The ball for the projection of Lagrange multipliers was chosen to be \(B=[-100,100]^{2N}\).
We chose homogeneous Dirichlet boundary conditions for the velocity on the top and bottom boundary and on \(\varvec{u}\) (see Fig. 1, right). The inflow profile on the left boundary is modeled as an inhomogeneous Dirichlet boundary condition with \(\varvec{g}_{\varvec{\xi }}(\varvec{x}) = (\kappa (\varvec{x},\varvec{\xi }), 0)^\top \). The horizontal component is given by the truncated Karhunen–Loève expansion
where \(\eta =3.5\) and \(\xi _\ell \sim U\!\left[ -\frac{1}{2}, \frac{1}{2}\right] \) (U[a, b] being the uniform distribution on the interval [a, b]). We used numpy.random from numpy 1.22.4 for the generation of all random values. For this, rng=numpy.random.default_rng(seed) is used to set the generator, and the random samples are then drawn by calling rng.uniform(lowerBound, upperBound, shape). The lower and upper bounds correspond to the bounds of the uniform distribution. The shape of the matrix of random values was set to \((100, m_k)\), yielding \(100 \times m_k\) random values per stochastic gradient step, generated row by row. We chose the four different seeds 964113, 454612, 421507 and 107785. Parallelization over multiple realizations was performed via MPI using mpi4py version 3.1.4, which distributed the matrix to the \(m_k\) processes columnwise. On the right boundary, a homogeneous Neumann boundary condition is imposed. The step size is chosen as \(t_k=\frac{20}{\mu _k}\), whose scaling was obtained by tuning (to avoid deterioration of the mesh, especially in the first steps of the inner loop procedure). The maximum number of inner loop iterations is chosen as \(N_k=c_1 \cdot 2^k\), with \(c_1=4\) or \(c_1=25\). The batch size is increased according to \(m_k=c_2 \cdot 2^k\), with \(c_2=\frac{1}{2}\) or \(c_2=5\). Each inner loop k requires \(m_k\cdot R_k\) solutions of the state equation, the adjoint equation, the Poisson equation for the Lamé parameter, and the deformation equation, which becomes computationally expensive for large k.
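The sampling setup and the inner loop schedules described above can be reproduced as follows (a sketch; `n_terms=100` matches the truncation of the expansion, and the helper names are ours):

```python
import numpy as np

def sample_kl_coefficients(m_k, seed, n_terms=100):
    # Draw the Karhunen-Loeve coefficients xi_l ~ U[-1/2, 1/2] as an
    # (n_terms, m_k) matrix: one column per sample of the inflow profile.
    rng = np.random.default_rng(seed)
    return rng.uniform(-0.5, 0.5, (n_terms, m_k))

def inner_loop_parameters(k, mu_k, c1=4, c2=0.5):
    # Schedules from the text: step size t_k = 20/mu_k, iteration cap
    # N_k = c1 * 2^k, and batch size m_k = c2 * 2^k (at least one sample).
    t_k = 20.0 / mu_k
    N_k = int(c1 * 2**k)
    m_k = max(1, int(c2 * 2**k))
    return t_k, N_k, m_k
```

Note that drawing the full \((100, m_k)\) matrix from a seeded generator makes a run reproducible for a given seed, which is how the different seeds in the experiments are compared.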
The obtained shapes for \(c_1=4\) and \(c_2=\frac{1}{2}\) for different seeds are shown in Fig. 2. For all seeds, the top-right shape \(u_2\) is virtually identical to the shape used to obtain the target velocity; however, \(u_1\) (left) shows differences at the bottom left and on the right-hand side, both between different seeds and compared to the shapes for the target velocity, and \(u_3\) has a different left-hand side. Differences for \(u_3\) between the different seeds can also be observed. We investigate the optimization with the random seed 421507 further. The remesher is activated after stochastic gradient steps 4, 9, 13, 20, 26, 37, 93 and 1682. In Fig. 3, the numerical results for the objective functional estimate \(\hat{j}=\frac{1}{m_k}\sum _{i=1}^{m_k} J(\varvec{z}^{k,j},\varvec{\xi }^{k,j,i})\) and the estimate of the \(H^1\) norm of the mesh deformation \(\widehat{\varvec{V}}=\frac{1}{m_k}\sum _{i=1}^{m_k} \varvec{V}_{\varvec{\xi }^{k,j,i}}\) over cumulative stochastic gradient steps are provided. Here, even for a comparatively low number of samples per step, we initially see a strong decrease in objective functional values. The points where the inner loop is stopped due to reaching \(R_k\) are denoted by the red vertical dashed lines in the right-hand plot. At the later stages of the optimization, the batch size is increased up to \(m_{11}=1024\) for \(k=11\). This yields an increasingly accurate approximation of the mesh deformation and the objective functional value, as evidenced by the decreasing variance.
We provide the numerical results at the end of each inner loop for different seeds, \(c_1\) and \(c_2\) in Tables 1, 2 and 3. Here, we present the number of iterations until random stopping \(R_k\), the \(H^1\) norm of the mesh deformation for each k, which is estimated using the seed 883134 and a (larger) sample size of \(m=10024\) as \(\widehat{\varvec{V}}= \frac{1}{m} \sum _{i=1}^m \varvec{V}_{{\varvec{\xi }}^i}\), and the infeasibility measure \(H_k\). Different seeds (Tables 1 and 2) behave differently regarding the mesh deformation norm estimate, the penalty factor, and the infeasibility measure; however, the mesh deformation norm estimate and the infeasibility measure were overall reduced by orders of magnitude. We attribute the intermediate increases in these values to the effect of the randomness on the stochastic gradient. Using larger batch sizes (Table 3, left) yielded lower mesh deformation norms at a significantly increased computational cost, indicating a very strong influence of the randomness on the objective functional that can be reduced by larger sample sizes; cf. also Fig. 3. An increased iteration limit \(N_k\) (Table 3, right) did not seem to improve the result, which is expected due to the strong influence of the randomness on the objective functional.
As an additional numerical experiment, we investigated the influence of the choice of B for the projection of Lagrange multipliers. Instead of \(B=[-100,100]^{2N}\), we chose \(B=[-0.1,0.1]^{2N}\). The batch size and maximum number of inner loop iterations match those in Table 2, left. Therefore, the random samples were exactly the same in both cases, but the optimization problem changes since the Lagrange multipliers are different. We did not see any notable improvement in performance by choosing the smaller set.
4 Conclusion
In this paper, we introduced a novel method for solving constrained optimization problems under uncertainty, where the optimization variable belongs to a Riemannian (shape) manifold. The objective functional is formulated as an expectation and the constraints are deterministic. Our work is motivated by applications in PDEconstrained shape optimization, where uncertainty enters the problem in the form of a random PDE, and geometric constraints are introduced to avoid trivial solutions. The optimization variable—the shape—is understood as an element of a Riemannian shape manifold.
Using the framework of Riemannian manifolds allows us to rigorously prove the convergence of our method, which we call the stochastic augmented Lagrangian method. This algorithm consists of a batch stochastic gradient method with random stopping in an inner loop, combined with an augmented Lagrangian method in an outer loop. The inherently nonconvex character of our underlying application is the reason for introducing random stopping, and it allows us to prove convergence rates in expectation even in the absence of convexity. A price that is paid for the guaranteed convergence rates is that the inner loop procedure becomes increasingly expensive. While this is a disadvantage, the method still outperforms the standard approach used in sample average approximation, where a one-time sample is taken and the corresponding problem is solved using all samples. The stochastic approximation approach used here dynamically samples over the course of the optimization, allowing us to use dramatically fewer samples, especially in the first iterations of the augmented Lagrangian procedure. To our knowledge, our method is the first to solve this kind of shape optimization problem under uncertainty. Since this is quite new, the results of this paper leave room for future research. In particular, there are a few open questions from differential geometry that are outside the scope of the paper but that came up while formulating our theory. It is still unclear whether Assumption 1 is satisfied for the manifold used in our application. In particular, we require connectivity and a positive injectivity radius of the shape space under consideration.
Data Availability
Most of the data generated or analyzed during this study are included in this published article. Any additional information, including the code for simulations, is available from the corresponding author upon reasonable request.
Notes
Further information about the technical specifications can be found at https://www.hsuhh.de/hpc/en/hsuper/.
The mesh quality is measured with the FEniCS function MeshQuality.radius_ratio_min_max.
References
Absil, P., Mahony, R., Sepulchre, R.: Optimization Algorithms on Matrix Manifolds. Princeton University Press, Princeton (2008). https://doi.org/10.1515/9781400830244
Alnaes, M.S., Blechta, J., Hake, J., Johansson, A., Kehlet, B., Logg, A., Richardson, C., Ring, J., Rognes, M.E., Wells, G.N.: The FEniCS project version 1.5. Arch. Numer. Softw. 3, 20553 (2015). https://doi.org/10.11588/ans.2015.100.20553
Bauer, M., Harms, P., Michor, P.: Sobolev metrics on shape space of surfaces. J. Geom. Mech. 3(4), 389–438 (2011). https://doi.org/10.3934/jgm.2011.3.389
Bauer, M., Harms, P., Michor, P.: Sobolev metrics on shape space, II: weighted Sobolev metrics and almost local metrics. J. Geom. Mech. 4(4), 365–383 (2012). https://doi.org/10.3934/jgm.2012.4.365
Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-48311-5
Bergmann, R., Herzog, R.: Intrinsic formulation of KKT conditions and constraint qualifications on smooth manifolds. SIAM J. Optim. 29(4), 2423–2444 (2019). https://doi.org/10.1137/18m1181602
Bertsekas, D.P.: Constrained Optimization and Lagrange Multiplier Methods. Academic Press, London (1982). https://doi.org/10.1016/C2013-0-10366-2
Birgin, E.G., Martínez, J.M.: Practical Augmented Lagrangian Methods for Constrained Optimization. SIAM, Philadelphia (2014). https://doi.org/10.1137/1.9781611973365
Blauth, S., Leithäuser, C., Pinnau, R.: Shape sensitivity analysis for a microchannel cooling system. J. Math. Anal. Appl. 492(2), 124476 (2020). https://doi.org/10.1016/j.jmaa.2020.124476
Bonnabel, S.: Stochastic gradient descent on Riemannian manifolds. IEEE Trans. Autom. Control 58(9), 2217–2229 (2013). https://doi.org/10.1109/tac.2013.2254619
Cheney, M., Isaacson, D., Newell, J.: Electrical impedance tomography. SIAM Rev. 41(1), 85–101 (1999). https://doi.org/10.1137/s0036144598333613
Constantin, A., Kappeler, T., Kolev, B., Topalov, P.: On geodesic exponential maps of the Virasoro group. Ann. Global Anal. Geom. 31(2), 155–180 (2007). https://doi.org/10.1007/s10455-006-9042-8
Dener, A., Miller, M.A., Churchill, R.M., Munson, T.S., Chang, C.S.: Training neural networks under physical constraints using a stochastic augmented Lagrangian approach (2020). https://doi.org/10.48550/arXiv.2009.06534. ArXiv preprint
Geiersbach, C., LoayzaRomero, E., Welker, K.: Stochastic approximation for optimization in shape spaces. SIAM J. Optim. 31(1), 348–376 (2021). https://doi.org/10.1137/20M1316111
Geiersbach, C., LoayzaRomero, E., Welker, K.: PDEconstrained shape optimization: toward product shape spaces and stochastic models. In: Chen, K., Schönlieb, C.B., Tai, X.C., Younes, L. (eds.) Handbook of Mathematical Models and Algorithms in Computer Vision and Imaging, pp. 1585–1630. Springer, Cham (2023). https://doi.org/10.1007/9783030986612_120
Geiersbach, C., Scarinci, T.: A stochastic gradient method for a class of nonlinear PDEconstrained optimal control problems under uncertainty. J. Differ. Equ. 364, 635–666 (2023). https://doi.org/10.1016/j.jde.2023.04.034
Geuzaine, C., Remacle, J.F.: Gmsh: a 3D finite element mesh generator with builtin pre and postprocessing facilities. Int. J. Numer. Methods Eng. 79(11), 1309–1331 (2009). https://doi.org/10.1002/nme.2579
Ghadimi, S., Lan, G.: Stochastic first and zerothorder methods for nonconvex stochastic programming. SIAM J. Optim. 23(4), 2341–2368 (2013). https://doi.org/10.1137/120880811
Ghadimi, S., Lan, G., Zhang, H.: Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Math. Program. 155(1–2), 267–305 (2016). https://doi.org/10.1007/s10107-014-0846-1
Gut, A.: Probability: A Graduate Course. Springer, New York (2013). https://doi.org/10.1007/978-1-4614-4708-5
Kanzow, C., Steck, D.: On error bounds and multiplier methods for variational problems in Banach spaces. SIAM J. Control Optim. 56(3), 1716–1738 (2018). https://doi.org/10.1137/17m1146518
Kanzow, C., Steck, D.: Improved local convergence results for augmented Lagrangian methods in \({C}^2\)-cone reducible constrained optimization. Math. Program. 177(1), 425–438 (2019). https://doi.org/10.1007/s10107-018-1261-9
Kanzow, C., Steck, D., Wachsmuth, D.: An augmented Lagrangian method for optimization problems in Banach spaces. SIAM J. Control Optim. 56(1), 272–291 (2018). https://doi.org/10.1137/16m1107103
Karl, V., Wachsmuth, D.: An augmented Lagrange method for elliptic state constrained optimal control problems. Comput. Optim. Appl. 69(3), 857–880 (2018). https://doi.org/10.1007/s10589-017-9965-y
Khuzani, M.B., Li, N.: Stochastic primal-dual method on Riemannian manifolds of bounded sectional curvature. In: Chen, X., Luo, B., Luo, F., Palade, V., Wani, M.A. (eds.) 2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA), pp. 133–140. IEEE, Cancun (2017). https://doi.org/10.1109/ICMLA.2017.0167
Kilian, M., Mitra, N.J., Pottmann, H.: Geometric modeling in shape space. ACM Trans. Graph. 26(3), 64 (2007). https://doi.org/10.1145/1276377.1276457
Kovnatsky, A., Glashoff, K., Bronstein, M.M.: MADMM: a generic algorithm for nonsmooth optimization on manifolds. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) Computer Vision—ECCV 2016. Lecture Notes in Computer Science, vol. 9909, pp. 680–696. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46454-1_41
Kriegl, A., Michor, P.: The Convenient Setting of Global Analysis. Mathematical Surveys and Monographs, vol. 53. American Mathematical Society (1997). https://doi.org/10.1090/surv/053
Kühnel, W.: Differentialgeometrie: Kurven, Flächen und Mannigfaltigkeiten, 4th edn. Vieweg (2008). https://doi.org/10.1007/978-3-8348-9453-3
Kurtek, S., Klassen, E., Ding, Z., Srivastava, A.: A novel Riemannian framework for shape analysis of 3D objects. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 1625–1632. IEEE, San Francisco, CA, USA (2010). https://doi.org/10.1109/cvpr.2010.5539778
Kwon, O., Woo, E.J., Yoon, J., Seo, J.: Magnetic resonance electrical impedance tomography (MREIT): simulation study of \({J}\)-substitution algorithm. IEEE Trans. Biomed. Eng. 49(2), 160–167 (2002). https://doi.org/10.1109/10.979355
Lang, S.: Fundamentals of Differential Geometry. Springer, New York (1999). https://doi.org/10.1007/978-1-4612-0541-8
Laurain, A., Sturm, K.: Distributed shape derivative via averaged adjoint method and applications. ESAIM Math. Model. Numer. Anal. 50(4), 1241–1267 (2016). https://doi.org/10.1051/m2an/2015075
Lee, J.: Manifolds and Differential Geometry. Graduate Studies in Mathematics, vol. 107. American Mathematical Society (2009). https://doi.org/10.1090/gsm/107
Liu, C., Boumal, N.: Simple algorithms for optimization on Riemannian manifolds with constraints. Appl. Math. Optim. 82(3), 949–981 (2020). https://doi.org/10.1007/s00245-019-09564-3
Michor, P.W., Mumford, D.: Vanishing geodesic distance on spaces of submanifolds and diffeomorphisms. Doc. Math. 10, 217–245 (2005). https://doi.org/10.4171/dm/187
Michor, P.W., Mumford, D.: Riemannian geometries on spaces of plane curves. J. Eur. Math. Soc. 8(1), 1–48 (2006). https://doi.org/10.4171/JEMS/37
Michor, P.W., Mumford, D.: An overview of the Riemannian metrics on spaces of curves using the Hamiltonian approach. Appl. Comput. Harmon. Anal. 23(1), 74–113 (2007). https://doi.org/10.1016/j.acha.2006.07.004
Mio, W., Srivastava, A., Joshi, S.: On shape of plane elastic curves. Int. J. Comput. Vis. 73(3), 307–324 (2007). https://doi.org/10.1007/s11263-006-9968-0
Pryymak, L., Suchan, T., Welker, K.: A product shape manifold approach for optimizing piecewise-smooth shapes. In: Nielsen, F., Barbaresco, F. (eds.) Geometric Science of Information, GSI 2023. Lecture Notes in Computer Science, vol. 14071, pp. 21–30. Springer, Cham (2023). https://doi.org/10.1007/978-3-031-38271-0_3
Robbins, H., Monro, S.: A stochastic approximation method. Ann. Math. Stat. 22(3), 400–407 (1951). https://doi.org/10.1007/978-1-4612-5110-1_9
Sato, H., Kasai, H., Mishra, B.: Riemannian stochastic variance reduced gradient algorithm with retraction and vector transport. SIAM J. Optim. 29(2), 1444–1472 (2019). https://doi.org/10.1137/17m1116787
Schulz, V.H., Siebenborn, M., Welker, K.: Efficient PDE constrained shape optimization based on Steklov–Poincaré-type metrics. SIAM J. Optim. 26(4), 2800–2819 (2016). https://doi.org/10.1137/15m1029369
Schulz, V.H., Welker, K.: On optimization transfer operators in shape spaces. In: Schulz, V.H., Seck, D. (eds.) International Series of Numerical Mathematics, pp. 259–275. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-90469-6_13
Siebenborn, M., Vogel, A.: A shape optimization algorithm for cellular composites. International Journal of Computing and Visualization in Science and Engineering (2021). https://doi.org/10.51375/IJCVSE.2021.1.5
Siebenborn, M., Welker, K.: Algorithmic aspects of multigrid methods for optimization in shape spaces. SIAM J. Sci. Comput. 39(6), B1156–B1177 (2017). https://doi.org/10.1137/16m1104561
Steck, D.: Lagrange multiplier methods for constrained optimization and variational problems in Banach spaces. Ph.D. thesis, Universität Würzburg (2018). https://opus.bibliothek.uni-wuerzburg.de/frontdoor/index/index/year/2018/docId/17444
Welker, K.: Efficient PDE constrained shape optimization in shape spaces. Ph.D. thesis, Universität Trier (2016). https://doi.org/10.25353/ubtr-xxxx-6575-788c
Yamakawa, Y., Sato, H.: Sequential optimality conditions for nonlinear optimization on Riemannian manifolds and a globally convergent augmented Lagrangian method. Comput. Optim. Appl. 81(2), 397–421 (2022). https://doi.org/10.1007/s10589-021-00336-w
Yang, W.H., Zhang, L.H., Song, R.: Optimality conditions for the nonlinear programming problems on Riemannian manifolds. Pac. J. Optim. 10(2), 415–434 (2014)
Younes, L., Michor, P., Shah, J., Mumford, D.: A metric on shape space with explicit geodesics. Rendiconti Lincei – Matematica e Applicazioni, pp. 25–57 (2008). https://doi.org/10.4171/rlm/506
Zhang, H., Reddi, S.J., Sra, S.: Riemannian SVRG: Fast stochastic optimization on Riemannian manifolds. In: Lee, D.D., von Luxburg, U., Garnett, R., Sugiyama, M., Guyon, I. (eds.) Proceedings of the 30th International Conference on Neural Information Processing Systems, vol. 29, pp. 4599–4607. Curran Associates Inc., Red Hook (2016). https://doi.org/10.5555/3157382.3157611
Zhang, H., Sra, S.: First-order methods for geodesically convex optimization. In: Feldman, V., Rakhlin, A., Shamir, O. (eds.) 29th Annual Conference on Learning Theory, PMLR, vol. 49, pp. 1617–1638. Columbia University, New York (2016)
Acknowledgements
This work has been partly supported by the German Research Foundation (DFG) within the priority program SPP 1962 under contract number WE 6629/1-1 and by the state of Hamburg (Germany) within the Landesforschungsförderung under project “Simulation-Based Design Optimization of Dynamic Systems Under Uncertainties” (SENSUS) with project number LFF-GK11. Computational resources (HPC cluster HSUper) have been provided by the project hpc.bw, funded by dtec.bw – Digitalization and Technology Research Center of the Bundeswehr. dtec.bw is funded by the European Union – NextGenerationEU.
Funding
Open Access funding enabled and organized by Projekt DEAL.
Communicated by Xiaoqi Yang.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Cite this article
Geiersbach, C., Suchan, T. & Welker, K. Stochastic Augmented Lagrangian Method in Riemannian Shape Manifolds. J Optim Theory Appl (2024). https://doi.org/10.1007/s10957-024-02488-1
Keywords
 Augmented Lagrangian
 Stochastic optimization
 Uncertainties
 Inequality constraints
 Riemannian manifold
 Shape optimization
 Geometric constraints