1 Introduction

In this work, we present a multigrid method to solve the saddle point system

$$\begin{aligned} \mathcal {S}\textbf{x}=\textbf{f}, \end{aligned}$$
(1)

where \(\textbf{x}=(\textbf{y},\textbf{u},\textbf{p})=(\textbf{y}_1,\dots ,\textbf{y}_N,\textbf{u},\textbf{p}_1,\dots ,\textbf{p}_N)^\top \), \(\mathcal {S}\) has the block structure

$$\begin{aligned} \mathcal {S}=\begin{pmatrix} C_1 &{} &{} &{} &{} A_1^\top \\ &{} \ddots &{} &{} &{} &{}\ddots \\ &{} &{} C_N &{} &{} &{} &{} A_N^\top \\ &{} &{} &{} G &{} D_1 &{}\dots &{} D_N\\ A_1 &{} &{} &{} E_1\\ &{} \ddots &{} &{}\vdots \\ &{} &{} A_N &{} E_N \end{pmatrix}, \end{aligned}$$
(2)

and all submatrices involved represent the discretization of some differential operators. More details on each block are provided in Sect. 2. Matrices such as (2) are often encountered while solving PDE-constrained optimization problems under uncertainty of the form

$$\begin{aligned} \begin{aligned}&\min _{u\in U} \mathcal {R}\left[ Q(y(\omega ),u)\right] \\&\text {s.t. } y(\omega )\in V\text { satisfies}\\&\langle e(y(\omega ),u,\omega ),v\rangle =0 \quad \forall v\in V,\text { a.e. }\omega \in \Omega , \end{aligned} \end{aligned}$$
(3)

where u is the unknown deterministic control, \(y(\omega )\) is the state variable which satisfies a random PDE constraint expressed by \(e(\cdot ,\cdot ,\omega )\) for almost every realization \(\omega \) of the randomness, Q is a real-valued quantity of interest (cost functional) and \(\mathcal {R}\) is a risk measure. The vectors \(\left\{ \textbf{y}_j\right\} _{j=1}^N\) and \(\left\{ \textbf{p}_j\right\} _{j=1}^N\) are the discretizations of the state and adjoint variables \(y(\omega )\) and \(p(\omega )\) at the N samples in which the random PDE constraint is collocated. The vector \(\textbf{u}\) is the discretization of the deterministic control u. Problems of the form (3) are increasingly employed in applications. The PDE constraints typically represent some underlying physical model whose behaviour should be optimally controlled, and the randomness in the PDE allows one to take into account the intrinsic variability or lack of knowledge of some parameters entering the model. The introduction of a risk measure in (3) allows one to construct robust controls that take into account the distribution of the cost over all possible realizations of the random parameters. Therefore, the topic has received considerable attention in recent years, see, e.g. [1,2,3,4,5,6,7,8,9].

However, few works have focused on efficient solvers for the optimality systems (1). A popular approach is to perform a Schur complement on \(\textbf{u}\) and solve the reduced system with a Krylov method (possibly the Conjugate Gradient method), even though each iteration then requires the solution of 2N PDEs, with \(A_j\) and \(A_j^\top \) for \(j=1,\dots ,N\) [10]. For a full-space formulation, block diagonal preconditioners have been proposed in [11] and analyzed in [12], using both an algebraic approach based on Schur complement approximations and an operator preconditioning framework.

In this manuscript, we design a multigrid method to solve general problems of the form (1), present a detailed convergence analysis which, although in a simplified setting, is nontrivial and requires technical arguments, and show how this strategy can be used for the efficient solution of three different Optimal Control Problems Under Uncertainty (OCPUU). First, we consider a linear-quadratic OCPUU and use the multigrid algorithm directly to solve the linear optimality system. Second, we consider a nonsmooth OCPUU with box constraints and \(L^1\) regularization on the control. To solve such a problem, we use the collective multigrid method as an inner solver within an outer semismooth Newton iteration. Incidentally, we show that the theory developed for deterministic OCPs with \(L^1\) regularization can be naturally extended to the class of OCPUU considered here. Third, we study a risk-averse OCPUU involving the smoothed Conditional Value at Risk (CVaR) and test the performance of the multigrid scheme in the context of a nonlinear preconditioned Newton method.

The multigrid algorithm is based on a collective smoother [13,14,15] that, at each iteration, loops over all nodes of the computational mesh (possibly in parallel), collects all the degrees of freedom related to a node, and updates them collectively by solving a reduced saddle-point problem. For classical (deterministic) PDE-constrained optimization problems with a distributed control, this reduced system has size \(3\times 3\), thus its solution is immediate [14]. In our context, the reduced problem has size \((2N+1)\times (2N+1)\), which can be large when dealing with a large number of samples. Fortunately, we show that it can be solved with optimal O(N) complexity.

From the theoretical point of view, there are very few convergence analyses of collective smoothers even in the deterministic setting, namely [14] based on a local Fourier analysis, and [15] which relies on an algebraic approach. Notably, the presence of a low-rank block matrix in the reduced optimality system (obtained by eliminating the control) as well as the need to have stiffness and mass matrices with specific structure make it difficult to extend the analysis of [15]. We therefore present in this manuscript a fully new convergence analysis of collective smoothers and two-level collective multigrid methods in a simplified setting, which also covers the deterministic setting as a particular instance.

Let us remark that collective multigrid strategies have been applied to OCPUU in [16, 17] and in [18]. This manuscript differs from the mentioned works since, on the one hand, [16, 17] consider a stochastic control u, so that for (almost) every realization of the random parameters a different control \(u(\omega )\) is computed through the solution of a standard deterministic OCP. On the other hand, [18] considers a stochastic Galerkin discretization, and hence the corresponding optimality system has a structure which is very different from (2).

The multigrid algorithm presented here assumes that all state and adjoint variables are discretized on the same finite element mesh. The control can instead live on a subregion of the computational mesh, so that the algorithm is applicable also to optimization problems with local or boundary controls.

Finally, we remark that the multigrid solver proposed is based on a hierarchy of spatial discretizations corresponding to different levels of approximation, but the discretization of the probability space remains fixed, that is, the number of samples remains constant across the multigrid hierarchy. The extension of the multigrid algorithm to coarsening procedures also in the probability space will be the subject of future endeavours. We hint at possible approaches and challenges in Sect. 3 (see Remark 1). Nevertheless, we stress that the multigrid algorithm can already be incorporated within outer optimization routines that take advantage of different levels of approximation of the probability space, see, e.g., [7, 10, 19].

The rest of the manuscript is organized as follows. In Sect. 2 we introduce the notation and a classical linear-quadratic OCPUU, and interpret (2) as the matrix associated to the optimality system of a discretized OCPUU. Section 3 presents the collective multigrid algorithm, discusses implementation details and develops the convergence analysis. Further, the algorithm is numerically tested on the linear-quadratic OCPUU. In Sect. 4, we consider a nonsmooth OCPUU with box constraints and an \(L^1\) regularization on the control. Section 5 deals with a risk-averse OCPUU. For each of these cases, we first show how the multigrid approach can be integrated into the solution process, by detailing concrete algorithms, and then we present extensive numerical experiments to show the efficiency of the proposed framework. Finally, we draw our conclusions in Sect. 6.

2 A Linear-Quadratic Optimal Control Problem Under Uncertainty

Let \(\mathcal {D}\subset \mathbb {R}^d\) be a Lipschitz bounded domain, \(V\subset L^2(\mathcal {D})\) a Sobolev space (e.g. \(H^1(\mathcal {D})\) equipped with suitable boundary conditions), and \((\Omega ,\mathcal {F},\mathbb {P})\) a complete probability space. Given a function u belonging to a Hilbert space U, we consider the linear elliptic random PDE

$$\begin{aligned} a_\omega (y,v)=\langle \mathcal {B}u,v\rangle ,\quad \forall v\in V,\quad \mathbb {P}\text {-a.e. } \omega \in \Omega , \end{aligned}$$
(4)

where \(a_{\omega }(\cdot ,\cdot ):V\times V\rightarrow \mathbb {R}\) is a bilinear form and \(\langle \cdot ,\cdot \rangle \) denotes the duality between V and \(V^\prime \). \(\mathcal {B}:U\rightarrow V^\prime \) is a continuous control operator allowing possibly for a local control (i.e. a control acting only on a subset \(\mathcal {D}_0\subset \mathcal {D})\) or a boundary control (i.e. a control acting as Neumann condition on a subset of \(\partial \mathcal {D}\)). To assure uniqueness and sufficient integrability of the solution of (4), we make the following additional assumption.

Assumption 1

There exist two random variables \(a_{\min }(\omega )\) and \(a_{\max }(\omega )\) such that

$$\begin{aligned} 0<a_{\min }(\omega )\Vert v\Vert ^2_V\le a_\omega (v,v)\le a_{\max }(\omega )\Vert v\Vert ^2_V,\quad \forall v\in V,\ \mathbb {P}\text {-a.e. }\omega \in \Omega , \end{aligned}$$

and further \(a^{-1}_{\min }\) and \(a_{\max }\) are in \(L^p(\Omega )\) for some \(p\ge 4\).

Under Assumption 1, it is well-known (see, e.g., [20, 21]) that (4) admits a solution in V for \(\mathbb {P}\text {-a.e. } \omega \), and the solution y, interpreted as a V-valued random variable \(y:\omega \in \Omega \mapsto y(\omega )\in V\), lies in the Bochner space \(L^q(\Omega ;V)\), \(q\le p\), [22]. We often use the shorthand notation \(y_\omega =y(\cdot ,\omega )\) when the dependence on x is not needed, or \(y_{\omega }(u)\) if we wish to highlight the dependence on the control function u.

In this manuscript, we consider the minimization of functionals constrained by (4). Let us first focus on the linear-quadratic problem

$$\begin{aligned} \begin{aligned}&\min _{u\in U,y\in L^2(\Omega ;V)} \frac{1}{2}\mathbb {E}\left[ \Vert \mathcal {I}y_\omega -y_d\Vert ^2_{L^2(\mathcal {D})}\right] +\frac{\nu }{2}\Vert u\Vert ^2_{U},\\&\quad \text {subject to}\\&a_\omega (y_\omega ,v)=\langle \mathcal {B}u+f,v\rangle ,\quad \forall v \in V,\ \mathbb {P}\text {-a.e. } \omega \in \Omega , \end{aligned} \end{aligned}$$
(5)

where \(y_d\in L^2(\mathcal {D})\) is a target state, \(f\in V^\prime \), \(\mathbb {E}:L^1(\Omega )\rightarrow \mathbb {R}\) is the expectation operator, \(\nu >0\), and \(\mathcal {I}\) is the embedding operator from V to \(L^2(\mathcal {D})\). Introducing the linear control-to-state map \(S: g\in V^\prime \rightarrow y_\omega (g)\in L^2(\Omega ;V)\), the reduced formulation of (5) is

$$\begin{aligned} \min _{u\in U} \frac{1}{2}\mathbb {E}\left[ \Vert \mathcal {I}S(\mathcal {B} u+f)-y_d\Vert ^2_{L^2(\mathcal {D})}\right] +\frac{\nu }{2}\Vert u\Vert ^2_{U}. \end{aligned}$$
(6)

Existence and uniqueness of the minimizer of (6) follows directly from standard variational arguments [1, 23,24,25]. Furthermore, due to Assumption 1, the optimal control \(\overline{u}\) satisfies the variational equality

$$\begin{aligned} (\nu \overline{u} -\Lambda _U \mathcal {B}^\star S^\star \mathcal {I}^\star (y_d-S(\mathcal {B}\overline{u}+f)),v)_{U}=0,\quad \forall v \in U, \end{aligned}$$
(7)

where \(\Lambda _U\) is the Riesz operator of U. The adjoint operator \(S^\star : L^2(\Omega ;V^\prime )\rightarrow V\) is characterized by \(S^\star z=\mathbb {E}\left[ p\right] \) where \(p=p_{\omega }(x)\) is the solution of the adjoint equation

$$\begin{aligned} a_\omega (v,p_\omega )=\langle z(\omega ),v\rangle ,\quad \forall v \in V,\ \mathbb {P}\text {-a.e. } \omega \in \Omega . \end{aligned}$$
(8)

The optimality condition (7) can thus be formulated as the optimality system

$$\begin{aligned} \begin{aligned}&a_\omega (y_\omega ,v)=\langle \mathcal {B}\overline{u}+f,v\rangle ,\quad \forall v\in V,\quad \mathbb {P}\text {-a.e. } \omega \in \Omega ,\\&a_\omega (v,p_\omega )=\langle \mathcal {I}^\star (y_d-y_\omega ),v\rangle ,\quad \forall v \in V,\ \mathbb {P}\text {-a.e. } \omega \in \Omega ,\\&(\nu \overline{u}- \Lambda _U \mathcal {B}^\star \mathbb {E}\left[ p_\omega \right] ,v)_{U}=0,\quad \forall v \in U. \end{aligned} \end{aligned}$$
(9)

To solve numerically (5), we replace the exact expectation operator \(\mathbb {E}\) of the objective functional by a quadrature formula \({\widehat{{\mathbb {E}}}}\) with N nodes \(\left\{ \omega _i\right\} _{i=1}^N\) and positive weights \(\left\{ \zeta _i\right\} _{i=1}^N\), namely

$$\begin{aligned} \mathbb {E}\left[ X\right] \approx {\widehat{{\mathbb {E}}}}\left[ X\right] := \sum _{i=1}^N \zeta _i X(\omega _i),\quad \text {with}\quad \sum _{i=1}^N \zeta _i=1. \end{aligned}$$

Common quadrature formulae are Monte Carlo, Quasi-Monte Carlo and Gaussian formulae. The latter requires that the probability space can be parametrized by a (finite or countable) sequence of random variables \(\left\{ \chi _j\right\} _j\), each with distribution \(\mu _j\), and the existence of a complete basis of tensorized \(L^2_{\mu _j}\)-orthonormal polynomials. Hence for the semi-discrete OCP, the \(\mathbb {P}\)-a.e. PDE-constraint is naturally collocated onto the nodes of the quadrature formula.
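For instance, the two extremes can be compared for a single standard normal parameter: Monte Carlo uses \(\zeta_i=1/N\), while a Gauss-Hermite rule uses the nodes and weights of the standard normal density. The following minimal Python sketch (with a purely illustrative integrand, not a quantity from this manuscript) approximates \(\mathbb{E}[X]\) in both ways.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

X = lambda chi: np.exp(0.1 * chi)        # illustrative quantity of interest

# Monte Carlo: N nodes, weights zeta_i = 1/N
rng = np.random.default_rng(0)
E_mc = np.mean(X(rng.standard_normal(10_000)))

# Gauss-Hermite (probabilists') quadrature: nodes/weights for the N(0,1) density
nodes, w = hermegauss(8)
zeta = w / np.sqrt(2.0 * np.pi)          # normalized weights sum to 1
E_gh = np.sum(zeta * X(nodes))

print(E_mc, E_gh, np.exp(0.1**2 / 2))    # both approximate the exact value
```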

Concerning the space domain, we consider a family of regular triangulations \(\left\{ \mathcal {T}_h\right\} _{h>0}\) of \(\mathcal {D}\), and a Galerkin projection onto a conforming finite element space \(V^h\subset V\) of continuous piecewise polynomial functions of degree r over \(\mathcal {T}_h\). \(N_h\) is the dimension of \(V^h\) and \(\left\{ \phi _i\right\} _{i=1}^{N_h}\) is a nodal Lagrangian basis. We discretize the state and adjoint variables on the same finite element space. The control variable is discretized on the finite element space \(U_h=\text {span}\left\{ \psi _i\right\} _{i=1}^{N_u}\), where \(N_u\) is possibly strictly smaller than \(N_h\) in case of a local or a boundary control.

Once fully discretized, (9) can be expressed as

$$\begin{aligned} \begin{pmatrix} M &{} &{} &{} &{} A_1^\top \\ &{} \ddots &{} &{} &{} &{}\ddots \\ &{} &{} M &{} &{} &{} &{} A_N^\top \\ &{} &{} &{} \nu M_U &{} -\zeta _1 B^\top &{}\dots &{} -\zeta _N B^\top \\ A_1 &{} &{} &{} -B\\ &{} \ddots &{} &{}\vdots \\ &{} &{} A_N &{} -B \end{pmatrix} \begin{pmatrix} \textbf{y}_1\\ \vdots \\ \textbf{y}_N\\ \textbf{u}\\ \textbf{p}_1\\ \vdots \\ \textbf{p}_N \end{pmatrix}= \begin{pmatrix} M \textbf{y}_d\\ \vdots \\ M \textbf{y}_d\\ \textbf{0} \\ M\textbf{f}\\ \vdots \\ M\textbf{f} \end{pmatrix}, \end{aligned}$$
(10)

where \(A_j\) are the stiffness matrices associated to the bilinear forms \(a_{\omega _j}(\cdot ,\cdot )\), M and \(M_U\) are mass matrices corresponding to the finite element spaces \(V_h\) and \(U_h\), B is the discretization of the control operator, \(\textbf{y}_d\) and \(\textbf{f}\) are the finite element discretizations of \(y_d\) and f respectively, while \(\textbf{y}_j\) and \(\textbf{p}_j\) are the discretizations of \(y_{\omega _j}\) and \(p_{\omega _j}\). Notice that the matrix in (10) could be symmetrized by multiplying the first and the last N rows by the weights \(\left\{ \zeta _i\right\} _{i=1}^N\). This would also be consistent with the theoretical interpretation of the blocks of the saddle point system as discretizations of continuous inner products. From the numerical point of view, we have not observed relevant advantages in maintaining the weights. Since for more general problems (see, e.g., Sec. 4) the symmetry of the saddle point system cannot be recovered by multiplying some equations by the quadrature weights, we do not consider the symmetrized version in this work.
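To fix ideas, the following Python sketch assembles the block matrix of (10) with scipy.sparse for a 1D toy discretization (piecewise linear elements on a uniform grid, distributed control with \(B=M\), Monte Carlo weights). All sizes, coefficients and matrices are illustrative placeholders, not the discretization used in the numerical experiments below.

```python
import numpy as np
import scipy.sparse as sp

Nh, N, nu = 63, 8, 1e-4                             # illustrative sizes and parameters
h = 1.0 / (Nh + 1)
rng = np.random.default_rng(0)
eta = rng.lognormal(sigma=0.5, size=N)              # samples of a random coefficient
zeta = np.full(N, 1.0 / N)                          # Monte Carlo weights

lap = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(Nh, Nh)) / h      # P1 stiffness
M = sp.diags([h / 6, 2 * h / 3, h / 6], [-1, 0, 1], shape=(Nh, Nh))    # P1 mass matrix
A = [eta[j] * lap for j in range(N)]                # stiffness matrices A_j
B, MU = M, M                                        # distributed control

blocks = [[None] * (2 * N + 1) for _ in range(2 * N + 1)]
for j in range(N):
    blocks[j][j] = M                                # adjoint equations: M y_j + A_j^T p_j
    blocks[j][N + 1 + j] = A[j].T
    blocks[N + 1 + j][j] = A[j]                     # state equations: A_j y_j - B u
    blocks[N + 1 + j][N] = -B
    blocks[N][N + 1 + j] = -zeta[j] * B.T           # optimality condition row
blocks[N][N] = nu * MU
S = sp.bmat(blocks, format="csr")                   # full matrix of (10)
print(S.shape)                                      # ((2N+1)*Nh, (2N+1)*Nh)
```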

3 Collective Multigrid Scheme

In this section, we describe the multigrid algorithm to solve the full space optimality system (10). First, we consider a distributed control, so that u lives on the whole computational mesh and \(B=M\). Local and boundary controls are discussed at the end of the section. Second, for the sake of generality, we consider the more general matrix (2), so that our discussion covers also the different saddle-point matrices obtained in Sects. 4 and 5.

For each node of the triangulation, let us introduce the vectors \({\widetilde{{\textbf{y}}}}_i\) and \({\widetilde{{\textbf{p}}}}_i\),

$$\begin{aligned} {\widetilde{{\textbf{y}}}}_i=\begin{pmatrix} (\textbf{y}_1)_i\\ \vdots \\ (\textbf{y}_N)_i \end{pmatrix}\in \mathbb {R}^{N}, \quad {\widetilde{{\textbf{p}}}}_i=\begin{pmatrix} (\textbf{p}_1)_i\\ \vdots \\ (\textbf{p}_N)_i \end{pmatrix} \in \mathbb {R}^{N}, \quad i=1,\dots ,N_h, \end{aligned}$$

which collect the degrees of freedom associated to the i-th node, the scalar \(u_i=(\textbf{u})_i\), and the restriction operators \(R_i \in \mathbb {R}^{(2N+1) \times ((2N+1) N_h)}\) such that

$$\begin{aligned} R_i \begin{pmatrix} \textbf{y}\\ \textbf{u}\\ \textbf{p}\end{pmatrix}=\begin{pmatrix} {\widetilde{{\textbf{y}}}}_i\\ u_i\\ {\widetilde{{\textbf{p}}}}_i \end{pmatrix}=:\textbf{x}_i. \end{aligned}$$
(11)

The prolongation operators are \(P_i:=R_i^\top \), while the reduced matrices \({\widetilde{S}}_i:=R_iSP_i\in \mathbb {R}^{(2N+1)\times (2N+1)}\) represent a condensed saddle-point matrix on the i-th node, and satisfy

$$\begin{aligned} {\widetilde{S}}_i=\begin{pmatrix} \text {diag}(\textbf{c}_i) &{} 0 &{}\text {diag}(\textbf{a}_i)\\ 0 &{} (G)_{i,i} &{} \textbf{d}_i^\top \\ \text {diag}(\textbf{a}_i) &{} \textbf{e}_i &{} 0 \end{pmatrix} \end{aligned}$$

with \(\textbf{c}_i:=((C_1)_{i,i},\dots ,(C_N)_{i,i})^\top \), \(\textbf{a}_i:=((A_1)_{i,i},\dots ,(A_N)_{i,i})^\top \), \(\textbf{e}_i=((E_1)_{i,i},\dots ,(E_N)_{i,i})^\top \), \(\textbf{d}_i=((D_1)_{i,i},\dots ,(D_N)_{i,i})^\top \), where \(\text {diag}(\textbf{v})\) denotes a diagonal matrix with the components of \(\textbf{v}\) on the main diagonal.

Given an initial vector \(\textbf{x}^0\), a Jacobi-type collective smoothing iteration computes for \(n=1,\dots ,n_1\),

$$\begin{aligned} \textbf{x}^n=\textbf{x}^{n-1}+ \theta \sum _{i=1}^{N_h} P_i {\widetilde{S}}_i^{-1}R_i\left( \textbf{f}-S\textbf{x}^{n-1}\right) , \end{aligned}$$
(12)

where \(\theta \in (0,1]\) is a damping parameter. Gauss-Seidel variants can straightforwardly be defined. Next, we consider a sequence of meshes \(\left\{ \mathcal {T}_{h_\ell }\right\} _{\ell =\ell _{\min }}^{\ell _{\max }}\), which we assume for simplicity to be nested, and restriction and prolongation operators \(R_{\ell -1}^\ell \), \(P_{\ell -1}^{\ell }\) which map between the grids \(\mathcal {T}_{h_{\ell -1}}\) and \(\mathcal {T}_{h_{\ell }}\). In the numerical experiments, the coarse matrices are defined recursively in a Galerkin fashion starting from the finest one, namely \(S_\ell :=R^{\ell +1}_{\ell }S_{\ell +1}P^{\ell +1}_{\ell }\) for \(\ell \in \left\{ \ell _{\min },\dots , \ell _{\max }-1\right\} \). Nevertheless, it is obviously possible to define \(S_\ell \) as the discretization of the continuous saddle-point system on the mesh \(\mathcal {T}_{h_\ell }\). With this notation, the V-cycle collective multigrid is described by Algorithm 1, which can be repeated until a certain stopping criterion is satisfied. We use the notation Collective_Smoothing\((\cdot ,\cdot ,\cdot )\) to denote possible variants of (12) (e.g. Gauss-Seidel).
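As an illustration, a minimal Python sketch of one damped Jacobi collective sweep (12) on a dense matrix could look as follows. The node index sets and the direct solve of the reduced systems are placeholders; in practice one would use the closed-form solution (13) derived below.

```python
import numpy as np

def collective_jacobi(S, x, f, nodes, theta=0.5, sweeps=1):
    """Damped Jacobi collective smoothing (12), sketched for a dense matrix S.

    `nodes` is a list of index arrays: nodes[i] collects the (at most) 2N+1
    unknowns (state, control, adjoint dofs) attached to the i-th mesh node.
    """
    for _ in range(sweeps):
        r = f - S @ x                           # current residual
        dx = np.zeros_like(x)
        for idx in nodes:                       # loop over mesh nodes
            Si = S[np.ix_(idx, idx)]            # reduced saddle-point matrix S_i
            dx[idx] += np.linalg.solve(Si, r[idx])
        x = x + theta * dx                      # damped collective update
    return x
```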

Algorithm 1

V-cycle Collective Multigrid Algorithm - V-cycle(\(\textbf{x}^{0}\),\(\textbf{f}\),\(\ell \))
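Since Algorithm 1 is displayed as a figure, we also give a minimal recursive Python sketch of the V-cycle, assuming the level matrices (built, e.g., in a Galerkin fashion), the restriction/prolongation operators, and a collective smoother such as the sweep sketched above are available; it illustrates the control flow only and is not the actual implementation.

```python
import numpy as np

def v_cycle(x, f, lev, S, R, P, smoother, n1=2, n2=2):
    """One V-cycle for S[lev] x = f (sketch of Algorithm 1).

    S : list of level matrices (S[0] the coarsest), e.g. S[l-1] = R[l] @ S[l] @ P[l]
    R, P : restriction / prolongation operators between levels l-1 and l
    smoother(A, x, b, n) : n collective smoothing sweeps, e.g. a wrapper
                           around the collective Jacobi sweep sketched above
    """
    if lev == 0:                                   # coarsest level: direct solve
        return np.linalg.solve(S[0], f)
    x = smoother(S[lev], x, f, n1)                 # pre-smoothing
    r = R[lev] @ (f - S[lev] @ x)                  # restrict the residual
    e = v_cycle(np.zeros_like(r), r, lev - 1, S, R, P, smoother, n1, n2)
    x = x + P[lev] @ e                             # coarse-grid correction
    return smoother(S[lev], x, f, n2)              # post-smoothing
```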

Notice that (12) requires inverting the matrices \({\widetilde{S}}_i\) for each computational node. We now show that this can be done with optimal O(N) complexity. Indeed, performing a Schur complement on \(u_i\), the system \({\widetilde{S}}_i\textbf{x}_i=\textbf{f}_i\), with \(\textbf{f}_i=(\textbf{f}_{p_i},b_{u_i},\textbf{f}_{y_i})^\top \), can be solved by computing only inverses of diagonal matrices and scalar products between vectors, through

$$\begin{aligned} \begin{aligned} u_i&=\frac{b_{u_i}+\textbf{d}_i^\top (\text {diag}(\textbf{a}_i)^{-1}\text {diag}(\textbf{c}_i)\text {diag}(\textbf{a}_i)^{-1}\textbf{f}_{y_i}-\text {diag}(\textbf{a}_i)^{-1}\textbf{f}_{p_i})}{(G)_{i,i}+\textbf{d}_i^\top \text {diag}(\textbf{a}_i)^{-1}\text {diag}(\textbf{c}_i)\text {diag}(\textbf{a}_i)^{-1}\textbf{e}_i},\\ {\widetilde{{\textbf{y}}}}_i&=(\text {diag}(\textbf{a}_i))^{-1}(\textbf{f}_{y_i}-\textbf{e}_i u_{i}),\\ {\widetilde{{\textbf{p}}}}_i&=(\text {diag}(\textbf{a}_i))^{-1}(\textbf{f}_{p_i}-\text {diag}(\textbf{c}_i) {\widetilde{{\textbf{y}}}}_i). \end{aligned} \end{aligned}$$
(13)

Notice that we should guarantee that \(\text {diag}(\textbf{a}_i)\) admits an inverse and that \((G)_{i,i}+\textbf{d}_i^\top \text {diag}(\textbf{a}_i)^{-1}\text {diag}(\textbf{c}_i)\text {diag}(\textbf{a}_i)^{-1}\textbf{e}_i\ne 0\). This has to be verified case by case, so we now focus on the specific matrix (10). On the one hand, the vectors \(\textbf{a}_i\) are strictly positive componentwise, since \((\textbf{a}_i)_j=a_{\omega _j}(\phi _i,\phi _i)>0\) \(\forall i=1,\dots ,N_h\), \(j=1,\dots ,N\) (due to Assumption 1). On the other hand, \((G)_{i,i}=\nu \int _\mathcal {D}\psi ^2_i(x)\ dx >0\), while a direct calculation shows that

$$\begin{aligned} \textbf{d}_i^\top \text {diag}(\textbf{a}_i)^{-1}\text {diag}(\textbf{c}_i)\text {diag}(\textbf{a}_i)^{-1}\textbf{e}_i=(M)^3_{i,i} \sum _{j=1}^N \zeta _j (A_j)^{-2}_{i,i}>0, \end{aligned}$$

which implies that the denominator in the first equation of (13) is strictly positive.
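For concreteness, the following numpy sketch applies the closed-form solve (13) to a single node with illustrative data (random positive \(\textbf{a}_i\), \(\textbf{c}_i\) and negative \(\textbf{d}_i\), \(\textbf{e}_i\), mimicking the signs appearing in (10)) and checks it against a direct solve of the assembled \((2N+1)\times (2N+1)\) system.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4                                   # number of samples (illustrative)
a = rng.uniform(1.0, 2.0, N)            # (A_j)_{ii} > 0
c = rng.uniform(0.5, 1.5, N)            # (C_j)_{ii}
d = -rng.uniform(0.1, 1.0, N)           # (D_j)_{ii}
e = -rng.uniform(0.1, 1.0, N)           # (E_j)_{ii}
g = 1.3                                 # (G)_{ii}

f_p = rng.standard_normal(N)            # right-hand side blocks of the nodal system
b_u = rng.standard_normal()
f_y = rng.standard_normal(N)

# closed-form solution (13): Schur complement on u_i, then back substitution
denom = g + d @ (c * e / a**2)
u = (b_u + d @ (c * f_y / a**2 - f_p / a)) / denom
y = (f_y - e * u) / a
p = (f_p - c * y) / a

# check against a direct solve of the assembled (2N+1)x(2N+1) system
S_i = np.block([
    [np.diag(c), np.zeros((N, 1)), np.diag(a)],
    [np.zeros((1, N)), np.array([[g]]), d[None, :]],
    [np.diag(a), e[:, None], np.zeros((N, N))],
])
x_ref = np.linalg.solve(S_i, np.concatenate([f_p, [b_u], f_y]))
assert np.allclose(np.concatenate([y, [u], p]), x_ref)
```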

The collective smoother can be easily adjusted to accommodate local or boundary controls, as discussed in [26] for deterministic OCPs. For all nodes i for which a control basis function is present, the smoothing procedure remains that of (13). For all other computational nodes, with no associated control basis function, the smoothing procedure becomes

$$\begin{aligned} \begin{aligned} {\widetilde{{\textbf{y}}}}_i&=(\text {diag}(\textbf{a}_i))^{-1}\textbf{f}_{yi},\\ {\widetilde{{\textbf{p}}}}_i&=(\text {diag}(\textbf{a}_i))^{-1}(\textbf{f}_{pi}-\text {diag}(\textbf{c}_i) {\widetilde{{\textbf{y}}}}_i), \end{aligned} \end{aligned}$$

which is consistently obtained from (13) setting \(u_i=0\).

To conclude this section, we remark that the computational complexity of the smoothing procedure is of order \(O(N_h N)\), thus linear with respect to the size of the saddle-point system. Provided that the V-cycle algorithm requires a constant number of iterations to converge as the number of levels increases, and that N is not too large (so that the cost of the coarse solver is not dominant), the complexity of the multigrid algorithm can also be considered linear. In the numerical experiments (Sects. 3.2, 4.1, 5.2), we indeed show that the number of iterations remains constant for several test cases.

Remark 1

(Extension to a hierarchy of samples) The multigrid algorithm presented is based on a hierarchy of spatial discretizations. However, the sample set used to discretize the probability space remains fixed across the levels. If one relies on the stochastic collocation method to discretize the probability space, it is possible to envisage a multigrid algorithm that also involves a coarsening of the sample size, since for each sample set one could consider the associated stable interpolator, which can then be evaluated on a coarser or finer set of samples. Nevertheless, the interplay between the smoothing and coarsening procedures, which is key for the efficient behaviour of a multigrid scheme, is not clear at the moment. Future endeavours will investigate this interesting direction. For the rest of the manuscript, we restrict ourselves to a hierarchy of spatial discretizations since, on the one hand, the multigrid algorithm can already be embedded in other outer optimization algorithms that involve a hierarchy of samples [7, 10, 19, 27]. On the other hand, the reduced system can be solved with optimal O(N) complexity, so that a coarsening in the number of samples may be superfluous.

3.1 Convergence Analysis

In this subsection, we present a convergence analysis of the collective multigrid algorithm in a simplified setting. Let \(\mathcal {D}=(0,1)\), and consider the random PDE

$$\begin{aligned} \eta (\omega )\int _0^1 \partial _x y(x,\omega ) \partial _x v(x)\;dx = \int _0^1 (f(x)+u(x))v(x)\; dx,\forall v\in V,\; \mathbb {P}\text {-a.e.}\; \omega \in \Omega , \end{aligned}$$
(14)

where \(\eta :\Omega \rightarrow \mathbb {R}^+\) is a positive valued random variable such that \(\mathbb {E}\left[ \eta ^{-2}\right] <\infty \). Our goal is to minimize the objective functional of (5) constrained by (14). A discretization using finite differences and with N Monte Carlo samples leads to the optimality system

$$\begin{aligned} \begin{pmatrix} \frac{{\widetilde{I}}}{N} &{} &{} &{} &{} \frac{\eta _1(\omega )}{N} A\\ &{} \ddots &{} &{} &{} &{}\ddots \\ &{} &{} \frac{{\widetilde{I}}}{N} &{} &{} &{} &{} \frac{\eta _N(\omega )}{N} A\\ &{} &{} &{} \nu {\widetilde{I}} &{} -\frac{{\widetilde{I}}}{N} &{}\dots &{} -\frac{{\widetilde{I}}}{N}\\ \frac{\eta _1(\omega )}{N} A &{} &{} &{} -\frac{{\widetilde{I}}}{N}\\ &{} \ddots &{} &{}\vdots \\ &{} &{} \frac{\eta _N(\omega )}{N} A\ {} &{} -\frac{{\widetilde{I}}}{N} \end{pmatrix} \begin{pmatrix} \textbf{y}_1\\ \vdots \\ \textbf{y}_N\\ \textbf{u}\\ \textbf{p}_1\\ \vdots \\ \textbf{p}_N \end{pmatrix}= \begin{pmatrix} \frac{\textbf{y}_d}{N}\\ \vdots \\ \frac{\textbf{y}_d}{N}\\ \textbf{0} \\ \frac{\textbf{f}}{N}\\ \vdots \\ \frac{\textbf{f}}{N} \end{pmatrix}, \end{aligned}$$
(15)

where A is the tridiagonal matrix associated with the 1D Laplacian, with \(2/h^2\) on the main diagonal and \(-1/h^2\) on the two adjacent diagonals, h being the mesh size, \({\widetilde{I}}\in \mathbb {R}^{N_h\times N_h}\) is the identity matrix, and, compared to (10), the first and last blocks of N equations are multiplied by \(\frac{1}{N}\) to get a symmetric system. Despite the simplifying assumptions on the spatial discretization and on the random coefficient, the setting considered is illustrative, as system (15) preserves the main features of (10), namely the specific block structure and the presence of random stiffness matrices.

To perform our analysis, we first eliminate the variable \(\textbf{u}\), and obtain the reduced matrix

$$\begin{aligned} \begin{pmatrix} \frac{{\widetilde{I}}}{N} &{} &{} &{} &{} \frac{\eta _1(\omega )}{N} A\\ &{} \ddots &{} &{} &{} &{}\ddots \\ &{} &{} \frac{{\widetilde{I}}}{N} &{} &{} &{} &{} \frac{\eta _N(\omega )}{N} A\\ \frac{\eta _1(\omega )}{N} A &{} &{} &{} -\frac{{\widetilde{I}}}{\nu N^2} &{}\cdots &{}\cdots &{} -\frac{{\widetilde{I}}}{\nu N^2}\\ &{} \ddots &{} &{}\vdots &{} \vdots &{} \vdots &{} \vdots \\ &{} &{} \frac{\eta _N(\omega )}{N} A\ {} &{} -\frac{{\widetilde{I}}}{\nu N^2} &{}\cdots &{}\cdots &{} -\frac{{\widetilde{I}}}{\nu N^2} \end{pmatrix} \begin{pmatrix} \textbf{y}_1\\ \vdots \\ \textbf{y}_N\\ \textbf{p}_1\\ \vdots \\ \textbf{p}_N \end{pmatrix}= \begin{pmatrix} \frac{\textbf{y}_d}{N}\\ \vdots \\ \frac{\textbf{y}_d}{N} \\ \frac{\textbf{f}}{N}\\ \vdots \\ \frac{\textbf{f}}{N} \end{pmatrix}. \end{aligned}$$
(16)

Next, let \(\textbf{z}=(\textbf{z}_1,\dots ,\textbf{z}_{N_h})^\top \in \mathbb {R}^{(2N N_h)\times 1}\), where \(\textbf{z}_j=((\textbf{y}_1)_j,\dots ,(\textbf{y}_N)_j,(\textbf{p}_1)_j,\dots , (\textbf{p}_N)_j)^\top \in \mathbb {R}^{2N\times 1}\). Notice that \(\textbf{z}_j\) corresponds to the application of \(R_j\) to \(\textbf{x}\) (see (11)), except for \(u_j\) which has been previously eliminated. By reordering the unknowns as in \(\textbf{z}\), (16) can be written as \(S\textbf{z}=\widetilde{\textbf{b}}\) for a suitable \(\widetilde{\textbf{b}}\) and

$$\begin{aligned}S=\begin{pmatrix} {\widetilde{B}} &{} B &{} \\ B &{} {\widetilde{B}} &{} B &{}\\ &{} B &{} {\widetilde{B}} &{} B &{}\\ &{} &{} \ddots &{} \ddots &{} \ddots &{}\\ &{} &{} &{} B &{} {\widetilde{B}} &{} B &{}\\ &{} &{} &{} &{} B &{} {\widetilde{B}}\\ \end{pmatrix}= {\widetilde{I}}\otimes {\widetilde{B}} + H\otimes B,\end{aligned}$$
$$\begin{aligned}{\widetilde{B}}:=\begin{pmatrix} \frac{I}{N} &{} D\\ D &{} -\frac{\textbf{1}\textbf{1}^\top }{\nu N^2} \end{pmatrix},\quad B:=\begin{pmatrix} 0 &{} -\frac{D}{2}\\ -\frac{D}{2} &{} 0 \end{pmatrix},\quad H=\begin{pmatrix} 0 &{} 1\\ 1 &{} 0 &{} 1\\ &{} \ddots &{}\ddots &{}\ddots \\ &{} &{} 1 &{} 0 &{}1\\ &{} &{} &{} 1 &{} 0 \end{pmatrix}, \end{aligned}$$

where \(I\in \mathbb {R}^{N\times N}\) is the identity matrix, D is a diagonal matrix with \(d_j:=\frac{2\eta _j(\omega )}{h^2N}\) on the diagonal, and \(\textbf{1}=(1,\dots ,1)^\top \in \mathbb {R}^{N\times 1}\). In particular, a direct calculation verifies that the iteration matrix of (12) with \(\theta =1\) and with this new order of unknowns is equal to

$$\begin{aligned} \mathcal {G}= \mathcal {I} - ({\widetilde{I}}\otimes {\widetilde{B}}^{-1})({\widetilde{I}}\otimes {\widetilde{B}} + H\otimes B)=- H\otimes C, \end{aligned}$$

with \(C:={\widetilde{B}}^{-1}B\), and \(\mathcal {I}\in \mathbb {R}^{(2N_h N)\times (2N_h N)}\) being the identity matrix. We will next characterize precisely the spectrum of \(\mathcal {G}\), which in turn gives an exact description of the convergence of the one-level collective smoother. To do so, we first study the spectrum of C, denoted by \(\sigma (C)\).

Lemma 2

(Spectrum of C) The matrix C has the spectrum

$$\begin{aligned} \sigma (C)=-\frac{1}{2}\left\{ 1,1-r\pm i\sqrt{(1-r)r}\right\} , \end{aligned}$$

with \(r=\frac{{\widehat{{\mathbb {E}}}}\left[ \widetilde{\textbf{d}}^{-2}\right] }{\nu +{\widehat{{\mathbb {E}}}}\left[ \widetilde{\textbf{d}}^{-2}\right] }\), \(\widetilde{\textbf{d}}\in \mathbb {R}^{N\times 1}\), \((\widetilde{\textbf{d}})_j=N d_j\), and \({\widehat{{\mathbb {E}}}}\left[ \widetilde{\textbf{d}}^{-2}\right] :=\frac{1}{N}\sum _{j=1}^N (\widetilde{\textbf{d}})_j^{-2}\). The eigenvalue \(\lambda =-\frac{1}{2}\) has algebraic multiplicity \(2N-2\) and geometric multiplicity \(N-1\).

Proof

Since \(\frac{I}{N}\) and D are non singular, to compute C we use the exact formula for the inverse of \({\widetilde{B}}\). Setting \(\Gamma :=\frac{1}{\nu N^2 +\textbf{1}^\top \frac{D^{-2}}{N}\textbf{1}}=\frac{1}{\nu N^2 +N^2 {\widehat{{\mathbb {E}}}}\left[ \widetilde{\textbf{d}}^{-2}\right] }\), with \((\widetilde{\textbf{d}})_j=\frac{2\eta _j(\omega )}{h^2}\), we obtain

$$\begin{aligned} \begin{aligned} C&={\widetilde{B}}^{-1}B =\frac{1}{2}\begin{pmatrix} -I +\frac{\Gamma }{N} D^{-1} \textbf{1}\textbf{1}^\top D^{-1} &{} -\Gamma D^{-1}\textbf{1}\textbf{1}^\top \\ \frac{D^{-1}}{N}-\frac{\Gamma }{N^2}D^{-2}\textbf{1}\textbf{1}^\top D^{-1} &{} -I +\frac{\Gamma }{N}D^{-2}\textbf{1}\textbf{1}^\top \end{pmatrix}\\&=\frac{1}{2}\begin{pmatrix} -I &{} 0\\ \frac{D^{-1}}{N} &{} -I \end{pmatrix}+\frac{\Gamma N}{2}\begin{pmatrix} \widetilde{\textbf{d}}^{-1}\widetilde{\textbf{d}}^{-\top } &{} -\widetilde{\textbf{d}}^{-1}\textbf{1}^\top \\ -\widetilde{\textbf{d}}^{-2}\widetilde{\textbf{d}}^{-\top } &{} \widetilde{\textbf{d}}^{-2}\textbf{1}^\top \end{pmatrix}. \end{aligned} \end{aligned}$$

For simplicity, we focus on \({\widehat{C}}:=-2C\), which can be written as

$$\begin{aligned}{\widehat{C}}=\underbrace{\begin{pmatrix} I &{} 0\\ -\frac{D^{-1}}{N} &{} I \end{pmatrix}}_{L}+ \textbf{a}\textbf{c}^\top ,\quad \text {with}\quad \textbf{a}:=\Gamma N \begin{pmatrix} -\widetilde{\textbf{d}}^{-1}\\ \widetilde{\textbf{d}}^{-2} \end{pmatrix},\; \textbf{c}:=\begin{pmatrix} \widetilde{\textbf{d}}^{-1}\\ -\textbf{1}\end{pmatrix},\end{aligned}$$

that is, \({\widehat{C}}\) is the sum of a lower triangular matrix plus a rank-one perturbation. Notice that L has eigenvalue \(\lambda =1\) with algebraic multiplicity 2N and geometric multiplicity N. The eigenspace associated to \(\lambda =1\) is \(E_{\lambda =1}(L):=\text {span}\left\{ \textbf{e}_j,\; j=N+1,\dots ,2N\right\} \), \(\textbf{e}_j\) being the j-th canonical vector. Next, if \(N> 2\), \({\widehat{C}}\) has still eigenvalue \(\lambda =1\) since for any vector \(\textbf{v}=(0,\textbf{v}_2)\), \(\textbf{v}_2\in \mathbb {R}^{N\times 1}\), such that \(\textbf{1}^\top \textbf{v}_2=0\), we have

$$\begin{aligned} (L+\textbf{a}\textbf{c}^\top ) \textbf{v}= L\textbf{v}=\textbf{v}. \end{aligned}$$

Therefore, \(\lambda =1\) is an eigenvalue of \({\widehat{C}}\) with geometric multiplicity \(N-1\).

To find the remaining eigenvalues, we take a \(\lambda \ne 1\) and consider

$$\begin{aligned} \begin{aligned} \det (L-\lambda I_{2N\times 2N} +\textbf{a}\textbf{c}^\top )&=\det (L-\lambda I_{2N\times 2N})\det (I_{2N\times 2N}+(L-\lambda I_{2N\times 2N})^{-1}\textbf{a}\textbf{c}^\top )\\&=(1-\lambda )^{2N}\left( 1+\textbf{c}^\top (L-\lambda I_{2N\times 2N})^{-1} \textbf{a}\right) . \end{aligned} \end{aligned}$$

A direct calculation leads to

$$\begin{aligned} \begin{aligned} \textbf{c}^\top (L-\lambda I_{2N\times 2N})^{-1} \textbf{a}=(\textbf{c}_1,\textbf{c}_2)^\top \begin{pmatrix} \frac{I}{1-\lambda } &{} 0 \\ \frac{D^{-1}}{N(1-\lambda )^2} &{}\frac{I}{1-\lambda } \end{pmatrix}\begin{pmatrix} \textbf{a}_1\\ \textbf{a}_2 \end{pmatrix} \end{aligned}, \end{aligned}$$
(17)

so that

$$\begin{aligned} \det (L-\lambda I_{2N\times 2N} +\textbf{a}\textbf{c}^\top )=(1-\lambda )^{2N-2}\left( \lambda ^2-(2+\textbf{a}^\top \textbf{c})\lambda +1 +\textbf{a}^\top \textbf{c} +\textbf{c}_2^\top \frac{D^{-1}}{N} \textbf{a}_1\right) , \end{aligned}$$

from which we conclude that \(\lambda =1\) has algebraic multiplicity \(2(N-1)\). The remaining eigenvalues must be solutions of the second order equation. Using \(\textbf{a}^\top \textbf{c}=-2\Gamma N \sum _{j=1}^N \widetilde{d}_j^{-2}\), \(\textbf{c}_2^\top \frac{D^{-1}}{N} \textbf{a}_1=\Gamma N\sum _{j=1}^N \widetilde{d}_j^{-2}\), recalling the definition of \(\Gamma \) and r, and multiplying by \(-\frac{1}{2}\), one obtains the solutions \(\lambda _{2N-1,2N}=-\frac{1}{2}\left\{ 1-r\pm i \sqrt{(1-r)r}\right\} \), and the claim follows.
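The statement of Lemma 2 is easy to check numerically; the following sketch builds \(\widetilde{B}\), B and C for a handful of illustrative samples \(\eta_j\) (mesh size and regularization parameter chosen arbitrarily) and compares the computed eigenvalues with the predicted cluster at \(-1/2\) and the pair \(\lambda_{2N-1,2N}\).

```python
import numpy as np

rng = np.random.default_rng(1)
N, nu, h = 6, 1e-2, 0.5                  # illustrative sizes
eta = rng.uniform(0.5, 2.0, N)           # samples eta_j
d_t = 2.0 * eta / h**2                   # tilde d_j
D = np.diag(d_t / N)                     # d_j = tilde d_j / N
one = np.ones((N, 1))

Bt = np.block([[np.eye(N) / N, D], [D, -one @ one.T / (nu * N**2)]])
B = np.block([[np.zeros((N, N)), -D / 2], [-D / 2, np.zeros((N, N))]])
C = np.linalg.solve(Bt, B)

Ehat = np.mean(d_t**-2.0)                # hat{E}[tilde d^{-2}]
r = Ehat / (nu + Ehat)
lam_pm = -0.5 * np.array([1 - r + 1j * np.sqrt((1 - r) * r),
                          1 - r - 1j * np.sqrt((1 - r) * r)])

print(np.sort_complex(np.linalg.eigvals(C)))   # 2N-2 eigenvalues near -1/2 ...
print(lam_pm)                                  # ... plus the predicted pair
```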

Remark 2

(Dependence on the regularization parameter) The regularization parameter \(\nu \) enters into our convergence analysis only in the definition of r. In particular as \(\nu \rightarrow 0\), \(r\rightarrow 1\) and \(|\lambda _{2N-1,2N}|\rightarrow 0\), and the convergence of the collective multigrid does not deteriorate (see Lemma 2). The robustness of the algorithm with respect to the (often troublesome) \(\nu \rightarrow 0\) limit will be observed in the numerical experiments.

From Lemma 2, we deduce that C admits the Jordan decomposition \(CV=VJ\), with

$$\begin{aligned} \begin{aligned} J&=\begin{pmatrix} -0.5 &{} 1\\ &{} -0.5 &{} \\ &{} &{} -0.5 &{}1 \\ &{} &{} &{} -0.5\\ &{} &{} &{} &{}\ddots &{}\ddots \\ &{} &{} &{} &{} &{}\lambda _{2N-1}\\ &{} &{} &{} &{} &{} &{}&{}\lambda _{2N} \end{pmatrix},\\ V&=[\textbf{v}_1,{\widehat{{\textbf{v}}}}_1,\textbf{v}_{2},{\widehat{{\textbf{v}}}}_2,\dots ,\textbf{v}_{2N-1},\textbf{v}_{2N}], \end{aligned} \end{aligned}$$
(18)

where \(\textbf{v}_j\), \(j=1,\dots ,N-1\), are the eigenvectors of C, \({\widehat{{\textbf{v}}}}_j\), \(j=1,\dots ,N-1\), are the generalized eigenvectors satisfying \((C-\lambda _jI){\widehat{{\textbf{v}}}}_j=\textbf{v}_j\), and \(\textbf{v}_{2N-1}\) and \(\textbf{v}_{2N}\) are the eigenvectors associated to the two remaining eigenvalues \(\lambda _{2N-1,2N}\).

Exploiting the Kronecker structure of \(\mathcal {G}\), we obtain immediately the following two corollaries.

Corollary 3

(Similarity transformation of \(\mathcal {G}\)) For \(i=1,\dots ,2N\) and \(j=1,\dots ,N_h\), let \(\delta _{i,j}:=-\mu _j\lambda _i\), where \(\lambda _i\) is an eigenvalue of C, and \(\mu _j=2\cos \left( \frac{j\pi }{N_h+1}\right) .\) Then, \(\mathcal {G}\) satisfies \(\mathcal {G} Y=Y\widetilde{J}\), where \(\widetilde{J}\) is an upper triangular matrix with the \(\delta _{i,j}\) on the diagonal, and the k-th column of Y, with \(k=2N(j-1)+i\), is \(Y_k={\varvec{\varphi }}_j\otimes V_i\), \(V_i\) being the i-th column of V defined in (18), and \(({\varvec{\varphi }}_j)_i:=\sin \left( \frac{ij\pi }{N_h+1}\right) \).

Proof

We first notice that H is a tridiagonal Toeplitz matrix, and it is well-known (see [28]) that it has eigenvalues \(\mu _j=2\cos \left( \frac{j\pi }{N_h+1}\right) \) and eigenvectors of the form \(({\varvec{\varphi }}_j)_i=\sin \left( \frac{ij\pi }{N_h+1}\right) \). Due to the properties of the Kronecker product, it is trivial to verify that

$$\begin{aligned} \mathcal {G}({\varvec{\varphi }}_j\otimes \textbf{v}_i)=-(H{\varvec{\varphi }}_j)\otimes (C\textbf{v}_i)=-\mu _j\lambda _i ({\varvec{\varphi }}_j\otimes \textbf{v}_i). \end{aligned}$$

If instead we consider a generalized eigenvector \({\widehat{{\textbf{v}}}}_i\), using the Jordan decomposition, we have

$$\begin{aligned} \mathcal {G}({\varvec{\varphi }}_j\otimes {\widehat{{\textbf{v}}}}_i)=-(H{\varvec{\varphi }}_j)\otimes (C{\widehat{{\textbf{v}}}}_i)=-\mu _j \lambda _i ({\varvec{\varphi }}_j \otimes {\widehat{{\textbf{v}}}}_i) -\mu _j ({\varvec{\varphi }}_j \otimes \textbf{v}_i), \end{aligned}$$

and the claim follows.

Corollary 4

(Spectral radius of \(\mathcal {G}\)) The spectral radius of \(\mathcal {G}\) is strictly smaller than 1, and satisfies \(\rho (\mathcal {G})\le 1-\mathcal {O}\left( \frac{1}{N_h^2}\right) \). Therefore, the collective smoothing iteration converges.

Proof

Corollary 3 shows that \(\mathcal {G}\) is similar to the upper triangular matrix \(\widetilde{J}\). Thus, its eigenvalues are equal to \(\delta _{i,j}=-\mu _j\lambda _i\). Observing that \(|\mu _j |\le 2\cos \left( \frac{\pi }{N_h+1}\right) \) and \(|\lambda _i|\le 0.5\) for all j and i, the claim follows.

Remark 3

(Damping) The analysis has been carried out for the relaxation parameter \(\theta =1\). It is trivial to consider \(\theta \ne 1\), since the iteration matrix is then \(\mathcal {G}_{\theta }:=(1-\theta ) \mathcal {I} +\theta \mathcal {G}\).

We next study the spectrum of the two-level collective multigrid algorithm, and assume that \(N_{h}=2^{\ell }-1\) and \(N^C_{h}=2^{{\ell -1}}-1\) for an \(\ell \in \mathbb {N}\). As maps between the fine and coarse meshes, we choose the full weighting restriction matrix,

$$\begin{aligned} {\widetilde{R}}:=\frac{1}{2}\left( \begin{array}{lllllll} \frac{1}{2} &{} 1 &{}\frac{1}{2} \\ &{} &{} \frac{1}{2} &{} 1 &{} \frac{1}{2}\\ &{} &{} &{} \cdots \\ &{} &{} &{} &{} \frac{1}{2} &{} 1 &{}\frac{1}{2}\\ \end{array}\right) \in \mathbb {R}^{N_h^C\times N_h}, \end{aligned}$$

and the linear interpolation operator \(\widetilde{P}:=2 \widetilde{R}^\top \). In particular, the action of \(\widetilde{R}\) and \(\widetilde{P}\) on the frequencies \({\varvec{\varphi }}_j\) can be characterized rigorously (see, e.g., [29, Lemma 4.17]). Let \(\varvec{\phi }_j\in \mathbb {R}^{N_h^C\times 1}\) with \((\varvec{\phi }_j)_i=\sin \left( \frac{2ij\pi }{N_h+1}\right) \), \(j=1,\dots ,N_h\) and \(i=1,\dots ,N_h^C\). Further define \(c_j:=\cos \left( \frac{j\pi }{2(N_h+1)}\right) \) and \(s_j:=\sin \left( \frac{j\pi }{2(N_h+1)}\right) \). Then, for any \(e_j,\; e_{\widetilde{j}}\in \mathbb {R}\), with \(\widetilde{j}:=N_h+1-j\) and \(j=1,\dots ,\frac{N_h+1}{2}-1\),

$$\begin{aligned} \begin{aligned} \widetilde{R} \begin{pmatrix} {\varvec{\varphi }}_j&{\varvec{\varphi }}_{\widetilde{j}} \end{pmatrix}\begin{pmatrix} e_j\\ e_{\widetilde{j}} \end{pmatrix}&=\widetilde{R}\left( e_j {\varvec{\varphi }}_j+ e_{\widetilde{j}} {\varvec{\varphi }}_{\widetilde{j}}\right) =(e_jc_j^2-e_{\widetilde{j}}s_j^2)\varvec{\phi }_j=\varvec{\phi }_j\begin{pmatrix} c_j^2&-s_j^2 \end{pmatrix}\begin{pmatrix} e_j\\ e_{\widetilde{j}} \end{pmatrix},\\ \widetilde{P}\varvec{\phi }_j&=(c_j^2{\varvec{\varphi }}_j -s_j^2{\varvec{\varphi }}_{\widetilde{j}})=\begin{pmatrix} {\varvec{\varphi }}_j&{\varvec{\varphi }}_{\widetilde{j}} \end{pmatrix}\begin{pmatrix} c_j^2\\ -s_j^2 \end{pmatrix}. \end{aligned} \end{aligned}$$
(19)

Furthermore, \(\widetilde{R}{\varvec{\varphi }}_{\overline{j}}=0\) for \(\overline{j}:=\frac{N_h+1}{2}\). The iteration matrix of the two-level algorithm with one step of pre-smoothing and no post-smoothing is

$$\begin{aligned} T:=(I-PS_c^{-1}RS)\mathcal {G}, \end{aligned}$$

where \(R=\widetilde{R}\otimes I\), \(P=\widetilde{P}\otimes I\), and \(S_C=RSP\).
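Relations (19) are also easy to verify numerically; the following small sketch (with an arbitrary frequency index j and arbitrary coefficients \(e_j\), \(e_{\widetilde{j}}\)) checks the action of the full weighting restriction and of the linear interpolation on the modes \({\varvec{\varphi }}_j\).

```python
import numpy as np

ell = 5
Nh, Nc = 2**ell - 1, 2**(ell - 1) - 1
Rt = np.zeros((Nc, Nh))
for k in range(Nc):                     # full-weighting restriction
    Rt[k, 2 * k:2 * k + 3] = [0.25, 0.5, 0.25]
Pt = 2.0 * Rt.T                         # linear interpolation

i_f = np.arange(1, Nh + 1)              # fine-grid indices
i_c = np.arange(1, Nc + 1)              # coarse-grid indices
j = 3                                   # a low frequency, j < (Nh+1)/2
jt = Nh + 1 - j
phi = lambda k: np.sin(i_f * k * np.pi / (Nh + 1))      # fine frequencies
phic = np.sin(2 * i_c * j * np.pi / (Nh + 1))           # coarse frequency
cj2 = np.cos(j * np.pi / (2 * (Nh + 1)))**2
sj2 = np.sin(j * np.pi / (2 * (Nh + 1)))**2

e, et = 0.7, -1.3                       # arbitrary coefficients
assert np.allclose(Rt @ (e * phi(j) + et * phi(jt)), (e * cj2 - et * sj2) * phic)
assert np.allclose(Pt @ phic, cj2 * phi(j) - sj2 * phi(jt))
```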

Lemma 5

The two-level operator T is similar to a block diagonal matrix whose diagonal blocks are:

  1. 1

    The matrices \(T_{ji}:=\mathcal {G}_{ji}-R_{j}^\top \Pi _{ji}^{-1}R_{j}S_{ji}\mathcal {G}_{ji}\in \mathbb {R}^{4\times 4}\) for \(j=1,\dots ,\frac{N_h+1}{2}-1\) and \(i=1,\dots ,N-1\), with

    $$\begin{aligned} \begin{aligned} \mathcal {G}_{ji}&:= \begin{pmatrix} \delta _{ji} &{} &{} -\mu _j\\ &{} \delta _{\widetilde{j}i} &{} &{}-\mu _{\widetilde{j}}\\ &{} &{} \delta _{ji} \\ &{} &{} &{} \delta _{\widetilde{j}i} \end{pmatrix},\quad S_{ji}:=\begin{pmatrix} (1-\delta _{ji}) &{} &{} -\mu _j\\ &{} (1-\delta _{\widetilde{j}i}) &{} &{}-\mu _{\widetilde{j}}\\ &{} &{} (1-\delta _{ji}) \\ &{} &{} &{} (1-\delta _{\widetilde{j}i}) \end{pmatrix},\\ R_j&:=\begin{pmatrix} c_j^2&{} -s_j^2 \\ &{} &{} c_j^2 &{} -s_j^2 \end{pmatrix},\quad P_j=R_j^\top ,\quad \Pi _{ji}:=R_j S_{ji}R_j^\top . \end{aligned} \end{aligned}$$
  2. 2

    The matrices \(\mathcal {G}_{\overline{j}i}=\begin{pmatrix} \delta _{\overline{j}i} &{} -\mu _{\overline{j}}\\ &{} \delta _{\overline{j}i} \end{pmatrix}\in \mathbb {R}^{2\times 2}\) for \(\overline{j}=\frac{N_h+1}{2}\), and \(i=1,\dots ,N-1\).

  3. 3

    The matrices \(\widehat{T}_{ji}:={\widehat{{\mathcal {G}}}}_{ji}-\widehat{R}_{j}^\top {\widehat{\Pi }}_{ji}^{-1}\widehat{R}_{j}\widehat{S}_{ji}{\widehat{{\mathcal {G}}}}_{ji}\in \mathbb {R}^{2\times 2}\) for \(j=1,\dots ,\frac{N_h+1}{2}-1\) and \(i=2N-1, 2N\), with

    $$\begin{aligned} \begin{aligned} {\widehat{{\mathcal {G}}}}_{ji}&:= \begin{pmatrix} \delta _{ji} &{} \\ &{} \delta _{\widetilde{j}i} \end{pmatrix},\quad \widehat{S}_{ji}:=\begin{pmatrix} (1-\delta _{ji}) &{}\\ &{} (1-\delta _{\widetilde{j}i}) \end{pmatrix},\\ \widehat{R}_j&:=\begin{pmatrix} c_j^2&{} -s_j^2 \\ \end{pmatrix},\quad \widehat{P}_j=\widehat{R}_j^\top ,\quad {\widehat{\Pi }}_{ji}:=c_j^4(1-\delta _{ji})+s_j^4(1-\delta _{\widetilde{j}i}). \end{aligned} \end{aligned}$$
  4. 4

    The matrices \({\widehat{{\mathcal {G}}}}_{\overline{j}i}=\begin{pmatrix} \delta _{\overline{j}i} &{} \\ &{} \delta _{\overline{j}i} \end{pmatrix}\in \mathbb {R}^{2\times 2}\) for \(\overline{j}=\frac{N_h+1}{2}\), and \(i=2N-1, 2N\).

Proof

The proof follows closely the arguments presented in [30, 31] for the study of two-level iterative methods. It consists in studying the action of T onto suitably defined subspaces, showing that these subspaces are invariant, and finally deriving a matrix representation of T into a new basis. We start with the four dimensional subspaces \(\mathcal {V}_{ji}:=\text {span}\left\{ {\varvec{\varphi }}_j\otimes \textbf{v}_i,\; {\varvec{\varphi }}_{\widetilde{j}} \otimes \textbf{v}_i,\; {\varvec{\varphi }}_{j} \otimes {\widehat{{\textbf{v}}}}_i,\; {\varvec{\varphi }}_{\widetilde{j}} \otimes {\widehat{{\textbf{v}}}}_i\right\} \), for \(j=1,\dots ,\frac{N_h+1}{2}-1\), \(i=1,\dots ,N-1\). For any quadruple of real numbers \(e_j, e_{\widetilde{j}},\widehat{e}_{j},\widehat{e}_{\widetilde{j}}\), using \(H{\varvec{\varphi }}_j=\mu _j{\varvec{\varphi }}_j\) and the Jordan decomposition of C, we obtain

$$\begin{aligned} \begin{aligned}&\mathcal {G}\begin{pmatrix} {\varvec{\varphi }}_j\otimes \textbf{v}_i&{\varvec{\varphi }}_{\widetilde{j}} \otimes \textbf{v}_i&{\varvec{\varphi }}_j\otimes {\widehat{{\textbf{v}}}}_i&{\varvec{\varphi }}_{\widetilde{j}}\otimes {\widehat{{\textbf{v}}}}_i \end{pmatrix}\begin{pmatrix} e_j\\ e_{\widetilde{j}}\\ \widehat{e}_{j} \\ \widehat{e}_{\widetilde{j}} \end{pmatrix}\\&=\begin{pmatrix} {\varvec{\varphi }}_j\otimes \textbf{v}_i&{\varvec{\varphi }}_{\widetilde{j}} \otimes \textbf{v}_i&{\varvec{\varphi }}_j\otimes {\widehat{{\textbf{v}}}}_i&{\varvec{\varphi }}_{\widetilde{j}}\otimes {\widehat{{\textbf{v}}}}_i \end{pmatrix} \begin{pmatrix} \delta _{ji} &{} &{} -\mu _j\\ &{} \delta _{\widetilde{j}i} &{} &{}-\mu _{\widetilde{j}}\\ &{} &{} \delta _{ji} \\ &{} &{} &{} \delta _{\widetilde{j}i} \end{pmatrix} \begin{pmatrix} e_j\\ e_{\widetilde{j}}\\ \widehat{e}_{j} \\ \widehat{e}_{\widetilde{j}} \end{pmatrix}. \end{aligned} \end{aligned}$$

Next, since \({\widehat{{\textbf{v}}}}_i\) satisfies \((C-\lambda _i I){\widehat{{\textbf{v}}}}_i=\textbf{v}_i\), it holds \(B{\widehat{{\textbf{v}}}}_i={\widetilde{B}}(\textbf{v}_i+\lambda _i{\widehat{{\textbf{v}}}}_i)\), hence,

$$\begin{aligned} \begin{aligned}&S\mathcal {G}\begin{pmatrix} {\varvec{\varphi }}_j\otimes \textbf{v}_i&{\varvec{\varphi }}_{\widetilde{j}} \otimes \textbf{v}_i&{\varvec{\varphi }}_j\otimes {\widehat{{\textbf{v}}}}_i&{\varvec{\varphi }}_{\widetilde{j}}\otimes {\widehat{{\textbf{v}}}}_i \end{pmatrix}\begin{pmatrix} e_j\\ e_{\widetilde{j}}\\ \widehat{e}_{j} \\ \widehat{e}_{\widetilde{j}} \end{pmatrix}\\&=\begin{pmatrix} {\varvec{\varphi }}_j\otimes \textbf{v}_i&{\varvec{\varphi }}_{\widetilde{j}} \otimes \textbf{v}_i&{\varvec{\varphi }}_j\otimes {\widehat{{\textbf{v}}}}_i&{\varvec{\varphi }}_{\widetilde{j}}\otimes {\widehat{{\textbf{v}}}}_i \end{pmatrix} \begin{pmatrix} (1-\delta _{ji}) &{} &{} -\mu _j\\ &{} (1-\delta _{\widetilde{j}i}) &{} &{}-\mu _{\widetilde{j}}\\ &{} &{} (1-\delta _{ji}) \\ &{} &{} &{} (1-\delta _{\widetilde{j}i}) \end{pmatrix}G_{ji} \begin{pmatrix} e_j\\ e_{\widetilde{j}}\\ \widehat{e}_{j} \\ \widehat{e}_{\widetilde{j}} \end{pmatrix}, \end{aligned} \end{aligned}$$

and recalling (19),

$$\begin{aligned} \begin{aligned}&RS\mathcal {G}\begin{pmatrix} {\varvec{\varphi }}_j\otimes \textbf{v}_i&{\varvec{\varphi }}_{\widetilde{j}} \otimes \textbf{v}_i&{\varvec{\varphi }}_j\otimes {\widehat{{\textbf{v}}}}_i&{\varvec{\varphi }}_{\widetilde{j}}\otimes {\widehat{{\textbf{v}}}}_i \end{pmatrix}\begin{pmatrix} e_j\\ e_{\widetilde{j}}\\ \widehat{e}_{j} \\ \widehat{e}_{\widetilde{j}} \end{pmatrix}\\&=\begin{pmatrix} \varvec{\phi }_j\otimes \textbf{v}_i&\varvec{\phi }_j\otimes {\widehat{{\textbf{v}}}}_i \end{pmatrix} \begin{pmatrix} c_j^2&{} -s_j^2 \\ &{} &{} c_j^2 &{} -s_j^2 \end{pmatrix}S_{ji}G_{ji} \begin{pmatrix} e_j\\ e_{\widetilde{j}}\\ \widehat{e}_{j} \\ \widehat{e}_{\widetilde{j}} \end{pmatrix}. \end{aligned} \end{aligned}$$

We now consider the coarse correction.

$$\begin{aligned} \begin{aligned}&S_c \begin{pmatrix} \varvec{\phi }_j\otimes \textbf{v}_i&\varvec{\phi }_j\otimes {\widehat{{\textbf{v}}}}_i\end{pmatrix}\begin{pmatrix} e^c_j\\ \widehat{e}^c_j \end{pmatrix} =RSP \begin{pmatrix} \varvec{\phi }_j\otimes \textbf{v}_i&\varvec{\phi }_j\otimes {\widehat{{\textbf{v}}}}_i\end{pmatrix}\begin{pmatrix} e^c_j\\ \widehat{e}^c_j \end{pmatrix}\\&=RS \begin{pmatrix} {\varvec{\varphi }}_j\otimes \textbf{v}_i&{\varvec{\varphi }}_{\widetilde{j}} \otimes \textbf{v}_i&{\varvec{\varphi }}_j\otimes {\widehat{{\textbf{v}}}}_i&{\varvec{\varphi }}_{\widetilde{j}}\otimes {\widehat{{\textbf{v}}}}_i \end{pmatrix}R_{j}^\top \begin{pmatrix} e^c_j\\ \widehat{e}^c_j \end{pmatrix}\\&=\begin{pmatrix} \varvec{\phi }_j\otimes \textbf{v}_i&\varvec{\phi }_j\otimes {\widehat{{\textbf{v}}}}_i\end{pmatrix}R_{j}S_{ji} R_{j}^\top \begin{pmatrix} e^c_j\\ \widehat{e}^c_j, \end{pmatrix}=\begin{pmatrix} \varvec{\phi }_j\otimes \textbf{v}_i&\varvec{\phi }_j\otimes {\widehat{{\textbf{v}}}}_i\end{pmatrix} \Pi _{ji} \begin{pmatrix} e^c_j\\ \widehat{e}^c_j, \end{pmatrix} \end{aligned} \end{aligned}$$

which implies

$$\begin{aligned} S_c^{-1}\begin{pmatrix} \varvec{\phi }_j\otimes \textbf{v}_i&\varvec{\phi }_j\otimes {\widehat{{\textbf{v}}}}_i\end{pmatrix}=\begin{pmatrix} \varvec{\phi }_j\otimes \textbf{v}_i&\varvec{\phi }_j\otimes {\widehat{{\textbf{v}}}}_i\end{pmatrix}\Pi _{ji}^{-1}. \end{aligned}$$

Putting all together, we get

$$\begin{aligned} \begin{aligned}&T\begin{pmatrix} {\varvec{\varphi }}_j\otimes \textbf{v}_i&{\varvec{\varphi }}_{\widetilde{j}} \otimes \textbf{v}_i&{\varvec{\varphi }}_j\otimes {\widehat{{\textbf{v}}}}_i&{\varvec{\varphi }}_{\widetilde{j}}\otimes {\widehat{{\textbf{v}}}}_i \end{pmatrix}\begin{pmatrix} e_j\\ e_{\widetilde{j}}\\ \widehat{e}_{j} \\ \widehat{e}_{\widetilde{j}} \end{pmatrix}\\&=\begin{pmatrix} {\varvec{\varphi }}_j\otimes \textbf{v}_i&{\varvec{\varphi }}_{\widetilde{j}} \otimes \textbf{v}_i&{\varvec{\varphi }}_j\otimes {\widehat{{\textbf{v}}}}_i&{\varvec{\varphi }}_{\widetilde{j}}\otimes {\widehat{{\textbf{v}}}}_i \end{pmatrix}\underbrace{\left( \mathcal {G}_{ji}-R_{j}^\top \Pi _{ji}^{-1}R_{j}S_{ji}\mathcal {G}_{ji}\right) }_{T_{ji}} \begin{pmatrix} e_j\\ e_{\widetilde{j}}\\ \widehat{e}_{j} \\ \widehat{e}_{\widetilde{j}} \end{pmatrix}. \end{aligned} \end{aligned}$$

This concludes the first part of the proof. We now consider the subspaces spanned by \({\varvec{\varphi }}_{\overline{j}} \otimes \textbf{v}_i\), \({\varvec{\varphi }}_{\overline{j}} \otimes {\widehat{{\textbf{v}}}}_i\) for \(i=1,\dots ,N-1\). Since \(\widetilde{R}{\varvec{\varphi }}_{\overline{j}}=0\), we immediately have

$$\begin{aligned} T \begin{pmatrix} {\varvec{\varphi }}_{\overline{j}}\otimes \textbf{v}_i&{\varvec{\varphi }}_{\overline{j}}\otimes {\widehat{{\textbf{v}}}}_i \end{pmatrix} \begin{pmatrix} e_{\overline{j}}\\ \widehat{e}_{\overline{j}} \end{pmatrix}=\begin{pmatrix} {\varvec{\varphi }}_{\overline{j}}\otimes \textbf{v}_i&{\varvec{\varphi }}_{\overline{j}}\otimes {\widehat{{\textbf{v}}}}_i \end{pmatrix} \mathcal {G}_{\overline{j}i} \begin{pmatrix} e_{\overline{j}}\\ \widehat{e}_{\overline{j}} \end{pmatrix},\quad \mathcal {G}_{\overline{j}i}:=\begin{pmatrix} \delta _{\overline{j}i} &{}-\mu _{\overline{j}}\\ &{} \delta _{\overline{j}i} \end{pmatrix},\end{aligned}$$

and this proves the second claim. As a third set of subspaces, we consider those spanned by \(({\varvec{\varphi }}_j\otimes \textbf{v}_{2N-1},{\varvec{\varphi }}_{\widetilde{j}}\otimes \textbf{v}_{2N-1})\) and \(({\varvec{\varphi }}_j\otimes \textbf{v}_{2N},{\varvec{\varphi }}_{\widetilde{j}}\otimes \textbf{v}_{2N})\), respectively. Following the same calculations as in the first part of the proof, we obtain for \(i=2N-1\) and \(i=2N\),

$$\begin{aligned} \begin{aligned}&T\begin{pmatrix} {\varvec{\varphi }}_j\otimes \textbf{v}_{i}&{\varvec{\varphi }}_{\widetilde{j}}\otimes \textbf{v}_{i} \end{pmatrix}\begin{pmatrix} e_{j} \\ e_{\widetilde{j}} \end{pmatrix}\\&=\begin{pmatrix} {\varvec{\varphi }}_j\otimes \textbf{v}_{i}&{\varvec{\varphi }}_{\widetilde{j}}\otimes \textbf{v}_{i} \end{pmatrix} \left( {\widehat{{\mathcal {G}}}}_{ji}-\widehat{R}^\top _j {\widehat{\Pi }}_{ji}^{-1} \widehat{R}_j\widehat{S}_{ji}{\widehat{{\mathcal {G}}}}_{ji}\right) \begin{pmatrix} e_{j} \\ e_{\widetilde{j}} \end{pmatrix}. \end{aligned} \end{aligned}$$

The proof of the fourth claim is identical to that of the second part and it is skipped for the sake of brevity. By considering a matrix V whose column blocks are the bases of the subspaces considered above, it is immediate to deduce that \(TV=V\widetilde{T}\), where \(\widetilde{T}\) is a block diagonal matrix with the blocks we computed.

Remark 4

(Generalization to arbitrary pre- and post-smoothing steps) Lemma 5 can be readily generalized to cover \(n_1\) pre-smoothing steps and \(n_2\) post-smoothing steps by taking suitable powers of the matrices \(\mathcal {G}_{ji}\), \(\mathcal {G}_{\overline{j}i}\), \({\widehat{{\mathcal {G}}}}_{ji}\) and \({\widehat{{\mathcal {G}}}}_{\overline{j}i}\). For instance, the matrix \(T_{ji}\) of part one becomes

$$\begin{aligned} T_{ji}:=\mathcal {G}_{ji}^{n_2}(I_{4\times 4}-R_j^\top \Pi _{ji}^{-1}R_jS_{ji})\mathcal {G}_{ji}^{n_1}. \end{aligned}$$

Theorem 6

(Spectrum and convergence of the two-level algorithm) The spectrum of the matrix \(T=\mathcal {G}^{n_2}(I-P S_c^{-1}RS)\mathcal {G}^{n_1}\) is

$$\begin{aligned} \begin{aligned} \sigma (T)&=\left\{ 0\right\} \cup \\&\left\{ \frac{c_j^4(1-\delta _{ji})\delta _{\widetilde{j}i}^{n_1+n_2}+s_j^4(1-\delta _{\widetilde{j}i})\delta _{j i}^{n_1+n_2}}{c_j^4(1-\delta _{ji})+s_j^4(1-\delta _{\widetilde{j}i})},\;j=1,\dots ,\frac{N_h+1}{2}-1,\;i=1,\dots ,2N\right\} . \end{aligned} \end{aligned}$$
(20)

Further, the spectral radius of T is strictly smaller than 1, hence the two-level collective multigrid algorithm converges.

Proof

Since T is similar to a block diagonal matrix, with blocks defined in Lemma 5, it is sufficient to compute the spectrum of each block. Further, the spectrum of T is equal to that of \((I-P S_c^{-1}RS)\mathcal {G}^{n_1+n_2}\). Hence, we start by considering the blocks \(T_{ji}=(I_{4\times 4}-R_j^\top \Pi _{ji}^{-1}R_jS_{ji})\mathcal {G}_{ji}^{n_1+n_2}\). Direct calculations show that

$$\begin{aligned} I_{4\times 4}-R_j^\top \Pi _{ji}^{-1}R_jS_{ji}=\begin{pmatrix} Z_{ji} &{} X \\ 0 &{} Z_{ji} \end{pmatrix},\qquad Z_{ji}:=\frac{1}{\gamma }\begin{pmatrix} s_j^4(1-\delta _{\widetilde{j}i}) &{} c_j^2s_j^2(1-\delta _{\widetilde{j}i})\\ c_j^2s_j^2(1-\delta _{ji}) &{} c_j^4(1-\delta _{ji}) \end{pmatrix}, \end{aligned}$$
(21)

where the expression of the \(2\times 2\) block X will not be needed in the following and \(\gamma :=c_j^4(1-\delta _{ji})+s_j^4(1-\delta _{\widetilde{j}i})\). Since the product of two block upper triangular matrices is still block upper triangular, it follows that

$$\begin{aligned} (I_{4\times 4}-R_j^\top \Pi _{ji}^{-1}R_jS_{ji})\mathcal {G}_{ji}^{n_1+n_2}=\begin{pmatrix} K &{} \widetilde{X} \\ &{} K \end{pmatrix},\end{aligned}$$

with

$$\begin{aligned} K:=\frac{1}{\gamma }\begin{pmatrix} s_j^4(1-\delta _{\widetilde{j}i})\delta _{ji}^{n_1+n_2} &{} c_j^2s_j^2(1-\delta _{\widetilde{j}i})\delta _{\widetilde{j}i}^{n_1+n_2} \\ c_j^2s_j^2(1-\delta _{j i})\delta _{j i}^{n_1+n_2} &{} c_j^4(1-\delta _{ji })\delta _{\widetilde{j}i}^{n_1+n_2} \end{pmatrix}, \end{aligned}$$

and whose eigenvalues are \(\kappa ^{ji}_1=\frac{c_j^4(1-\delta _{ji})\delta _{\widetilde{j}i}^{n_1+n_2}+s_j^4(1-\delta _{\widetilde{j}i})\delta _{j i}^{n_1+n_2}}{c_j^4(1-\delta _{ji})+s_j^4(1-\delta _{\widetilde{j}i})}\) and \(\kappa _2=0\). Next, \(\mathcal {G}^{n_1+n_2}_{\overline{j}i}\) and \({\widehat{{\mathcal {G}}}}^{n_1+n_2}_{\overline{j}i}\) trivially have eigenvalues equal to \(\delta ^{n_1+n_2}_{\overline{j}i}\), which are all equal to zero since \(\mu _{\bar{j}}=0\). Further, direct calculations show that \(\widehat{T}_{ji}\) also has two eigenvalues equal, again, to \(\kappa ^{ji}_1\) and \(\kappa _2\). Taking into account the range of the indices j and i for each block, we obtain the characterization of the spectrum, and since \(|\delta _{ji}|<1\) and \(|\delta _{\widetilde{j}i}|<1\), we conclude that the spectral radius of T is smaller than one.

Figure 1 shows the spectrum of T where, for visualization purposes, we set \(N_h=31\) and \(N=10\). In particular, the right panel shows that the spectrum is grouped into \(\frac{N_h-1}{2}\) clusters, in which each eigenvalue is repeated approximately 2N times (approximately, because C has two eigenvalues, \(\lambda _{2N-1,2N}\), slightly different from \(-0.5\)).

Fig. 1

Top row: graphical representation of the spectrum of T for \(N_h=31\), \(N=10\), \(\nu =10^{-2}\) and \(n_1=n_2=1\). The blue circles are obtained by computing numerically the eigenvalues of T. The red crosses are obtained through the formulae of Theorem 6. Bottom row: comparison between the numerical and theoretical convergence of the two-level algorithm
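The comparison in Fig. 1 can be reproduced with a short numpy script: one assembles the reduced model system (16) in its Kronecker form, forms the two-grid operator T explicitly, and compares its spectral radius with the maximum modulus of the eigenvalues predicted by Theorem 6. The sketch below does this for illustrative Monte Carlo samples \(\eta_j\) and should be read as a verification aid, not as the code used for the figure.

```python
import numpy as np

rng = np.random.default_rng(0)
ell, N, nu, n1, n2 = 5, 10, 1e-2, 1, 1        # N_h = 2^ell - 1 = 31, as in Fig. 1
Nh, Nc = 2**ell - 1, 2**(ell - 1) - 1
h = 1.0 / (Nh + 1)
eta = rng.uniform(0.5, 2.0, N)                # illustrative samples eta_j

# reduced system (16) in Kronecker form: S = I x Bt + H x B
d = 2.0 * eta / (h**2 * N)
D, one = np.diag(d), np.ones((N, 1))
Bt = np.block([[np.eye(N) / N, D], [D, -one @ one.T / (nu * N**2)]])
Bo = np.block([[np.zeros((N, N)), -D / 2], [-D / 2, np.zeros((N, N))]])
H = np.diag(np.ones(Nh - 1), 1) + np.diag(np.ones(Nh - 1), -1)
S = np.kron(np.eye(Nh), Bt) + np.kron(H, Bo)

# collective Jacobi iteration matrix (theta = 1) and the two-grid operator T
G = -np.kron(H, np.linalg.solve(Bt, Bo))
Rt = np.zeros((Nc, Nh))
for k in range(Nc):                           # full-weighting restriction
    Rt[k, 2 * k:2 * k + 3] = [0.25, 0.5, 0.25]
R, P = np.kron(Rt, np.eye(2 * N)), np.kron(2.0 * Rt.T, np.eye(2 * N))
CGC = np.eye(S.shape[0]) - P @ np.linalg.solve(R @ S @ P, R @ S)
T = np.linalg.matrix_power(G, n2) @ CGC @ np.linalg.matrix_power(G, n1)

# spectrum predicted by Theorem 6
lam = np.linalg.eigvals(np.linalg.solve(Bt, Bo))          # sigma(C)
mu = 2.0 * np.cos(np.arange(1, Nh + 1) * np.pi / (Nh + 1))
kappa = []
for j in range(1, (Nh + 1) // 2):
    c4 = np.cos(j * np.pi / (2 * (Nh + 1)))**4
    s4 = np.sin(j * np.pi / (2 * (Nh + 1)))**4
    for li in lam:
        dj, djt = -mu[j - 1] * li, -mu[Nh - j] * li        # delta_{ji}, delta_{j~i}
        kappa.append((c4 * (1 - dj) * djt**(n1 + n2) + s4 * (1 - djt) * dj**(n1 + n2))
                     / (c4 * (1 - dj) + s4 * (1 - djt)))

# spectral radius of T versus the maximum predicted modulus: they should agree
print(np.max(np.abs(np.linalg.eigvals(T))), np.max(np.abs(np.array(kappa))))
```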

Remark 5

(Extension of the analysis to the deterministic setting) Our analysis also represents a novel approach to study the convergence of collective smoothing iterations in the case of a deterministic PDE constraint by setting \(N=1\). Retracing the analysis, we observe that C has only two eigenvalues, equal to \(\lambda _{2N-1,2N}\), and \(\mathcal {G}\) is diagonalizable. T can then be diagonalized more easily, and its spectrum is still characterized by (20), where the index i assumes only the values \(2N-1\) and 2N.

Remark 6

(Extension to the two and three dimensional physical space) The analysis could be extended to square or cube domains. Due to the Kronecker product structure between spatial and probability quantities, only the matrix H would have to change, and its eigenvectors would be the tensorized product of sine functions. Similarly, the action of the operators \(\widetilde{R}\) and \(\widetilde{P}\) would be represented by more complicated matrices.

This concludes our theoretical study of the convergence of the two-level collective multigrid algorithm. The next sections will focus on analyzing its numerical performances in different cases.

3.2 Numerical Experiments

We now show the performance of Algorithm 1 and its robustness with respect to several parameters for the solution of (10). We first consider the state equation

$$\begin{aligned} a_{\omega }(y_{\omega },v)&=\int _\mathcal {D}\kappa (x,\omega ) \nabla y(x,\omega )\cdot \nabla v(x)\ dx\nonumber \\&=\int _\mathcal {D}u(x)v(x)\ dx,\quad \forall v\in V,\ \mathbb {P}\text {-a.e. }\omega \in \Omega , \end{aligned}$$
(22)

in the L-shaped domain \(\mathcal {D}=(0,1)^2\setminus \overline{(0.5,1)}^2\), discretized with a regular mesh of squares of edge \(h_{\ell }=2^{-\ell }\), each of which is then split into two right triangles. We choose \(\kappa (x,\omega )\) as an approximate log-normal diffusion field

$$\begin{aligned} \kappa (x,\omega )=e^{\sigma \sum _{j=1}^M \sqrt{\lambda _j}b_j(x)N_j(\omega )}\approx e^{g(x,\omega )}, \end{aligned}$$
(23)

where \(g(x,\omega )\) is a mean zero Gaussian field with covariance function \(Cov_g(x,y)=\sigma ^2 e^{\frac{-\Vert x-y\Vert _{2}^2}{L^2}}\). The parameter \(\sigma ^2\) tunes the variance of the random field, while L denotes the correlation length. The pairs \((b_j(x),\sigma ^2\lambda _j)\) are the eigenpairs of \(T:L^2(\mathcal {D})\rightarrow L^2(\mathcal {D})\), \((Tf)(x)=\int _\mathcal {D}Cov_g(x,y)f(y)\ dy\), and \(N_j{\mathop {\sim }\limits ^{iid}} \mathcal {N}(0,1)\). Assumption 1 is satisfied since \(a_{\min }(\omega )=\left( \text {ess}\inf _{x\in \mathcal {D}} \kappa (x,\omega )\right) ^{-1}\) and \(a_{\max }(\omega )=\Vert \kappa (\cdot ,\omega )\Vert _{L^\infty (\mathcal {D})}\) are in \(L^p(\Omega )\) for every \(p<\infty \) [32]. The target state is \(y_d=e^{y^2}\sin (2\pi x)\sin (2\pi y)\).
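For illustration, the approximated log-normal field (23) can be sampled at a given set of mesh nodes via an eigendecomposition of the discrete covariance matrix. The following NumPy sketch is a nodal approximation of the truncated Karhunen–Loève expansion (mass-matrix weighting is ignored; the function name and its arguments are illustrative, not part of our implementation):

```python
import numpy as np

def sample_lognormal_field(points, sigma2=1.0, L2=0.5, M=3, rng=None):
    """Sample kappa(x, omega) ~ exp(g(x, omega)) at the mesh nodes `points`
    via a truncated discrete KL expansion of the mean-zero Gaussian field g
    with covariance sigma^2 * exp(-|x - y|^2 / L^2)."""
    rng = np.random.default_rng(rng)
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    C = sigma2 * np.exp(-d2 / L2)                  # discrete covariance matrix
    w, V = np.linalg.eigh(C)                       # eigenpairs, ascending order
    lam, b = np.maximum(w[-M:], 0.0), V[:, -M:]    # keep the M largest modes
    N_j = rng.standard_normal(M)                   # N_j ~ iid N(0, 1)
    g = b @ (np.sqrt(lam) * N_j)                   # truncated field g(x, omega)
    return np.exp(g)
```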

Table 1 shows the number of V-cycle iterations (Algorithm 1) and of GMRES iterations preconditioned by the V-cycle to solve (10) up to a tolerance of \(10^{-9}\) on the relative (unpreconditioned) residual. Inside the V-cycle algorithm, we use \(n_1=n_2=2\) pre- and post-smoothing iterations based on the Jacobi relaxation (12) with a damping parameter \(\theta =0.5\) (the same value will be used for all numerical experiments in this manuscript). Numerically, we observed that Gauss-Seidel relaxations lead to very similar results. The number of levels of the V-cycle hierarchy is denoted by \(N_L\). The size of the largest linear system solved per sub-table is denoted by \(N_{\max }=(2N+1)N_h\).
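When the V-cycle is used as a preconditioner, one sweep per GMRES iteration suffices. A minimal sketch of such a wrapper (assuming a user-supplied function vcycle(r) that applies one sweep of Algorithm 1 to a residual r starting from the zero initial guess, and a recent SciPy where the GMRES tolerance keyword is rtol):

```python
from scipy.sparse.linalg import LinearOperator, gmres

def solve_with_vcycle_gmres(S, f, vcycle, tol=1e-9):
    """GMRES on S x = f, preconditioned by one collective multigrid V-cycle.
    `vcycle(r)` is assumed to return an approximate solution of S e = r."""
    n = f.shape[0]
    M = LinearOperator((n, n), matvec=vcycle)   # preconditioner as a linear operator
    x, info = gmres(S, f, M=M, rtol=tol, atol=0.0)
    return x, info
```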

Table 1 Number of V-cycle (left) and preconditioned GMRES (right) iterations to solve (10) for a linear quadratic problem on the L-shaped domain \(\mathcal {D}=(0,1)^2{\setminus } \overline{(0.5,1)}^2\) with a distributed control

The first four sub-tables are based on a discretization of the probability space using the Stochastic Collocation method [33] on tensorized Gauss-Hermite quadrature nodes, since for \(L^2=0.5\), setting \(M=3\) in (23) is enough to preserve \(99\%\) of the variance. In the fifth sub-table we set \(L^2=0.1\) and use the Monte Carlo method, since we need \(M=15\) random variables to preserve \(99\%\) of the variance of the random field, and the Stochastic Collocation method suffers from the curse of dimensionality. We remark that the multigrid algorithm is robust with respect to all parameters considered, namely the regularization parameter, the variance of the random field, the number of levels as the fine grid is refined, and the number of samples used to discretize the probability space.
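As an illustration of the collocation grid used in the first four sub-tables, tensorized Gauss-Hermite nodes and normalized weights for M i.i.d. standard Gaussian variables can be generated as follows (a sketch; the weights play the role of the quadrature weights used to approximate expectations):

```python
import numpy as np
from itertools import product

def gauss_hermite_tensor(M, n):
    """Tensorized Gauss-Hermite collocation: n nodes per random variable,
    M independent N(0,1) variables, hence n**M collocation points in total."""
    x, w = np.polynomial.hermite_e.hermegauss(n)   # probabilists' Hermite rule
    w = w / w.sum()                                # normalize weights to sum to 1
    nodes = np.array(list(product(x, repeat=M)))   # shape (n**M, M)
    weights = np.array([np.prod(c) for c in product(w, repeat=M)])
    return nodes, weights
```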

We mention that a family of block diagonal preconditioners for saddle-point matrices such as (2) was recently proposed in [11]. A detailed theoretical analysis was developed in [12] for distributed controls, in a more general setting than the one considered in this manuscript, covering a general finite element discretization of a d-dimensional domain, a general elliptic bilinear form, and an additional variance term in the cost functional. Their main attractive feature is the possibility to precondition fully in parallel the 2N PDEs. Nevertheless, their convergence deteriorates as \(\nu \rightarrow 0\) (as does that of several preconditioners built on the same technique, see, e.g., [34, 35]), so that these preconditioners are hardly effective when \(\nu \) is smaller than, say, \(10^{-3}/10^{-4}\). The robustness of the multigrid algorithm as \(\nu \rightarrow 0\) is definitely one of its most interesting properties. In terms of mesh refinement, both approaches are robust, provided that the 2N PDE constraints are suitably preconditioned (e.g., with multigrid) in the approach of [11, 12]. Concerning the refinement of the discretization of the probability space, both methods are robust, and interestingly, both convergence analyses show a dependence on the approximated expected value of the squared inverse of the coercivity constants of the stiffness matrices. One current disadvantage of the multigrid algorithm is the lack of coarsening with respect to the number of samples N, since the solution of the coarse problem might represent a bottleneck for very fine discretizations. In these circumstances, the capability of [11, 12] to handle the PDE constraints in parallel may be beneficial.

Table 2 Number of V-cycle (left) and preconditioned GMRES (right) iterations to solve (10) for a linear quadratic problem on the square domain \(\mathcal {D}=(0,1)^2\) with a local control acting on \(\mathcal {D}_0=(0.25,0.75)^2\)

Next, we consider the same problem (22)-(23) posed in the unit square domain \(\mathcal {D}=(0,1)^2\) with either a local control acting on the subset \(\mathcal {D}_0=(0.25, 0.75)^2\subset \mathcal {D}\), or a Neumann boundary control acting on \(\Gamma =(0,1)\times \left\{ 0\right\} \subset \partial \mathcal {D}\). Tables 2 and 3 report the performance of the multigrid algorithm for these two cases. We stress once more the excellent robustness and efficiency of the multigrid algorithm in all regimes.

4 An Optimal Control Problem Under Uncertainty with Box-constraints and \(L^1\) Penalization

In this section, we consider the nonsmooth OCPUU

$$\begin{aligned} \begin{aligned}&\min _{u\in U_{ad}} \frac{1}{2}\mathbb {E}\left[ \Vert y_\omega (u)-y_d\Vert ^2_{L^2(\mathcal {D})}\right] + \frac{\nu }{2}\Vert u\Vert ^2_{L^2(\mathcal {D})} + \beta \Vert u\Vert _{L^1(\mathcal {D})},\\&\quad \text {subject to}\\&a_\omega (y_\omega (u),v)=(u+f,v)_{L^2(\mathcal {D})},\quad \forall v \in V,\ \mathbb {P}\text {-a.e. } \omega \in \Omega ,\\&U_{ad}:=\left\{ v\in L^2(\mathcal {D}): a\le v \le b\quad \text {almost everywhere in }\mathcal {D}\right\} , \end{aligned} \end{aligned}$$
(24)

with \(a<0<b\) and \(\nu ,\beta >0\). Deterministic OCPs with an \(L^1\) penalization lead to optimal controls which are sparse, i.e. they are nonzero only on certain regions of the domain \(\mathcal {D}\) [36, 37]. Sparse controls can be of great interest in applications, because it is often not desirable, or even impossible, to control the system over the whole domain \(\mathcal {D}\). For sparse OCPUU, we mention [38], where the authors considered both a simplified version of (24), in which the randomness enters the state equation linearly as a force term, and a different optimization problem whose goal is to find a stochastic control \(u(\omega )\) which has a similar sparsity pattern regardless of the realization \(\omega \). Note further that the assumption \(\nu >0\) does not eliminate the nonsmoothness of the objective functional, but it regularizes the optimal solution u, and is needed to use the fast optimization algorithm described in the following.

Table 3 Number of V-cycle (left) and preconditioned GMRES (right) iterations to solve (10) for a linear quadratic problem on the square domain \(\mathcal {D}=(0,1)^2\) with a boundary control acting on \(\Gamma =(0,1)\times \left\{ 0\right\} \)

The well-posedness of (24) follows directly from standard variational arguments [24, 25], since \(U_{ad}\) is a convex set, \(\varphi (u):=\beta \Vert u\Vert _{L^1(\mathcal {D})}\) is a convex function, and the objective functional is coercive. In particular, the optimal solution \(\overline{u}\) satisfies the variational inequality ([39, Proposition 2.2])

$$\begin{aligned} (\nu \overline{u} -S^\star (y_d-S(\overline{u}+f)),\overline{u} -v)+\varphi (\overline{u})-\varphi (v)\ge 0,\quad \forall v\in U_{ad}. \end{aligned}$$
(25)

Through a pointwise discussion of the box constraints and an analysis of a Lagrange multiplier belonging to the subdifferential of \(\varphi \) at \(\overline{u}\), [36] showed that (25) can be equivalently formulated as the nonlinear equation \(\mathcal {F}(\overline{u})=0\), with \(\mathcal {F}:L^2(\mathcal {D})\rightarrow L^2(\mathcal {D})\) defined as

$$\begin{aligned} \mathcal {F}(u):= u-\frac{1}{\nu }\Bigl (\max (0,\mathcal {T}u-\beta )+\min (0,\mathcal {T}u+\beta )-\max (0,\mathcal {T}u-\beta -\nu b)-\min (0,\mathcal {T}u +\beta -\nu a)\Bigr ), \end{aligned}$$
(26)

where \(\mathcal {T}:L^2(\mathcal {D})\ni u \mapsto -S^\star (Su)+ S^\star (y_d-Sf)\in L^2(\mathcal {D})\). Notice that \(\mathcal {F}\) is nonsmooth due to the presence of the Lipschitz functions \(\max (\cdot )\) and \(\min (\cdot )\). Nevertheless, \(\mathcal {F}\) can be shown to be semismooth [24], provided that \(\mathcal {T}\) is continuously Fréchet differentiable and, further, Lipschitz continuous when interpreted as a map from \(L^2(\mathcal {D})\) to \(L^r(\mathcal {D})\), with \(r>2\) [24, 40]. These conditions are also satisfied in our setting, since \(\mathcal {T}\) is affine and, further, the adjoint variable \(p_\omega \), solution of (8) with \(z=y_d-S(u+f)\), lies in \(L^2(\Omega ,H^1_0(\mathcal {D}))\), so that \(\mathcal {T}u=\mathbb {E}\left[ p_\omega \right] \in H^1_0(\mathcal {D})\subset L^r(\mathcal {D})\), where \(r>2\) follows from Sobolev embeddings.

Hence, to solve (26) we use the semismooth Newton method whose iteration reads for \(k=1,2,\dots \) until convergence,

$$\begin{aligned} u^{k+1}=u^{k}+du^k,\quad \text {with}\quad \mathcal {G}(u^k)du^k=-\mathcal {F}(u^k), \end{aligned}$$
(27)

\(\mathcal {G}(u):L^2(\mathcal {D})\rightarrow L^2(\mathcal {D})\) being the generalized derivative of \(\mathcal {F}\). Using the linearity of \(\mathcal {T}\) and considering the supports of the weak derivatives of \(\max (0,x)\) and \(\min (0,x)\), we obtain that

$$\begin{aligned} \mathcal {G}(u)[v]=v+\frac{1}{\nu }\chi _{(I^+\cup I^-)}S^\star Sv, \end{aligned}$$

where \(\chi \) is the characteristic function of the union of the disjoint sets

$$\begin{aligned} I^+=\left\{ x\in \mathcal {D}: 0\le \mathcal {T}u-\beta \le \nu b \right\} \text { and } I^-=\left\{ x\in \mathcal {D}: \nu a\le \mathcal {T}u+\beta \le 0\right\} . \end{aligned}$$

It is possible to show that the generalized derivative \(\mathcal {G}(u)\) is invertible with bounded inverse for all u, the proof being identical to the deterministic case treated in [41]. This further implies that the semismooth Newton method (27) converges locally superlinearly [40]. We briefly summarize these results in the following proposition.

Proposition 7

Let the initialization \(u^0\) be sufficiently close to the solution \(\overline{u}\) of (24). Then the iterates \(u^k\) generated by (27) converge superlinearly to \(\overline{u}\in L^2(\mathcal {D})\).

Introducing the supporting variables \(dy^k_\omega \) and \(dp^k_\omega \) in \(L^2(\Omega ;H^1_0(\mathcal {D}))\), the semismooth Newton equation \(\mathcal {G}(u^k)du^k=-\mathcal {F}(u^k)\) may be rewritten as the equivalent saddle point system

$$\begin{aligned} \begin{aligned}&a_\omega (dy^k_\omega ,v)-(du^k,v)=0,\quad \forall v\in V,\quad \mathbb {P}\text {-a.e. } \omega \in \Omega ,\\&a_\omega (v,dp^k_\omega )+(dy^k_\omega ,v)=0,\quad \forall v \in V,\ \mathbb {P}\text {-a.e. } \omega \in \Omega ,\\&(\nu \ du^k - \chi _{(I^+\cup I^-)}\mathbb {E}\left[ dp^k_\omega \right] ,v)_{L^2(\mathcal {D})}=-(\mathcal {F}(u^k),v)_{L^2(\mathcal {D})},\quad \forall v \in L^2(\mathcal {D}). \end{aligned} \end{aligned}$$
(28)

Further, if we set \(y^0=S(f+u^0)\) and \(p^0=S^\star (y_d-y^0)\), due to the linearity of S and \(S^\star \), it holds \(y^{k+1}=S(u^{k+1})=y^k+dy^{k}\) and similarly \(p^{k+1}=p^k+dp^k\). Once fully discretized and using the notation \({\widehat{{\mathbb {E}}}}\left[ p_\omega \right] =\sum _{j=1}^N \zeta _j \textbf{p}_{j}\), the optimality condition (26) can be expressed through the nonlinear finite-dimensional map \(\textbf{F}:\mathbb {R}^{N_h}\rightarrow \mathbb {R}^{N_h}\),

$$\begin{aligned} \begin{aligned} \textbf{F}(\textbf{u})=&\textbf{u}-\frac{1}{\nu }\Bigl (\max (0,{\widehat{{\mathbb {E}}}}\left[ \textbf{p}_\omega \right] -\beta )+\min (0,{\widehat{{\mathbb {E}}}}\left[ \textbf{p}_\omega \right] +\beta )\\&-\max (0,{\widehat{{\mathbb {E}}}}\left[ \textbf{p}_\omega \right] -\beta -\nu b)-\min (0,{\widehat{{\mathbb {E}}}}\left[ \textbf{p}_\omega \right] +\beta -\nu a)\Bigr ), \end{aligned} \end{aligned}$$

where the \(\max (\cdot )\) and \(\min (\cdot )\) functions act componentwise. Equation (28) leads to the saddle point system

$$\begin{aligned} \begin{pmatrix} M &{} &{} &{} &{} A_1^\top \\ &{} \ddots &{} &{} &{} &{}\ddots \\ &{} &{} M &{} &{} &{} &{} A_N^\top \\ &{} &{} &{} M &{} -\zeta _1 M H^k&{}\dots &{} -\zeta _N M H^k\\ A_1 &{} &{} &{} -M\\ &{} \ddots &{} &{}\vdots \\ &{} &{} A_N &{} -M \end{pmatrix} \begin{pmatrix} \textbf{dy}^k_1\\ \vdots \\ \textbf{dy}^k_N\\ \textbf{du}^k\\ \textbf{dp}^k_1\\ \vdots \\ \textbf{dp}^k_N \end{pmatrix}= \begin{pmatrix} \textbf{0}\\ \vdots \\ \textbf{0}\\ -\textbf{F}(\textbf{u}^{k}) \\ \textbf{0}\\ \vdots \\ \textbf{0} \end{pmatrix}, \end{aligned}$$
(29)

where \(H^k\in \mathbb {R}^{N_h\times N_h}\) is a diagonal matrix representing the characteristic function \(\chi _{I_k^+\cup I_k^-}\), namely

$$\begin{aligned} (H^k)_{i,i}=\frac{1}{\nu } \text { if }i\in I_k^+\cup I_k^-\quad \text { and }\quad (H^k)_{i,i}=0 \text { if }i\notin I_k^+\cup I_k^-, \end{aligned}$$

with

$$\begin{aligned} I_k^+=\left\{ i: 0\le ({\widehat{{\mathbb {E}}}}\left[ \textbf{p}^k\right])_i -\beta \le \nu b \right\} \text { and } I_k^-=\left\{ i: \nu a\le ({\widehat{{\mathbb {E}}}}\left[ \textbf{p}^k\right])_i +\beta \le 0\right\} . \end{aligned}$$
(30)

To derive the expression of H, we assumed that a Lagrangian basis is used for the finite element space. Notice that (29) fits into the general form (2), and thus we use the collective multigrid algorithm to solve it. Further, with the notation of (2), it holds

$$\begin{aligned} (G)_{i,i}+d_i^\top \text {diag}(a_i)^{-1}\text {diag}(c_i)\text {diag}(a_i)^{-1}e_i= (M)_{i,i}+(M)^3_{i,i} \sum _{j=1}^N \zeta _j (A_j)^{-2}_{i,i}>0 \end{aligned}$$

if \(i\in I^+\cup I^-\), and

$$\begin{aligned} (G)_{i,i}+d_i^\top \text {diag}(a_i)^{-1}\text {diag}(c_i)\text {diag}(a_i)^{-1}e_i=(M)_{i,i}>0, \end{aligned}$$

if \(i\notin I^+\cup I^-\). The collective multigrid iteration is then well-defined.
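For concreteness, the active sets (30), the matrix \(H^k\) and the discrete residual \(\textbf{F}(\textbf{u})\) can be assembled directly from the nodal values of \({\widehat{{\mathbb {E}}}}\left[ \textbf{p}^k\right] \). A minimal NumPy sketch (illustrative names; a Lagrangian basis is assumed, so that the i-th entry of the vector is the value at node i) reads:

```python
import numpy as np

def active_sets_and_H(p_hat, a, b, beta, nu):
    """Index sets (30) and diagonal matrix H^k from the nodal values of the
    discrete expected adjoint p_hat = E_hat[p^k]."""
    I_plus  = (p_hat - beta >= 0.0) & (p_hat - beta <= nu * b)
    I_minus = (p_hat + beta >= nu * a) & (p_hat + beta <= 0.0)
    H = np.diag(np.where(I_plus | I_minus, 1.0 / nu, 0.0))
    return I_plus, I_minus, H

def residual_F(u, p_hat, a, b, beta, nu):
    """Componentwise evaluation of the discrete residual F(u)."""
    return u - (np.maximum(0.0, p_hat - beta) + np.minimum(0.0, p_hat + beta)
                - np.maximum(0.0, p_hat - beta - nu * b)
                - np.minimum(0.0, p_hat + beta - nu * a)) / nu
```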

The overall semismooth Newton method is summarized in Algorithm 2. At each iteration we solve (29) using the collective multigrid algorithm (line 4) and update the active sets given the new iterate (line 10). Notice that, in order to globalize the convergence, we perform a line-search step (lines 6-8) on the merit function \(\phi (\textbf{u})=\sqrt{\textbf{F}(\textbf{u})^\top M \textbf{F}(\textbf{u})}\) [42].

Algorithm 2

Globalized semismooth Newton Algorithm to solve \(\textbf{F}(\textbf{u})=0\)
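A minimal sketch of such a globalized loop (not the exact Algorithm 2; the interfaces and line-search constants are illustrative assumptions) could read:

```python
import numpy as np

def globalized_ssn(u0, F, newton_step, M, tol=1e-9, maxit=50):
    """Globalized semismooth Newton sketch.
    F(u): discrete residual (recomputes states/adjoints internally);
    newton_step(u): control update du from the saddle point system (29),
                    e.g. obtained with the collective multigrid V-cycle;
    M: mass matrix defining the merit function phi."""
    phi = lambda v: np.sqrt(F(v) @ (M @ F(v)))
    u = u0.copy()
    for _ in range(maxit):
        if phi(u) < tol:
            break
        du = newton_step(u)
        alpha = 1.0
        # backtracking line search on the merit function
        while alpha > 1e-8 and phi(u + alpha * du) >= (1.0 - 1e-4 * alpha) * phi(u):
            alpha *= 0.5
        u = u + alpha * du
    return u
```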

4.1 Numerical Experiments

In this section we test the semismooth Newton algorithm for the solution of (26) and the collective multigrid algorithm to solve the related optimality system (29). We consider the random PDE constraint (22) with the random diffusion coefficient (23) set on the L-shaped domain. The semismooth iteration is stopped when \(\phi (\textbf{u}^k)<10^{-9}\). The inner linear solvers are stopped when the relative (unpreconditioned) residual is smaller than \(10^{-11}\).

Table 4 reports the number of semismooth Newton iterations and, in brackets, the average number of iterations of the V-cycle algorithm used as a solver (left) or as a preconditioner for GMRES (right). Table 4 confirms the effectiveness of the multigrid algorithm, which requires essentially the same computational effort as in the linear-quadratic case.

Table 4 Number of semismooth Newton iterations (left), and average number of V-cycle (center) and preconditioned GMRES (right) iterations (in brackets)

More challenging is the limit \(\nu \rightarrow 0\), reported in Table 5. The performance of both the (globalized) semismooth Newton iteration and the inner multigrid solver deteriorates. The convergence of the outer nonlinear algorithm can be improved by a continuation procedure: we consider a sequence of values \(\nu =10^{-j}\), \(j=2,\dots ,8\), and start the j-th problem using as initial guess the optimal solution computed for \(\nu =10^{-j+1}\). Concerning the inner solver, the stand-alone multigrid algorithm struggles since, for small values of \(\nu \), the optimal control is of bang-bang type, that is, it satisfies \(u=a\), \(u=b\) or \(u=0\) at almost every point of the mesh (for \(\nu =10^{-8}\) only five nodes are inactive at the optimum). The matrices \(H^{k}\) are then close to zero, and the multigrid hierarchy struggles to capture changes at such a small scale. Nevertheless, the multigrid algorithm remains a very efficient preconditioner for GMRES even in this challenging limit.
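The continuation strategy can be sketched as follows (illustrative; solve_ocp stands for the globalized semismooth Newton solver applied to (24) for a given regularization parameter):

```python
def continuation_in_nu(solve_ocp, u_init, exponents=range(2, 9)):
    """Warm-start continuation for the nu -> 0 limit: each problem is
    initialized with the optimal control of the previous, larger nu."""
    u = u_init
    for j in exponents:
        u = solve_ocp(nu=10.0 ** (-j), u0=u)
    return u
```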

Table 5 Number of semismooth Newton iterations, of V-cycle iterations and of preconditioned GMRES iterations (in brackets). In the second row, the semismooth Newton method starts from a warm-up initial guess obtained through continuation

Figure 2 shows a sequence of optimal controls for different values of \(\beta \), with and without box constraints. The optimal control for \(\beta =0\) and without box constraints corresponds to the minimizer of the linear-quadratic OCP (5). We observe that the \(L^1\) penalization indeed induces sparsity, since the optimal controls become more and more localized as \(\beta \) increases. Numerically, we have verified that for sufficiently large \(\beta \) the optimal control is identically zero, a property shown in [36].

Fig. 2

From left to right: optimal control computed for \(\beta \in \left\{ 0,5\cdot 10^{-3},5\cdot 10^{-2}\right\} \) with (top row) and without (bottom row) box constraints: \(a=-50\), \(b=50\)

5 A Risk-Averse Optimal Control Problem Under Uncertainty

In this section we consider an instance of risk-averse OCPUU. This class of problems has recently drawn a lot of attention since, in engineering applications, it is important to compute a control that minimizes the quantity of interest even in rare, but often troublesome, scenarios [2, 6, 43, 44]. As a risk measure [45], we use the Conditional Value-at-Risk (CVaR) of confidence level \(\lambda \in (0,1)\),

$$\begin{aligned} \text {CVaR}_\lambda \left( X \right) := \mathbb {E}\left[ X \vert X\ge \text {VaR}_\lambda \left( X \right) \right] ,\quad \forall X\in L^1(\Omega ;\mathbb {R}), \end{aligned}$$

that is, the expected value of a quantity of interest X given that the latter is greater than or equal to its \(\lambda \)-quantile, here denoted by \(\text {VaR}_\lambda \left( X \right) \). Rockafellar and Uryasev [46] proved that \(\text {CVaR}_\lambda \left( X \right) \) admits the equivalent formulation

$$\begin{aligned} \text {CVaR}_\lambda \left( X \right) = \inf _{t\in \mathbb {R}}\left\{ t+\frac{1}{1-\lambda }\mathbb {E}\left[ (X-t)^+\right] \right\} , \end{aligned}$$

where \((\cdot )^+:=\max (0,\cdot )\), provided that the distribution of X does not have an atom at \(\text {VaR}_\lambda \left( X \right) \). In order to use tools from smooth optimization, we rely on a smoothing approach proposed in [2], which consists in replacing \((\cdot )^+\) with a smooth function \(g_\varepsilon \), \(\varepsilon \in \mathbb {R}^+\), such that \(g_\varepsilon \rightarrow (\cdot )^+\) in a suitable functional norm as \(\varepsilon \rightarrow 0\). Specifically, we choose the \(C^2\)-differentiable approximation

$$\begin{aligned} g_\varepsilon (x)={\left\{ \begin{array}{ll} 0\quad &{}\text {if } x\le -\frac{\varepsilon }{2},\\ \frac{(x+\frac{\varepsilon }{2})^3}{\varepsilon ^2}-\frac{(x+\frac{\varepsilon }{2})^4}{2\varepsilon ^3}\quad &{}\text {if } x\in (-\frac{\varepsilon }{2},\frac{\varepsilon }{2}),\\ x\quad &{}\text {if }x\ge \frac{\varepsilon }{2}. \end{array}\right. } \end{aligned}$$
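A direct implementation of this smoothed plus function and of its first two derivatives (used below in the optimality system and in the Newton blocks) may look as follows; one can check that the three branches match up to second order at \(\pm \varepsilon /2\), so that \(g_\varepsilon \) is indeed \(C^2\):

```python
import numpy as np

def g_eps(x, eps):
    """C^2 smoothing of max(0, x); g'' is supported in (-eps/2, eps/2)."""
    s = x + eps / 2.0
    mid = s**3 / eps**2 - s**4 / (2.0 * eps**3)
    return np.where(x <= -eps/2, 0.0, np.where(x >= eps/2, x, mid))

def dg_eps(x, eps):
    """First derivative of g_eps."""
    s = x + eps / 2.0
    mid = 3.0 * s**2 / eps**2 - 2.0 * s**3 / eps**3
    return np.where(x <= -eps/2, 0.0, np.where(x >= eps/2, 1.0, mid))

def d2g_eps(x, eps):
    """Second derivative of g_eps."""
    s = x + eps / 2.0
    mid = 6.0 * s / eps**2 - 6.0 * s**2 / eps**3
    return np.where((x > -eps/2) & (x < eps/2), mid, 0.0)
```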

Then, the smoothed risk-averse OCPUU is

$$\begin{aligned} \begin{aligned}&\min _{u\in L^2(\mathcal {D}),t\in \mathbb {R}} t+\frac{1}{1-\lambda } \mathbb {E}\left[ g_\varepsilon \left( \frac{1}{2}\Vert y_\omega -y_d\Vert ^2_{L^2(\mathcal {D})}-t\right) \right] +\frac{\nu }{2}\Vert u\Vert ^2_{L^2(\mathcal {D})},\\&\quad \text {subject to}\\&a_\omega (y_\omega ,v)=(u+f,v)\quad \forall v\in V,\ \mathbb {P}\text {-a.e. }\omega \in \Omega , \end{aligned} \end{aligned}$$
(31)

where \(\nu \in \mathbb {R}^+\) and \(\lambda \in [0,1)\). The well-posedness of (31), the differentiability of its objective functional, as well as bounds for the error introduced by replacing \((\cdot )^+\) with \(g_{\varepsilon }(\cdot )\), have been analyzed in [2]. Further, defining \(Q_\omega =\frac{1}{2}\Vert y_\omega -y_d\Vert ^2_{L^2(\mathcal {D})}-t\), the optimality conditions form the nonlinear system,

$$\begin{aligned} \begin{array}{r l r l} &{} a_\omega (v,p_\omega )-\frac{g^\prime _\varepsilon \left( Q_\omega \right) }{1-\lambda }(y_d-y_\omega ,v)=0,\quad &{}\forall v \in V,\ \mathbb {P}\text {-a.e. } \omega \in \Omega ,\\ &{}(\nu \ u -\mathbb {E}\left[ p_\omega \right] ,v)=0,\quad &{}\forall v\in L^2(\mathcal {D}),\\ &{}a_\omega (y_\omega ,v)-(u+f,v)=0,\quad &{}\forall v\in V,\quad \mathbb {P}\text {-a.e. } \omega \in \Omega ,\\ &{}1-\frac{1}{1-\lambda }\mathbb {E}\left[ g^\prime _\varepsilon \left( Q_\omega \right) \right] =0.\quad &{} \end{array} \end{aligned}$$
(32)

Approximating V and \(\mathbb {E}\) with \(V_h\) and \({\widehat{{\mathbb {E}}}}\), and letting \({\widetilde{{\textbf{x}}}}=(\textbf{y},\textbf{u},\textbf{p},t)\), the finite-dimensional discretization of (32) corresponds to the nonlinear system \({\widetilde{{\textbf{F}}}}({\widetilde{{\textbf{x}}}})=\textbf{0}\), where \({\widetilde{{\textbf{F}}}}:\mathbb {R}^{(2N+1)N_h+1}\rightarrow \mathbb {R}^{(2N+1)N_h+1}\),

$$\begin{aligned} {\widetilde{{\textbf{F}}}}({\widetilde{{\textbf{x}}}})=\begin{pmatrix} {\widetilde{{\textbf{F}}}}_1({\widetilde{{\textbf{x}}}})\\ {\widetilde{{\textbf{F}}}}_2({\widetilde{{\textbf{x}}}})\\ {\widetilde{{\textbf{F}}}}_3({\widetilde{{\textbf{x}}}})\\ \widetilde{F}_4({\widetilde{{\textbf{x}}}}) \end{pmatrix}=\begin{pmatrix} \widetilde{M}(\textbf{y}-I\textbf{y}_d)+A^\top \textbf{p}\\ \nu M \textbf{u}- M{\widehat{{\mathbb {E}}}}\left[ \textbf{p}\right] \\ A\textbf{y}- M(I\textbf{u}+\textbf{f})\\ 1-\frac{1}{1-\lambda }{\widehat{{\mathbb {E}}}}\left[ g^\prime _\varepsilon (Q_\omega )\right] \end{pmatrix}, \end{aligned}$$
(33)

with \(A=\text {diag}(A_1,\dots ,A_N)\), \(I=[I_{N_h},\dots ,I_{N_h}]\in \mathbb {R}^{N_h\times N_h N}\), \(I_{N_h}\) being the identity matrix, \(\textbf{y}_d\) is the discretization of \(y_d\), and

$$\begin{aligned} \widetilde{M}=\text {diag}\left( \frac{g^\prime _\varepsilon (Q_{\omega _1})}{1-\lambda }M,\dots ,\frac{g^\prime _\varepsilon (Q_{\omega _N})}{1-\lambda }M\right) , \text { with }Q_{\omega _j}:=\frac{1}{2}(\textbf{y}_j-\textbf{y}_d)^\top M(\textbf{y}_j-\textbf{y}_d)-t,\end{aligned}$$

for \(j=1,\dots ,N\).
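As an illustration, the residual (33) can be evaluated blockwise as follows (a dense NumPy sketch reusing dg_eps from the earlier snippet; the per-sample matrices, the quadrature weights zeta and the remaining arguments are assumed to be given and their names are illustrative):

```python
import numpy as np

def residual_Ftilde(A_list, M, zeta, y, u, p, t, f, y_d, nu, lam, eps):
    """Blockwise evaluation of the residual (33); y and p are lists of
    per-sample state/adjoint vectors."""
    N = len(A_list)
    Q = [0.5 * (y[j] - y_d) @ (M @ (y[j] - y_d)) - t for j in range(N)]
    F1 = [dg_eps(Q[j], eps) / (1.0 - lam) * (M @ (y[j] - y_d)) + A_list[j].T @ p[j]
          for j in range(N)]
    p_hat = sum(zeta[j] * p[j] for j in range(N))       # discrete expected adjoint
    F2 = nu * (M @ u) - M @ p_hat
    F3 = [A_list[j] @ y[j] - M @ (u + f) for j in range(N)]
    F4 = 1.0 - sum(zeta[j] * dg_eps(Q[j], eps) for j in range(N)) / (1.0 - lam)
    return np.concatenate(F1 + [F2] + F3 + [np.array([F4])])
```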

A possible approach to solve (33) is to use a Newton method which, given \({\widetilde{{\textbf{x}}}}^k=(\textbf{y}^k,\textbf{u}^k,\textbf{p}^k,t^k)\), computes the correction \({\widetilde{{\textbf{dx}}}}^k=(\textbf{dy}^k,\textbf{du}^k,\textbf{dp}^k,dt^k)\) as the solution of \({\widetilde{{\textbf{J}}}}^k{\widetilde{{\textbf{dx}}}}^k=-{\widetilde{{\textbf{F}}}}({\widetilde{{\textbf{x}}}}^k)\), where

$$\begin{aligned} {\widetilde{{\textbf{J}}}}^k:=\begin{pmatrix} C_1(\textbf{y}_1^k,t^k) &{} &{} &{} &{} A_1^\top &{} &{} &{}-\textbf{v}^k_1\\ &{} \ddots &{} &{} &{} &{}\ddots &{} &{}\vdots \\ &{} &{} C_N(\textbf{y}_N^k,t^k) &{} &{} &{} &{} A_N^\top &{} -\textbf{v}^k_N\\ &{} &{} &{} \nu M &{} -\zeta _1 M &{}\dots &{} -\zeta _N M &{} \\ A_1 &{} &{} &{} -M &{} &{} &{} &{} \\ &{} \ddots &{} &{}\vdots &{} &{} &{} &{} \\ &{} &{} A_N &{} -M &{} &{} &{} &{}\\ -\zeta _1\left( \textbf{v}_1^k\right) ^\top &{} \ddots &{} -\zeta _N\left( \textbf{v}^k_N\right) ^\top &{} &{} &{} &{} &{} \frac{{\widehat{{\mathbb {E}}}}\left[ g_\varepsilon ^{\prime \prime }(Q_\omega ^k)\right] }{1-\lambda } \end{pmatrix}, \end{aligned}$$

with

$$\begin{aligned} Q_{\omega _i}^k&:=\frac{1}{2}(\textbf{y}_i^k-\textbf{y}_d)^\top M(\textbf{y}_i^k-\textbf{y}_d) -t^k,\nonumber \\ C_i(\textbf{y}_i^k,t^k)&:=\frac{1}{1-\lambda }\left( g^\prime _\varepsilon (Q_{\omega _i}^k)M +g_\varepsilon ^{\prime \prime }(Q_{\omega _i}^k)M(\textbf{y}_i^k-\textbf{y}_d)(\textbf{y}_i^k-\textbf{y}_d)^\top M\right) ,\\ \textbf{v}^k_i&:=\frac{1}{1-\lambda }g_\varepsilon ^{\prime \prime }(Q_{\omega _i}^k)M(\textbf{y}_i^k-\textbf{y}_d)\nonumber , \end{aligned}$$
(34)

for \(i=1,\dots , N\). Unfortunately, \({\widetilde{{\textbf{J}}}}^k\) can be singular away from the optimum, in particular whenever \({\widehat{{\mathbb {E}}}}\left[ g_\varepsilon ^{\prime \prime }(Q_\omega ^k)\right] =0\) which implies

$$\begin{aligned} g_\varepsilon ^{\prime \prime }\left( \frac{1}{2}(\textbf{y}_j^k-\textbf{y}_d)^\top M(\textbf{y}_j^k-\textbf{y}_d) -t^k\right) =0,\ \forall j=1,\dots ,N, \end{aligned}$$
(35)

which is not unlikely for small \(\varepsilon \) since \(\text {supp}(g_\varepsilon ^{\prime \prime })=(-\frac{\varepsilon }{2},\frac{\varepsilon }{2})\). Splitting strategies have been proposed (e.g. [47] in a reduced approach) in which, whenever (35) is satisfied, an intermediate value of t is computed by solving \(\widetilde{F}_4(t;\textbf{y},\textbf{u},\textbf{p})=0\) so as to violate (35). In the next section, we discuss a similar splitting approach. To speed up the convergence of the outer nonlinear algorithm, we use a preconditioned Newton method based on nonlinear elimination [48]. At each iteration we need to invert saddle-point matrices like (2), possibly several times. To do so, we rely on the collective multigrid algorithm.

5.1 Nonlinear Preconditioned Newton Method

Nonlinear elimination is a nonlinear preconditioning technique based on the identification of variables and equations of \(\textbf{F}\) (e.g. strong nonlinearities) that slow down the convergence of the Newton method. These components are then eliminated through the solution of a local nonlinear problem at every step of an outer Newton iteration. This elimination step provides a better initial guess for the outer iteration, so that faster convergence is achieved [48, 49].

In light of the possible singularity of \({\widetilde{{\textbf{J}}}}\), we split the discretized variables \({\widetilde{{\textbf{x}}}}\) into \({\widetilde{{\textbf{x}}}}=(\textbf{x},t)\), and we aim to eliminate the variables \(\textbf{x}\) to obtain a scalar nonlinear equation only for t. To do so, we partition (32) as

$$\begin{aligned} {\widetilde{{\textbf{F}}}}\begin{pmatrix} \textbf{x}\\ t \end{pmatrix}=\begin{pmatrix} \textbf{F}_1(\textbf{x},t)\\ F_2(\textbf{x},t) \end{pmatrix}=\begin{pmatrix} \textbf{0}\\ 0 \end{pmatrix}, \end{aligned}$$
(36)

where \(\textbf{F}_1=({\widetilde{{\textbf{F}}}}_1(\textbf{x},t),{\widetilde{{\textbf{F}}}}_2(\textbf{x},t),{\widetilde{{\textbf{F}}}}_3(\textbf{x},t))\) and \(F_2(\textbf{x},t)=\widetilde{F}_4(\textbf{x},t)\). Similarly, \({\widetilde{{\textbf{J}}}}\) is partitioned into

$$\begin{aligned}{\widetilde{{\textbf{J}}}}=\begin{pmatrix} \textbf{J}_{1,1} &{} \textbf{J}_{1,2}\\ \textbf{J}_{2,1} &{} J_{2,2} \end{pmatrix}\end{aligned}$$

whose blocks have dimensions \(\textbf{J}_{1,1}\in \mathbb {R}^{(2N+1)N_h\times (2N+1)N_h}\), \(\textbf{J}_{1,2}\in \mathbb {R}^{(2N+1)N_h\times 1}\), \(\textbf{J}_{2,1}\in \mathbb {R}^{1\times (2N+1)N_h}\), and \(J_{2,2}\in \mathbb {R}\). Notice that \(\textbf{J}_{1,1}\) is always nonsingular, while \(\textbf{J}_{2,1}\), \(\textbf{J}_{1,2}\) and \(J_{2,2}\) are identically zero if (35) is verified.

Thus \(\textbf{F}_1\) allows us to define an implicit map \(h:\mathbb {R}\rightarrow \mathbb {R}^{(2N+1)N_h}\) such that \(\textbf{F}_1(h(t),t)=0\), so that the first set of nonlinear equations in (36) is satisfied. We are then left to solve the scalar nonlinear equation

$$\begin{aligned} F(t)=0,\quad \text {where}\quad F(t):=F_2(h(t),t). \end{aligned}$$
(37)

To do so using the Newton method, we need the derivative of F(t) evaluated at \(t=t^k\) which, using implicit differentiation, can be computed as

$$\begin{aligned} F^\prime (t^k)=J_{2,2}(h(t^k),t^k)-\textbf{J}_{2,1}(h(t^k),t^k)\left( \textbf{J}_{1,1}(h(t^k),t^k) \right) ^{-1}\textbf{J}_{1,2}(h(t^k),t^k). \end{aligned}$$
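A sketch of the resulting outer iteration on t is given below (interfaces are illustrative: `eliminate(t)` runs the inner Newton solve of \(\textbf{F}_1(\textbf{x},t)=0\), and `blocks(x, t)` returns the partitioned Jacobian pieces, with the action of \(\textbf{J}_{1,1}^{-1}\) realized, e.g., by GMRES preconditioned by the collective V-cycle):

```python
def outer_newton_on_t(t0, eliminate, blocks, F2, tol=1e-6, maxit=30):
    """Outer Newton iteration on the scalar equation F(t) = F2(h(t), t) = 0.
    Assumes the derivative below is nonzero; otherwise the splitting step
    described in the text (and taken in Algorithm 3) would be used instead."""
    t = t0
    for _ in range(maxit):
        x = eliminate(t)                   # inner Newton: F1(h(t), t) = 0
        r = F2(x, t)
        if abs(r) < tol:
            break
        J11_solve, J12, J21, J22 = blocks(x, t)
        dF = J22 - J21 @ J11_solve(J12)    # implicit differentiation of F(t)
        t = t - r / dF                     # scalar Newton update
    return t
```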

The nonlinear preconditioned Newton method is described in Algorithm 3 and consists in solving (37) with the Newton method. However, to overcome the possible singularity of \(J^k_{2,2}\), \(\textbf{J}^k_{1,2}\) and \(\textbf{J}^k_{2,1}\), we check at each iteration k whether (35) is satisfied and, in the affirmative case, we update \(\textbf{x}^k\) by solving \(\textbf{F}_1(\textbf{x}^{k+1},t^k)=0\) with the Newton method, and update \(t^k\) by solving \(F_2(\textbf{x}^k,t^{k+1})=0\). Notice further that each iteration of the backtracking line search requires solving \(\textbf{F}_1(h(t),t)=0\) with the Newton method, so that additional linear systems with matrix \(\textbf{J}_{1,1}\) must be solved.

We report that we also tried to eliminate t by computing the map l such that \(F_2(\textbf{x},l(\textbf{x}))=0\), while iterating on the variable \(\textbf{x}\). This has the advantage that l can be evaluated very cheaply, since it is defined by a scalar equation. However, we needed many more iterations both of the outer Newton method and, consequently, of the inner linear solver. Thus, in our experience, this second approach was less efficient and less appealing.

Algorithm 3

Nonlinear preconditioned Newton method to solve \({\widetilde{{\textbf{F}}}}({\widetilde{{\textbf{x}}}})=0\).

5.2 Numerical Experiments

In this section we report numerical tests to assess the performance of the preconditioned Newton algorithm to solve (37) and of the collective multigrid algorithm to invert the matrix \(\textbf{J}_{1,1}\). We consider the random PDE constraint (22) with the random diffusion coefficient (23). Table 6 reports the number of outer and inner Newton iterations, and the average number of V-cycle iterations and of preconditioned GMRES iterations needed to solve the linear systems at each (inner/outer) Newton iteration. The outer Newton iteration is stopped when \(\vert F(t^k)\vert \le 10^{-6}\), the inner Newton method to compute \(h(\cdot )\) is stopped when \(\max \left( \Vert \textbf{F}_{1}(\textbf{x}^k;t)\Vert _2/\Vert \textbf{F}_{1}(\textbf{x}^0;t)\Vert _2,\Vert \textbf{F}_{1}(\textbf{x}^k;t)\Vert _2\right) \le 10^{-8}\), and the linear solvers are stopped when the relative (unpreconditioned) residual is smaller than \(10^{-9}\).

In Table 6, the number of outer Newton iterations is stable, while the number of inner Newton iterations varies between five and fifteen per outer iteration. This is essentially due to how difficult it is to compute the nonlinear map h(t) by solving \(\textbf{F}_1(\textbf{x};t)=0\) in lines 5, 8 and 11 of Algorithm 3. The average number of inner linear solver iterations is quite stable across all experiments. The most challenging case is the limit \(\varepsilon \rightarrow 0\), in which we used the solution of the optimization problem for the previous value of \(\varepsilon \) as a warm initial guess for the next smaller value. Further, we emphasize that the top left blocks of \(\textbf{J}_{1,1}\) involve the matrices \(C_i(\textbf{y}^k_i,t^k)\) (see (34)), which contain a dense low-rank term if \(g_\varepsilon ^{\prime \prime }(Q^k_{\omega _i})\ne 0\). As \(\varepsilon \rightarrow 0\), \(g_\varepsilon ^{\prime \prime }(\cdot )\) tends to a Dirac delta, so the dense term becomes dominant. Multigrid methods based on pointwise relaxations are not expected to be very efficient for these matrices, which may not be diagonally dominant. The standard V-cycle algorithm indeed suffers; however, the Krylov acceleration performs better, as it handles these low-rank perturbations with less effort. For \(\varepsilon =10^{-4}\), we sometimes noticed that the GMRES residual stagnates after 20/30 iterations around \(10^{-7}/10^{-8}\), due to a loss of orthogonality in the Krylov subspace, thus resulting in a higher number of iterations. We allowed a maximum of 80 iterations per linear system.

Table 6 For each numerical experiment, we report from the left to the right: the number of outer preconditioned Newton iterations, the total number of inner Newton iterations, the averaged number of V-cycle iterations and the averaged number of preconditioned GMRES iterations

Figure 3 compares the two optimal controls obtained by minimizing either \(\mathbb {E}\left[ Q(y_\omega )\right] \) or \(\text {CVaR}_{0.99}\left[ Q(y_\omega )\right] \), together with the cumulative distribution functions of \(Q(y_{\omega _j})\) computed on 8000 out-of-sample realizations. The risk-averse control indeed reduces the risk of large values of \(Q(y_\omega )\): at level \(\lambda =0.99\), we obtain \(\text {CVaR}_{0.99}\left( Q(y_{\omega })\right) =2.79\) for the risk-neutral control and \(\text {CVaR}_{0.99}\left( Q(y_{\omega })\right) =0.90\) for the risk-averse control.
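Out-of-sample values of this kind can be estimated, for instance, with the standard sample version of the Rockafellar-Uryasev formula (a sketch; `Q_samples` holds the values \(Q(y_{\omega _j})\) on the out-of-sample realizations):

```python
import numpy as np

def empirical_cvar(Q_samples, lam):
    """Sample estimate of CVaR_lambda: take t equal to the empirical
    lambda-quantile (VaR) and evaluate t + E[(Q - t)^+] / (1 - lambda)."""
    t = np.quantile(Q_samples, lam)
    return t + np.mean(np.maximum(Q_samples - t, 0.0)) / (1.0 - lam)
```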

Fig. 3

Solution of the linear-quadratic OCP (top-left), solution of the smoothed risk-averse OCP with \(\lambda =0.99\) (top-right), and cumulative distribution function of the quantity of interest for the controls computed with \(\lambda \in \left\{ 0,0.5,0.95,0.99\right\} \)

6 Conclusion

We have presented a multigrid method to solve the large saddle point linear systems that typically arise in full-space approaches for OCPUU. We further derived a detailed convergence analysis that fully characterizes the spectrum of the two-level iteration matrix. The algorithm has been tested as an iterative solver and as a preconditioner on three test cases: a linear-quadratic OCPUU, a nonsmooth OCPUU, and a risk-averse nonlinear OCPUU. Overall, the multigrid method shows very good performance and robustness with respect to the several parameters of the problems considered.