Optimal Control of the Principal Coefficient in a Scalar Wave Equation


We consider optimal control of the scalar wave equation where the control enters as a coefficient in the principal part. Adding a total variation penalty allows showing existence of optimal controls, which requires continuity results for the coefficient-to-solution mapping for discontinuous coefficients. We additionally consider a so-called multi-bang penalty that promotes controls taking on values pointwise almost everywhere from a specified discrete set. Under additional assumptions on the data, we derive an improved regularity result for the state, leading to optimality conditions that can be interpreted in an appropriate pointwise fashion. The numerical solution makes use of a stabilized finite element method and a nonlinear primal–dual proximal splitting algorithm.


This work is concerned with an optimal control problem for the scalar wave equation where the control enters as the spatially varying coefficient in the principal part. Informally, we consider the problem

$$\begin{aligned} \left\{ \begin{aligned}&\min _{u,y}\frac{1}{2}\Vert By-y_d\Vert _{{\mathcal {O}}}^2+{\mathcal {R}}(u)\\&\text {s.t.}\quad y_{tt} - {\mathrm {div}}(u\nabla y) = f,\quad y(0)=y_0, \partial _t y(0) = y_1,\\&\qquad {\underline{u}} \le u \le {\overline{u}} \quad \text {almost everywhere (a.e.)}, \end{aligned} \right. \end{aligned}$$

where \(y_d\) is a given (desired or observed) state, B is a bounded linear observation operator mapping to the observation space \({\mathcal {O}}\), \(\mathcal R\) is a regularization term, \(0<{\underline{u}} < {\overline{u}}\) are constants, and f, \(y_0\), and \(y_1\) (as well as boundary conditions) are given suitably. A precise statement is deferred to Sect. 2. Such problems occur, e.g., in acoustic tomography for medical imaging [1] and non-destructive testing [2] as well as in seismic inversion [3]. In the latter, the goal is the determination of a “velocity model” (as described by the coefficient u) of the underground in a region of interest from recordings (“seismograms”, modeled by \(y_d\)) of reflected pressure waves generated by sources on or near the surface (entering the equation via f, \(y_0\), \(y_1\), or inhomogeneous boundary conditions). If the region contains multiple different materials like rock, oil, and gas, the velocity model changes rapidly or may even have jumps between material interfaces.

In the stationary case, the question of existence of solutions to problem (1) under only pointwise constraints and regularization has received a tremendous amount of attention. However, it was answered in the negative in [4]; this and subsequent investigations led to the concept of H-convergence and, more generally, to homogenization theory; see, e.g., [5,6,7,8,9]. The use of regularization terms or constraints involving higher-order differential operators would certainly guarantee existence but contradicts the goal of allowing piecewise continuous controls u. Such considerations suggest the introduction of total variation regularization in addition to pointwise constraints. In this case, existence can be argued. However, this leads to difficulties in deriving necessary optimality conditions since the sum rule of convex analysis can only be applied in the \(L^\infty (\Omega )\) topology, which would lead to (generalized) derivatives that do not admit a pointwise representation. This difficulty can be circumvented by replacing the pointwise constraints by a (differentiable approximation of a) cutoff function applied to the coefficient in the equation and by using improved regularity results for the optimal state that allow extending the Fréchet derivative of the tracking term from \(L^\infty (\Omega )\) to \(L^s(\Omega )\) for \(s<\infty \) sufficiently large. Together, this allows obtaining derivatives and subgradients in \(L^r(\Omega )\) for some \(r>1\), which can be characterized in a suitable pointwise manner. This was carried out in [10], which considered for \({\mathcal {R}}\) a combination of total variation and multi-bang regularization; the latter is a convex pointwise penalty that promotes controls which take values from a prescribed discrete set (e.g., corresponding to different materials such as rock, oil, and gas), see also [11,12,13].

In the current work, we extend this approach to optimal control and identification of discontinuous coefficients in scalar wave equations by deriving under additional (natural) assumptions on the data the adapted higher regularity results for the wave equation based on elliptic maximal regularity theory [14]; see Assumption 3.1 and Proposition 3.6 below. We also address a suitable discretization of the problem using a stabilized finite element method [15] and its solution by a nonlinear primal–dual proximal splitting method [16,17,18].

Let us briefly comment on related literature. As there is a vast body of work on control and inverse problems for the wave equation, we focus here specifically on the identification of discontinuous (and, in particular, piecewise constant) coefficients. This problem has attracted strong interest over the last few decades, mainly due to its relevance in seismic inversion. Classical works are mainly concerned with the one-dimensional setting – as a model for seismic inversion in stratified or layered media – which allows making use of integral transforms to derive explicit “layer-stripping” formulas; see, e.g., [19,20,21,22]. Regarding the numerous works on wave speed identification in the multidimensional wave equation for seismic inversion, we only mention exemplarily [23,24,25]; see also further literature cited there. The use of total variation penalties for recovering a piecewise constant wave speed in multiple dimensions has been proposed in, e.g., [26,27,28,29,30], although the earlier works employed a smooth approximation of the total variation to allow the numerical solution by standard approaches for nonlinear PDE-constrained optimization. Finally, joint multi-bang and total variation regularization of linear inverse problems and its numerical solution by a primal–dual proximal splitting methods were considered in [31]. We also mention that multi-bang control is related to (but different from) switching controls, where at each instant in time, one and only one from a given set of time-dependent controls should be active; see, e.g., [32].

This work is organized as follows. In the next Sect. 2, we give a formal statement of the optimal control problem (1) and recall the relevant definitions and properties of the functional. We then derive in Sect. 3 the results on regularity, stability, and a priori estimates for solutions of the state equation that will be needed in the rest of the paper. In particular, in Proposition 3.6 we show a Groeger-type maximal regularity result for the wave equation under additional assumptions on the data. Section 4 is devoted to existence and first-order necessary optimality conditions for optimal controls, where we use the mentioned maximal regularity result to show that the latter can be interpreted in a pointwise fashion. We then discuss the numerical computation of solutions using a stabilized finite element discretization (see Sect. 5.1) together with a nonlinear primal–dual proximal splitting method (see Sect. 5.2). This approach is illustrated in Sect. 6 for two examples: a transmission setup motivated by acoustic tomography and a reflection setup modeling seismic tomography.

Problem Statement

Let \(\Omega \subset {\mathbb {R}}^d\), \(d \in \{2,3\}\), be a bounded domain with \(C^{2,1}\) regular boundary \(\partial \Omega \) and outer normal \(\nu \). For brevity, we introduce the notation \(H:=L^2(\Omega )\) and \(V:=H^1(\Omega )\) and set \(I:=(0,T)\). Then we consider for \(f\in L^2(I,V)\), \(y_0\in V\), and \(y_1\in H\) the weak solution \(y\in C(\overline{I},V)\cap C^1({\overline{I}},H)\) to

$$\begin{aligned} \left\{ \begin{aligned} \partial _{tt} y - {\mathrm {div}}(u\nabla y)&= f&\text { in } Q := (0,T) \times \Omega , \\ {\partial _\nu y}&= 0&\text { on } \Sigma := (0,T) \times \partial \Omega , \\ y(0)&= y_0,\quad {\partial _t} y(0) = y_1,&\text { on }\Omega . \end{aligned}\right. \end{aligned}$$

This choice of Neumann boundary conditions corresponds, e.g., for acoustic waves to the situation of reflection at a sound-hard obstacle and for elastic waves to the absence of external forces at the boundary (which is a natural setting for seismic imaging via interior sources). We will discuss existence and regularity of solutions to (2) in the following Sect. 3.

The salient point is of course the coefficient u in the principal part, which we want to control on an open subset \(\omega _c\subseteq \Omega \), which is assumed to have a \(C^{2,1}\) regular boundary. For constants \({\underline{u}}, \overline{u}\) with \(0< {\underline{u}}< \overline{u} < \infty \) we define the set of admissible coefficients

$$\begin{aligned} {\hat{U}} = \left\{ u \in L^\infty (\Omega ):{\underline{u}} \le u(x) \le \overline{u}\quad \text {for a.e.}~x\in \Omega \right\} \end{aligned}$$

and pick a reference coefficient \({\hat{u}}\in {\hat{U}}\). To map a control u defined on \(\omega _c\) to a coefficient defined on \(\Omega \), we introduce the affine bounded extension operator

$$\begin{aligned} {\hat{E}}: L^2(\omega _c)\rightarrow L^2(\Omega ),\qquad [{\hat{E}}u](x) := {\left\{ \begin{array}{ll} {\hat{u}}(x) + u(x) &{} \text {for }x\in \omega _c,\\ {\hat{u}}(x) &{} \text {for }x\in \Omega \setminus \omega _c. \end{array}\right. } \end{aligned}$$

The set of controls that can be extended to admissible coefficients is then given by

$$\begin{aligned} U = \left\{ u \in L^\infty (\omega _c):u_{\min } \le u(x) \le u_{\max } \quad \text {for a.e.}~x\in \omega _c\right\} , \end{aligned}$$

where \(u_{\min }<u_{\max }\) are such that \({\underline{u}} \le \inf _{x\in \omega _c} {\hat{u}}(x) + u_{\min } \le \sup _{x\in \omega _c} {\hat{u}}(x) + u_{\max } \le \overline{u}\). In particular, for \({\hat{u}} \equiv {\underline{u}}\), we have \(u_{\min }=0\) and \(u_{\max } = \overline{u}-{\underline{u}}\).

Moreover, we introduce the observation space \({\mathcal {O}}\) which is assumed to be a separable Hilbert space as well as a linear and bounded observation operator \(B\in {\mathbb {L}}(L^2(Q),{\mathcal {O}})\) with adjoint \(B^*\in {\mathbb {L}}({\mathcal {O}}, L^2(Q))\).

We then consider the optimal control problem

$$\begin{aligned} \min _{u\in BV(\omega _c)\cap U}\frac{1}{2}\Vert By(\hat{E}u)-y_d\Vert _{{\mathcal {O}}}^2+\alpha G(u)+\beta \mathrm {TV}(u), \end{aligned}$$

where y(u) is a weak solution to (2), G is the multi-bang penalty from [11, 12], \(\mathrm {TV}\) denotes the total variation, and \(\alpha \) and \(\beta \) are positive constants. In the remainder of this section, we recall the definitions and properties of the total variation and the multi-bang penalty relevant to the current work.

Total variation We recall, e.g., from [33,34,35] that the space \(BV(\omega _c)\) is given by those functions \(v\in L^1(\omega _c)\) for which the distributional derivative Dv is a Radon measure, i.e.,

$$\begin{aligned} BV(\omega _c) = \left\{ v\in L^1(\omega _c): \Vert Dv \Vert _{{\mathcal {M}}(\omega _c)} < \infty \right\} . \end{aligned}$$

The total variation of a function \(v\in BV(\omega _c)\) is then given by

$$\begin{aligned} \mathrm {TV}(v) := \Vert Dv \Vert _{{\mathcal {M}}(\omega _c)} = \int _{\omega _c} \mathrm {d}|Dv|_2, \end{aligned}$$

i.e., the total variation (in the sense of measure theory) of the vector measure \(Dv\in {\mathcal {M}}(\omega _c;{\mathbb {R}}^d)=C_0(\omega _c;{\mathbb {R}}^d)^*\). Here, \(|\cdot |_2\) denotes the Euclidean norm on \({\mathbb {R}}^d\); we thus consider in this work the isotropic total variation. For \(v\in L^1(\omega _c)\setminus BV(\omega _c)\), we set \(\mathrm {TV}(v)=\infty \). It follows that \(BV(\omega _c)\) embeds into \(L^r(\omega _c)\) continuously for every \(r\in [1,\frac{d}{d-1}]\) and compactly if \(r < \frac{d}{d-1}\); see, e.g., [33, Cor. 3.49 together with Prop. 3.21]. In addition, the total variation is lower semi-continuous with respect to strong convergence in \(L^1(\omega _c)\), i.e., if \(\{u_n\}_{n\in {\mathbb {N}}}\subset BV(\omega _c)\) and \(u_n\rightarrow u\) in \(L^1(\omega _c)\), we have that

$$\begin{aligned} \mathrm {TV}(u)\le \liminf _{n\rightarrow \infty } \mathrm {TV}(u_n), \end{aligned}$$

see, e.g., [35, Thm. 5.2.1]. Note that this does not imply that \(\mathrm {TV}(u)<\infty \) and hence that \(u\in BV(\omega _c)\) unless \(\{\mathrm {TV}(u_n)\}_{n\in {\mathbb {N}}}\) has a bounded subsequence. From (7), we also deduce that the convex extended real-valued functional \(\mathrm {TV}:L^p(\omega _c)\rightarrow {\mathbb {R}}\cup \{\infty \}\) is weakly lower semi-continuous for any \(p\in [1,\infty ]\).

Multi-bang penalty Let \(u_{\min } \le u_1<\dots < u_m\le u_{\max }\) be a given set of desired coefficient values. The multi-bang penalty G is then defined similar to [12], where we have to replace the box constraints \(u(x)\in [u_1,u_m]\) by a linear growth to ensure that G is finite on \(L^r(\omega _c)\), \(r<\infty \). For simplicity, we assume in the following that \(u_1 = u_{\min } = 0\) and \(u_m = u_{\max } = {\overline{u}} - {\underline{u}}\) (i.e., \({\hat{u}} = {\underline{u}}\)) and define

$$\begin{aligned} G:L^1(\omega _c)\rightarrow {\mathbb {R}},\qquad G(u) = \int _{\omega _c} g(u(x))\,\mathrm {d}x, \end{aligned}$$

where \(g:{\mathbb {R}}\rightarrow {\mathbb {R}}\) is given by

$$\begin{aligned} g(t) = {\left\{ \begin{array}{ll} -u_m t &{} t \le u_1,\\ \frac{1}{2} \left( (u_{i}+u_{i+1})t - u_iu_{i+1}\right) &{} t \in [u_i,u_{i+1}],\quad 1\le i <m,\\ u_m t - \frac{1}{2} u_m^2 &{} t \ge u_m. \end{array}\right. } \end{aligned}$$

This definition can be motivated via the convex envelope of \(\delta _{[u_1,u_d]}(t) + \frac{1}{2} |t|^2\) (where \(\delta \) denotes the indicator function in the sense of convex analysis), see [12]; note however that here (as in [10]) g is defined to be finite for every \(t\in {\mathbb {R}}\), while the convex envelope is only finite for \(t\in [u_1,u_2]\). We also remark that for \(m=2\), this reduces in the current setting to the well-known sparsity penalty (i.e., \(G(u) = \Vert u \Vert _{L^1(\omega _c)}\) for any \(u\in U\)).

It can be verified easily that g is continuous, convex, and linearly bounded from above and below, i.e.,

$$\begin{aligned} \frac{1}{2} u_2 |t| \le g(t) \le u_m |t| \qquad \text {for all }t\in {\mathbb {R}}. \end{aligned}$$

Since g is finite (and hence proper), convex, and continuous, the corresponding integral operator \(G:L^r(\omega _c)\rightarrow {\mathbb {R}}\) is finite, convex, and continuous (and hence a fortiori weakly lower semi-continuous) for any \(r\in [1,\infty ]\), see, e.g., [36, Prop. 2.53]. Also, the properties of g imply that

  1. (G1)

    \(G(v) > G(0)=0\) for all \(v\in L^1(\omega _c)\setminus \{0\}\),

  2. (G2)

    \(\frac{1}{2} u_2 \Vert v \Vert _{L^1(\Omega )}\le G(v) \le u_m \Vert v \Vert _{L^1(\omega _c)}\) for all \(v\in L^1(\omega _c)\).

The State Equation

We first consider the state equation for a fixed coefficient \(u\in {\hat{U}}\) (i.e., defined and uniformly bounded on the full domain \(\Omega \) and satisfying \({\underline{u}}\le u \le \overline{u}\) almost everywhere). For given \(u\in {\hat{U}}\), \(f\in L^2(I,H)\), \(y_0\in V\), and \(y_1\in H\), we call \(y=y(u)\) a (weak) solution to (2) if \(y\in W:= L^2(I,V) \cap W^{1,2}(I,H)\) and

$$\begin{aligned} \left\{ \begin{aligned} \int _0^T - (\partial _t y, \partial _t v)_H + (u \nabla y(t), \nabla v(t))_{{\mathbb {L}}^2(\Omega )} dt&= \int _0^T (f(t), v(t))_H dt + (y_1,v(0))_H,\\ y(0)&= y_0, \end{aligned} \right. \end{aligned}$$

for all \(v\in W\) with \(v(T) = 0\). We then have the following existence and natural regularity result.

Lemma 3.1

For every \(u \in {\hat{U}}\) and \((f, y_0, y_1) \in L^2(I,H) \times V \times H\), there exists a unique (weak) solution \(y = y(u) \in Z := C({\overline{V}}) \cap C^1({\overline{H}})\) to (2) satisfying

$$\begin{aligned} \Vert y \Vert _{C({\overline{V}})} + \Vert \partial _{t} y \Vert _{C({\overline{H}})} + \Vert \partial _{tt} y \Vert _{L^2({I}, V^*)} \le C_1(\Vert f \Vert _{L^2(I,H)} + \Vert y_0 \Vert _V + \Vert y_1 \Vert _H) \end{aligned}$$

for a constant \(C_1\) independent of \((f,y_0, y_1) \in L^2(I,H) \times V \times H\) and \(u \in {\hat{U}}\).


Except for the estimate on \(\Vert \partial _{tt} y \Vert _{L^2({I}, V^*)}\), the claim follows from [37, Theorem 3.8.2, page 275], where we observe that due to our assumption on \(u\in {\hat{U}}\), the energy is coercive with respect to the seminorm in V; see also [23, Theorem 2.4.5]. The constant \(C_1\) depends on \({\underline{u}}\) and \({\overline{u}}\), but is otherwise independent of \(u\in {\hat{U}}\).

To verify the missing estimate, we use from (10) that

$$\begin{aligned} \Vert y \Vert _{L^2({I}, V)}\le C_1(\Vert f \Vert _{L^2(I,H)} + \Vert y_0 \Vert _V + \Vert y_1 \Vert _H). \end{aligned}$$

Since \(u \in {\hat{U}}\), we deduce that

$$\begin{aligned} \Vert {\mathrm {div}}(u\nabla y) \Vert _{L^2({I}, V^*)}\le C_1(\Vert f \Vert _{L^2(I,H)} + \Vert y_0 \Vert _V + \Vert y_1 \Vert _H). \end{aligned}$$

We further deduce from the state equation that

$$\begin{aligned} \Vert \partial _{tt} y \Vert _{L^2(I,V^*)} \le {\tilde{C}}_1(\Vert f \Vert _{L^2(I,H)} + \Vert y_0 \Vert _V + \Vert y_1 \Vert _H), \end{aligned}$$

with \({\tilde{C}}_1 = C_1+1\). \(\square \)

By the change of variables \(t\mapsto T-t\), we can also apply Lemma 3.1 to the dual problem

$$\begin{aligned} \left\{ \begin{aligned} \int _0^T - (\partial _t \varphi , \partial _t v)_H + (u \nabla \varphi (t), \nabla v(t))_{{\mathbb {L}}^2(\Omega )} dt&= \int _0^T (g(t), v(t))_H dt + (\varphi _1(T),v(T))_H,\\ \varphi (T)&= \varphi _0 \text { in } V, \end{aligned} \right. \end{aligned}$$

for any \(g\in L^2(I,H)\), \(\varphi _0\in V\), \(\varphi _1 \in H\), and any \(v\in W\) with \(v(0)=0\).

Corollary 3.2

For every \(u\in {\hat{U}}\) and \(g\in L^2(I,H)\), \(\varphi _0\in V\), and \(\varphi _1 \in H\), there exists a unique solution \(\varphi \in Z\) to (11) satisfying

$$\begin{aligned} \Vert \varphi \Vert _{C({\overline{I}},V)} + \Vert \partial _t \varphi \Vert _{C({\overline{I}},H)}+\Vert \partial _{tt} \varphi \Vert _{L^2({I}, V^*)} \le C_1 (\Vert g \Vert _{L^2(I,H)} + \Vert \varphi _0 \Vert _V + \Vert \varphi _1 \Vert _{H}). \end{aligned}$$

Using this result, we can apply an Aubin–Nitsche trick or duality argument to show Lipschitz continuity of \(u\mapsto y(u) \in L^2(I,H)\), which we will need to show differentiability of the tracking term later.

Lemma 3.3

There exists a constant \(L>0\) such that the mapping \(u\mapsto y(u)\) satisfies

$$\begin{aligned} \Vert y(u_1)-y(u_2) \Vert _{L^2(I,H)}\le L \Vert u_1-u_2 \Vert _{L^\infty (\Omega )} \quad \text {for all }u_1,u_2\in {\hat{U}}. \end{aligned}$$


Let \(u_1,u_2\in {\hat{U}}\) be arbitrary and set \(\delta u := u_1-u_2\) and \(\delta y:= y(u_1)-y(u_2)\). Subtracting the weak equations for \(y(u_1)\) and \(y(u_2)\), we have that \(\delta y \in Z\) satisfies \(\delta y(0) = 0\) and

$$\begin{aligned} \int _0^T - (\partial _t \delta y, \partial _t v)_H + (u_1\nabla \delta y,\nabla \delta v)_{{\mathbb {L}}^2(\Omega )} dt = -\int _0^T (\delta u\nabla y(u_2), \nabla v)_{{\mathbb {L}}^2(\Omega )} dt\nonumber \\ \end{aligned}$$

for all \(v\in W\) with \(v(T) = 0\).

Let now \(g\in L^2(I,H)\) be arbitrary and consider the corresponding solution \(\varphi _g\in W\) of (11) for \(u=u_1\), \(g_0=0\), and \(g_1=0\). Noting that \(v=\varphi _g\) is a valid test function for (12) and \(w=\delta y\) is a valid test function for (11), we obtain that

$$\begin{aligned} \begin{aligned} (\delta y, g)_{L^2(I,H)}&= -(\partial _t \delta y,\partial _t \varphi _g)_{L^2(I,H)} + (u_1\nabla \delta y,\nabla \delta \varphi _g)_{L^2(I,{\mathbb {L}}^2(\Omega ))} \\&= (\delta u \nabla y(u_2),\nabla \varphi _g)_{L^2(I,{\mathbb {L}}^2(\Omega ))}. \end{aligned} \end{aligned}$$

Using that \(\delta u\in L^\infty (\Omega )\) together with Lemma 3.1 and Corollary 3.2, this implies that

$$\begin{aligned} \begin{aligned} (\delta y, g)_{L^2(I,H)}&\le \Vert \delta u \Vert _{L^\infty (\Omega )} \Vert y(u_2) \Vert _{L^2(I,V)}\Vert \varphi _g \Vert _{L^2(I,V)}\\&\le \Vert \delta u \Vert _{L^\infty (\Omega )} C_1(\Vert f \Vert _{L^2(I,H)}+\Vert y_0 \Vert _V+\Vert y_1 \Vert _H)\Vert g \Vert _{L^2(I,H)} \end{aligned} \end{aligned}$$

for all \(g\in L^2(I,H)\). Since \(L^2(I,H)\) is a Hilbert spaces, taking the supremum over all \(g\in L^2(I,H)\) yields the claim. \(\square \)

In stronger norms, we only have the following weak continuity result, which will be used repeatedly.

Lemma 3.4

Let \(\{u_n\}_{n\in {\mathbb {N}}}\subset {\hat{U}}\) be a sequence with \(u_n\rightarrow u\) in \(L^r(\Omega )\) for some \(r\in [1,\infty )\). Then \(u\in {\hat{U}}\) and \(y(u_n)\rightharpoonup y(u)\) in \(L^2(I,V)\cap W^{1,2}(I,H)\cap W^{2,2}(I,V^*)\). Furthermore, \(y(u_n)\rightharpoonup y(u)\) in H pointwise for all \(t\in [0,T]\).


The first assertion follows from the fact that \({\hat{U}}\) is closed in \(L^r(\Omega )\). From \(u_n\in {\hat{U}}\) and Lemma 3.1, the corresponding sequence \(\{y(u_n)\}_{n\in {\mathbb {N}}}\) of solutions to (9) is well-defined and bounded in \(L^2(I,V)\cap W^{1,2}(I,H)\cap W^{2,2}(I,V^*)\). By passing to successive subsequences (which we do not distinguish), we thus obtain that

$$\begin{aligned} y_n \rightharpoonup {\bar{y}} \quad \text { weakly in }\quad L^2(I,V) \cap W^{1,2}(I,H)\cap W^{2,2}(I,V^*). \end{aligned}$$

From (9), we in particular have that

$$\begin{aligned}&\int _{0}^{T} \, -(\partial _{t} y_n(t),\partial _{t} \varphi (t))_H + (u_n \nabla y_n(t), \nabla \varphi (t))_{{\mathbb {L}}^2(\Omega )}\,\mathrm {d}t\\&\quad =\int _0^T(f(t), \varphi (t))_H \,\mathrm {d}t+ (y_1,\varphi (0))_H \end{aligned}$$

for arbitrary \(\varphi \in W \cap L^2(I,H^3(\Omega ))\) with \(\varphi (T) =0\). Since \(u_n\rightarrow u\) strongly in \(L^r(\Omega )\) and \(u_n,u\in {\hat{U}}\subset L^\infty (\Omega )\), we have for \(r\in [1,2]\)

$$\begin{aligned} \Vert u_n-u\Vert _{H}\le \Vert u_n-u\Vert _{L^\infty (\Omega )}^{(2-r)/2}\Vert u_n-u\Vert _{L^r(\Omega )}^{r/2}\le C\Vert u_n-u\Vert _{L^r(\Omega )}^{r/2} \end{aligned}$$

and thus for all \(r\in [1,\infty )\)

$$\begin{aligned}&\left| \int _0^T(u_n\nabla y_n-u\nabla {\bar{y}}, \nabla \varphi )_{{\mathbb {L}}^2(\Omega )}\,\mathrm {d}t\right| \\&\quad \le \left| \int _0^T((u_n-u)\nabla \varphi ,\nabla y_n)_{{\mathbb {L}}^2(\Omega )}\,\mathrm {d}t\right| +\left| \int _0^T(\nabla (y_n-{{\bar{y}}}),u\nabla \varphi )_{{\mathbb {L}}^2(\Omega )}\,\mathrm {d}t\right| \\&\quad \le C\Vert y_n\Vert _{L^2(I,V)}\Vert u_n-u\Vert _H\Vert \nabla \varphi \Vert _{L^2(I,C({\overline{\Omega }}))} +\left| \int _0^T(\nabla (y_n-{{\bar{y}}}),u\nabla \varphi )_{{\mathbb {L}}^2(\Omega )}\,\mathrm {d}t\right| \rightarrow 0. \end{aligned}$$

Then we can pass to the limit in the weak formulation to obtain

$$\begin{aligned} \int _{0}^{T}- \, (\partial _{t} {\bar{y}}(t), \partial _{t} \varphi (t))_H + (u \nabla {\bar{y}}(t), \nabla \varphi (t))_{H} - (f(t), \varphi (t))_H \,\mathrm {d}t- (y_1,\varphi (0))_H = 0 \end{aligned}$$

for all such \(\varphi \). Since \(u\in {\hat{U}}\) and since the set of functions with \(\varphi \in W \cap L^2(I,H^3(\Omega ))\) and \(\varphi (T) =0\) is dense in \(\{\varphi \in W: \varphi (0)=0\}\), the last equation also holds for all \(v\in W\) with \(v(T) =0\). The density can be shown by adapting the density argument of \(C^\infty ({\overline{\Omega }})\) in V; see, e.g., [38, Cor. 9.8].

It remains to show that the limit y satisfies the initial condition \(y(0)=y_0\). First, for each \(v\in H\) we have \((y_n(\cdot ),v)_H \rightharpoonup ({{\bar{y}}}(\cdot ),v)_H\) in \(W^{1,2}(I)\). Hence

$$\begin{aligned} (y_n(\cdot ),v)_H \rightarrow ({{\bar{y}}}(\cdot ),v)_H \quad \text { in } C({\overline{I}}). \end{aligned}$$

due to the compact embedding of \(W^{1,2}(I)\) to \(C({\overline{I}})\). In particular this implies that \((y_n(0),v)_H=(y_0,v)_H =({{\bar{y}}}(0),v)_H\) for all \(v\in H\). Since \(v\in H\) was arbitrary, this implies that \({{\bar{y}}}(0) =y_0\). This implies that \(y=y(u)\), and since the solution of (9) is unique, a subsequence–subsequence argument shows that the full sequence converges weakly to y(u). By a similar argument, \(y_n(t) \rightharpoonup {{\bar{y}}}(t)\) in H for all \(t\in {\overline{I}}\). \(\square \)

Stronger continuity of \(u\mapsto y(u)\) can be shown with respect to the \(L^\infty \) topology for the controls.

Lemma 3.5

Assume that \(f\in L^2(I,V)\) and let \(\{u_n\}_{n\in {\mathbb {N}}}\subset {\hat{U}}\) be a sequence with \(u_n\rightarrow u\) in \(L^\infty (\Omega )\). Then \(y(u_n)\rightarrow y(u)\) in \(L^2(I,V)\cap W^{1,2}(I,H)\).


First, the embedding \(L^\infty (\Omega )\subset L^r(\Omega )\), \(r\in [1,\infty )\), for bounded \(\Omega \) together with Lemma 3.4 shows that \(y_n:= y(u_n)\rightharpoonup y(u)\) in \(L^2(I,V)\cap W^{1,2}(I,H)\cap W^{2,2}(I,V^*)\).

We now introduce for \(u\in {\hat{U}}\) and \(y:=y(u)\) the energy

$$\begin{aligned} {\mathcal {E}}_u(t) := \frac{1}{2} \left( u\nabla y(t),\nabla y(t) \right) _{{\mathbb {L}}^2(\Omega )} + \frac{1}{2}\Vert \partial _t y(t) \Vert _{H}^2 + \frac{1}{2}\Vert y(t) \Vert _H^2 \quad \text {for a.e.}~ t\in I. \end{aligned}$$

By the Lions–Magenes Lemma ([37, Lem. 8.3], cf. also [23, (2.24), p. 24]), we have that

$$\begin{aligned} \frac{d}{dt} {\mathcal {E}}_u(t) = \left( f(t),\partial _t y(t) \right) _{H} + \left( y(t),\partial _t y(t) \right) _H\quad \text {in }L^1(I), \end{aligned}$$

and thus by the fundamental theorem of calculus we find that

$$\begin{aligned} {\mathcal {E}}_u(t) = {\mathcal {E}}_u(0) + \int _0^t \left( f(s),\partial _t y(s) \right) _{H} + \left( y(s),\partial _t y(s) \right) _H\,\mathrm {d}s. \end{aligned}$$

We now define for \(v\in V\)

$$\begin{aligned} \Vert v \Vert _{V_u}^2:=\left( u\nabla v ,\nabla v \right) _{{\mathbb {L}}^2(\Omega )} + \Vert v \Vert _H^2, \end{aligned}$$

which is an equivalent norm on V for any \(u\in {\hat{U}}\). Subtracting (13) for \({\mathcal {E}}_u\) and \({\mathcal {E}}_{u_n}\) and adding the productive zero then yields for almost every \(t\in I\) that

$$\begin{aligned}&\frac{1}{2} \Vert y(t) \Vert _{V_u}^2 + \frac{1}{2}\Vert \partial _t y(t) \Vert _H^2 -\left( \frac{1}{2} \Vert y_n(t) \Vert _{V_u}^2 + \frac{1}{2}\Vert \partial _t y_n(t) \Vert _H^2\right) \nonumber \\&\quad = \frac{1}{2}\left( (u_n-u)\nabla y_n(t),\nabla y_n(t) \right) _{{\mathbb {L}}^2(\Omega )} + \frac{1}{2}\left( (u-u_n)\nabla y_0,\nabla y_0 \right) _{{\mathbb {L}}^2(\Omega )} \nonumber \\&\qquad + \int _0^t \!\!\left( f(s),\partial _t y(s)-\partial _t y_n(s) \right) _H + \left( y(s)-y_n(s),\partial _t y(s) \right) _H \nonumber \\&\qquad + \left( y_n(s),\partial _t y(s)-\partial _t y_n(s) \right) _H\!\! \,\mathrm {d}s, \end{aligned}$$

and hence that

$$\begin{aligned}&\left| \frac{1}{2} \Vert y(t) \Vert _{V_u}^2 + \frac{1}{2}\Vert \partial _t y(t) \Vert _H^2 -\left( \frac{1}{2} \Vert y_n(t) \Vert _{V_u}^2 + \frac{1}{2}\Vert \partial _t y_n(t) \Vert _H^2\right) \right| \\&\quad \le \Vert u_n-u \Vert _{L^\infty (\Omega )} \Vert y_n \Vert _{C({\overline{I}},V)}^2\\&\qquad +(\Vert f\Vert _{L^2(I,V)}+\Vert y_n\Vert _{L^2(I,V)})\Vert \partial _t y-\partial _ty_n\Vert _{L^2(I,V^*)}+\Vert \partial _t y\Vert _{L^2(I,H)}\Vert y-y_n\Vert _{L^2(I,H)}. \end{aligned}$$

We know from Lemma 3.4 that \(y_n\rightharpoonup y\) in \(L^2(I,V)\cap W^{1,2}(I,H)\cap W^{2,2}(I,V^*)\). Thus the Aubin–Lions Lemma and the compactness of the embeddings \(V\hookrightarrow H\) and \(H\hookrightarrow V^*\) imply that \(y_n \rightarrow y\) in \(L^2(I,H)\cap W^{1,2}(I,V^*)\). Thus we have

$$\begin{aligned} \int _0^T\Vert y_n(t) \Vert _{V_u}^2+\Vert \partial _t y_n(t) \Vert _{H}^2\,\mathrm {d}t\rightarrow \int _0^T\Vert y(t) \Vert _{V_u}^2+\Vert \partial _t y(t) \Vert _{H}^2\,\mathrm {d}t. \end{aligned}$$

Since \(\Vert \cdot \Vert _{V_u}\) is an equivalent norm on V, we have that the normed vector space \(V_u := (V,\Vert \cdot \Vert _{V_u})\) is equivalent to V and hence that \(L^2(I,V_u)\cap W^{1,2}(I,H)\simeq L^2(I,V)\cap W^{1,2}(I,H)\). This implies that \(y_n\rightharpoonup y\) also in \(L^2(I,V_u)\cap W^{1,2}(I,H)\), and together with (15) the Radon–Riesz property of Hilbert spaces implies that \(y_n\rightarrow y\) strongly in \(L^2(I,V_u)\cap W^{1,2}(I,H)\). Appealing again to the equivalence of V and \(V_u\) then yields the claim. \(\square \)

Under additional assumptions, we can show an improved regularity result.

Assumption 3.1

The data satisfy \(f\in L^2(I;V)\) and \((y_0,y_1)\in H^2(\Omega )\times H^1(\Omega )\) with \(\partial _\nu y_0 =0\). Furthermore,

  1. (i)

    \({\hat{u}}\) is constant on \(\Omega \setminus \omega _c\) and

  2. (ii)

    \(y_0\) is constant on \(\omega _c\) and \(\overline{\omega _c} \subset \Omega \).

Proposition 3.6

Let \(u\in {\hat{U}}\) and Assumption 3.1 hold. Then there exists \(q > 2\) and a constant \({\hat{C}}\) independent of u such that \(y(u) \in {L^\infty (I,W^{1,q}(\Omega ))}\) and for all \(u \in {\hat{U}}\),

$$\begin{aligned} \Vert y(u) \Vert _{L^\infty (I,W^{1,q}(\Omega ))} \le {\hat{C}}\left( \Vert y_1 \Vert _{V} + \Vert y_0 \Vert _{H^2(\Omega )}+ \Vert f \Vert ^2_{L^2(I,V)}\right) . \end{aligned}$$


We proceed in two steps.

Step 1 First we assume that additionally

$$\begin{aligned} f \in W^{2,2}(I,V),\qquad y_0 \in H^3(\Omega ) \quad \text {with}\quad \partial _\nu y_0 = 0, \qquad \text {and }\quad y_1 \in H^2(\Omega ). \end{aligned}$$

Let \(u\in {\hat{U}}\) and approximate u by \(\{u_n\} \subset C^\infty ({\overline{\Omega }}) \cap {\hat{U}}\) with \(u_n\rightarrow u\) in \(L^r(\Omega )\) for some \(r\in [1,\infty )\) with \(u_n={\hat{u}}\) in \(\Omega \setminus \omega _c\). Such a sequence can be found by first approximating u by \({\tilde{u}}_n \in C^\infty ({\mathbb {R}}^d)\) and \({\tilde{u}}_n = {\hat{u}}\) in \(\Omega \setminus \omega _c\); this sequence in turn is constructed by first introducing an intermediate approximation of functions \({\hat{u}}_n\) with the property that \(\lim _{n\rightarrow \infty } {\hat{u}}_n =u\) in \(L^r(\Omega )\) and \({\hat{u}}_n= {\hat{u}}\) in \(\Omega \setminus {\hat{\omega }}_{c,n}\), where the closure of \({\hat{\omega }}_{c,n}\) is contained in \(\omega _c\) and \(0<\mathrm {dist}(\partial \omega _c, {\hat{\omega }}_{c,n}) \le n^{-1}\). Then we use convolution by mollifiers of the functions \({\hat{u}}_n\) to obtain functions \({\tilde{u}}_n\in C^\infty ({\mathbb {R}}^d)\) that satisfy \(\lim _{n\rightarrow \infty } {\tilde{u}}_n =u\) in \(L^r(\Omega )\), see e.g. [39, page 132], and \({\tilde{u}}_n={\hat{u}}\) in \(\Omega \setminus \omega _c\). Next we choose functions \({\underline{\varphi }}^n\) and \({\overline{\varphi }} _n\) in \(C^\infty ({\mathbb {R}})\) with a Lipschitz constant L and such that \({\underline{u}}\le {\underline{\varphi }}^n \), \({\overline{\varphi }}_n\le {\overline{u}}\), and \({\underline{\varphi }}^n(s)\rightarrow \max ({\underline{u}},s)\), \({\overline{\varphi }}_n(s) \rightarrow \min ({\overline{u}},s)\) for all \(s\in {\mathbb {R}}\), see [39, page 125]. We then set \(u_n= {\overline{\varphi }}_n({\underline{\varphi }}^n ({\tilde{u}}_n))\), and estimate

$$\begin{aligned} \begin{aligned} \Vert u-u_n\Vert _{L^r(\Omega )}&\le \Vert u-{\overline{\varphi }}_n({\underline{\varphi }}^n (u))\Vert _{L^r(\Omega )} + \Vert {\overline{\varphi }}_n({\underline{\varphi }}^n (u)) - {\overline{\varphi }}_n({\underline{\varphi }}^n (u_n)) \Vert _{L^r(\Omega )}\\&\le \Vert u-{\overline{\varphi }}_n({\underline{\varphi }}^n(u))\Vert _{L^r(\Omega )} + L^2 \Vert u-{\tilde{u}}_n\Vert _{L^r(\Omega )} \rightarrow 0, \end{aligned} \end{aligned}$$

using Lebesgue’s bounded convergence theorem.

We replace u in (2) by \(u_n\). Due to the regularity assumptions (17) and the assumption that \(y_0\) is constant on \(\omega _c\), we have

$$\begin{aligned} y_0 \in H^3(\Omega ), \quad y_1 \in H^2(\Omega ), \quad f(0) + {\mathrm {div}}(u_n \nabla y_0) \in V, \quad f'(0) + {\mathrm {div}}(u_n \nabla y_1) \in H. \end{aligned}$$

Together with \({\partial _\nu y_0}= 0\) and \(f \in W^{2,2}(I,V) \subset W^{2,2}(I,H)\), these properties allow applying Theorem 30.3 (with \(k=3\)) and Theorem 30.4 in [40], which guarantee that \(y_n = y(u_n) \in W^{1,2}(I,H^2(\Omega ))\cap W^{2,2}(I,V)\) and \({\partial _\nu y_n}(t) = 0\) on \(\partial \Omega \) for \(t \in \overline{I}\). Then we multiply (2) with \(-{\partial _t} {\mathrm {div}}(u_n\nabla y_n(t))\) and integrate over \(\Omega \). Integrating by parts on the right-hand side and using that \(\partial _\nu \partial _ty_n=0\) on \(\partial \Omega \), we obtain

$$\begin{aligned} (u_n \nabla \partial _{tt} y_n(t), \nabla \partial _t y_n(t))_{{\mathbb {L}}^2(\Omega )} +\frac{1}{2} \partial _t \Vert {\mathrm {div}}(u_n\nabla y_n(t)) \Vert ^2_H = (\nabla f(t), u_n \nabla \partial _t y_n(t))_{{\mathbb {L}}^2(\Omega )} \end{aligned}$$

and thus

$$\begin{aligned}&{\partial _t} \Vert \sqrt{u_n} \nabla \partial _t y_n(t) \Vert ^2_{{\mathbb {L}}^2(\Omega )} + \partial _t \Vert {\mathrm {div}}(u_n \nabla y_n(t)) \Vert ^2_H \\&\quad \le \Vert \sqrt{u_n}\nabla f(t) \Vert ^2_{{\mathbb {L}}^2(\Omega )} + \Vert \sqrt{u_n} \nabla \partial _t y_n(t) \Vert ^2_{{\mathbb {L}}^2(\Omega )}. \end{aligned}$$

Integrating this expression on (0, t), we find for \(t \in (0,T]\) that

$$\begin{aligned}&\Vert \sqrt{u_n} \nabla \partial _t y_n(t) \Vert ^2_{{\mathbb {L}}^2(\Omega )} + \Vert {\mathrm {div}}(u_n \nabla y_n(t)) \Vert ^2_H \\&\quad \le \Vert \sqrt{u_n} \nabla y_1 \Vert ^2_{{\mathbb {L}}^2(\Omega )} + \Vert {\mathrm {div}}(u_n \nabla y_0) \Vert ^2_H \\&\qquad + \int _{0}^{t} \Vert \sqrt{u_n}\nabla f(s) \Vert ^2_{{\mathbb {L}}^2(\Omega )} \,\mathrm {d}s+ \int _{0}^{t} \Vert \sqrt{u_n} \nabla \partial _t y_n(s) \Vert ^2_{{\mathbb {L}}^2(\Omega )} \,\mathrm {d}s. \end{aligned}$$

Gronwall’s inequality then implies that for each \(t \in (0,T]\),

$$\begin{aligned} \Vert {\mathrm {div}}(u_n \nabla y_n(t)) \Vert ^2_H \le \left( \Vert \sqrt{u_n} \nabla y_1 \Vert ^2_{{\mathbb {L}}^2(\Omega )}+ \Vert {\mathrm {div}}(u_n \nabla y_0) \Vert ^2_H + \overline{u}\Vert f \Vert ^2_{L^2(I,V)}\right) e^{T}. \end{aligned}$$

Since \(y_1 \in H^2(\Omega )\), it follows that \(\{\sqrt{u_n} \nabla y_1\}_{n\in {\mathbb {N}}}\) is bounded in \({\mathbb {L}}^2(\Omega )\). Moreover, we have \({\mathrm {div}}(u_n \nabla y_0)|_{\Omega \setminus \omega _c} = {\mathrm {div}}({\hat{u}} \nabla y_0)|_{\Omega \setminus \omega _c} = {\hat{u}} \Delta y_0|_{\Omega \setminus \omega _c}\) and \({\mathrm {div}}(u_n \nabla y_0)|_{\omega _c} =0\). Hence \(\{{\mathrm {div}}(u_n \nabla y_0)\}_{n\in {\mathbb {N}}}\) is bounded in H. We can thus conclude that \(\{{\mathrm {div}}(u_n \nabla y_n)\}_{n\in {\mathbb {N}}}\) is bounded in \(L^\infty (I, H)\).

Our next aim is to obtain \(L^\infty (I,W^{1,q}(\Omega ))\) regularity and boundedness for \(y_n\) for some \(q > 2\). For this purpose, we define for some \(\lambda > 0\)

$$\begin{aligned} g_n := -{\mathrm {div}}(u_n \nabla y_n) + \lambda y_n \end{aligned}$$

and note that \(\{g_n\}_{n\in {\mathbb {N}}}\) is bounded in \(L^\infty (I,H)\). Furthermore, Sobolev’s embedding theorem implies that \(W^{1,s'}(\Omega ) \hookrightarrow L^2(\Omega )\) for every \(s' \ge 1\) in case \(d = 2\), and for every \(s' \ge \frac{6}{5}\) in case \(d = 3\). Following the notation of [14], we denote by \(W^{-1,s}(\Omega )\) the dual space of \(W^{1,s'}(\Omega )\) with s the conjugate of \(s'\). Then we have \(H \hookrightarrow W^{-1,s}(\Omega )\), where \(s \in [1, \infty )\) for \(d = 2\) and \(s \in [1,6]\) for \(d = 3\). It follows that \(\{g_n\}_{n\in {\mathbb {N}}}\) is bounded in \(L^\infty (I,W^{-1,s}(\Omega ))\). Considering now (18) (together with homogeneous Neumann boundary conditions) for a.e. \(t\in I\) as an equation for \(y_n(t)\), this implies that there exists some \(q > 2\) such that

$$\begin{aligned} \Vert y_n(t)\Vert _{W^{1,q}(\Omega )}\le C\Vert g_n\Vert _H\le {\hat{C}}\left( \Vert y_1 \Vert _{V} + \Vert y_0 \Vert _{H^2(\Omega )} + \Vert f \Vert ^2_{L^2(I,V)}\right) , \end{aligned}$$

where the constant C depends only on \({\underline{u}}\), \({\overline{u}}\) and q, but not on t, see [14, Thm. 1]. Hence \(\{y_n\}_{n\in {\mathbb {N}}}\) is bounded in \(L^\infty (I,W^{1,q}(\Omega ))\). Since \(L^1(I)\) and \(W^{1,q}(\Omega )\) are separable with the latter being reflexive, \(L^\infty (I,W^{1,q}(\Omega ))\) is the dual of a separable space; see, e.g., [41, Thm. 8.18.3]. Hence there exists a subsequence with \(y_n \rightharpoonup ^* {\bar{y}} \in L^\infty (I,W^{1,q}(\Omega ))\).

Finally, from Lemma 3.4 we also have that \(y_n\rightharpoonup y(u)\) in \(L^2(I,V)\cap W^{1,2}(I,H)\) and hence, by uniqueness of y(u), that \(y(u)={{\bar{y}}} \in L^\infty (I,W^{1,q}(\Omega ))\). Using \(\hbox {weak}^*\) semi-continuity of norms (cf., e.g., [38, p. 63]), we can now pass to the limit in (19) to obtain (16), for those \((y_0,y_1,f)\) which satisfy the additional regularity assumption (17).

Step 2 We relax the requirements on the problem data and choose an arbitrary \((y_0,y_1,f)\in X:= H^2(\Omega )\times H^1(\Omega )\times L^2(I;V)\), with \(\partial _\nu y_0 =0\). Then there exists \((y^n_0,y^n_1,f^n)\in H^3(\Omega )\times H^2(\Omega )\times W^{2,2}(I;V)\), with \(\partial _\nu y^n_0 =0\) such that \(\lim _{n\rightarrow \infty }(y^n_0,y^n_1,f^n)= (y_0,y_1,f) \) in X. As this is standard for the second and third component, we only address the first one. Let \(\omega _c^c := \Omega \setminus \overline{\omega _c}\). By assumption, \(\partial \Omega \cap \partial \omega _c= \emptyset \); in addition, \(\Omega \) and \(\omega _c\) are \(C^{2,1}\) domains, and thus \(\omega _c^c\) is a \(C^{2,1}\) domain as well. It is in this step that the \(C^{2,1}\) regularity of the domains is used. Since \(y_0 \in H^2(\Omega )\), we have \(y_0|_{\partial \Omega } \in H^{3/2}(\partial \Omega )\). Moreover, \(y_0|_{\partial \omega _c}=:y_c\) is constant and \(\partial _\nu y_0|_{\partial \omega ^c_c} = \partial _\nu y_0|_{\partial \omega _c \cup \partial \Omega } =0\). Let \({\tilde{v}}_n \in H^{5/2}(\partial \Omega )\) be such that \({\tilde{v}}_n \rightarrow y_0|_{\partial \Omega }\) in \(H^{3/2}(\partial \Omega )\). Accordingly let \(v_n \in H^3(\omega ^c_c)\) with \(v_n|_{\partial \omega _c}=y_c\), \(\partial _\nu v_n |_{\partial \omega ^c_c}=0\), \(\partial _{x_i x_j}v_n|_{\partial \omega _c}=0\), \(v_n|_{\partial \Omega } = {\tilde{v}}_n\), and \(v_n\rightarrow v\) in \(H^2(\omega ^c_c)\), where \(v\in H^2(\omega ^c_c)\) satisfies \(v|_{\partial \omega _c}=y_c\), \(\partial _\nu v|_{\partial \omega ^c_c}=0\), and \(v|_{\partial \Omega } = y_0|_{\partial \Omega }\). Denoting by \(v_{n,\mathrm {ext}}\) and \(v_{\mathrm {ext}}\) the extensions of \(v_n\) and v by the constant \(y_c\) on \(\omega _c\), we have \(v_{n,\mathrm {ext}} \rightarrow v_{\mathrm {ext}} \in H^2(\Omega )\), \(\partial _\nu v_{n,\mathrm {ext}}|_{\partial \Omega }=0\), \(v_{n,\mathrm {ext}}|_{\omega _c}= v_{\mathrm {ext}}|_{\omega _c}=y_c\), and \(v_{n,\mathrm {ext}} \in H^3(\Omega ) \), where we use that \(\partial _{x_i x_j}v_n|_{\partial \omega _c}=0\). Next, observe that \(y_0-v \in H^2_0(\omega ^c_c)\). This implies the existence of functions \(w_n\in C^\infty (\omega _c^c)\) with compact support in \(\omega ^c_c\) such that \(w_n \rightarrow y_0-v\) in \(H^2(\omega _c^c)\); see, e.g., [42, pp. 17 and 31]. Denoting the extension by zero to \(\omega _c\) of \(w_n\) by \(w_{n,\mathrm {ext}}\), we have \(w_{n,\mathrm {ext}}\in H^3(\Omega ), w_{n,\mathrm {ext}} \rightarrow y_0-v_{\mathrm {ext}}\) in \(H^2(\Omega )\), and \(\partial _\nu w_{n,\mathrm {ext}}=0|_{\partial \Omega }\). Finally, the sequence \(y^n_0=v_{n,\mathrm {ext}}+w_{n,\mathrm {ext}}\) defines the desired approximation of \(y_0\) such that \(y^n_0|_{\omega _c}=y_c\), \(\partial _\nu y_n|_{\partial \Omega }=0\), and \(y^n_0 \rightarrow y_0\) in \(H^2(\Omega )\).

We next define \(y^n\) as the solution to (2) with data \((y_0^n,y_1^n,f^n)\). From Lemma 3.1 and (16) for the smooth data \((y_0^n,y_1^n,f^n)\) we deduce that \(y^n \rightharpoonup y\) in \(L^2(I,V)\cap W^{1,2}(I,H)\) and \(y^n \rightharpoonup ^* y\) in \(L^\infty (I,W^{1,q}(\Omega ))\), where y is the solution of (2) with data \((y_0,y_1,f)\), and

$$\begin{aligned} \Vert y^n \Vert _{L^\infty (I,W^{1,q}(\Omega ))} \le {\hat{C}}\left( \Vert y^n_1 \Vert _{V} + \Vert y^n_0 \Vert _{H^2(\Omega )}+ \Vert f^n \Vert ^2_{L^2(I,V)}\right) \end{aligned}$$

for all n. Passing to the limit as \(n\rightarrow \infty \), we obtain (16) for \((y_0,y_1,f)\in X\) with \(\partial _\nu y_0=0\). \(\square \)

Remark 3.7

If Assumption 3.1 holds, the requirement in Lemma 3.5 on the convergence of \(u_n\) can be relaxed to \(u_n\rightarrow u\) in \(L^{\frac{q}{q-2}}(\Omega )\), where q is the exponent from Proposition 3.6. In this case, the first term on the right-hand side in (14) can be estimated by Hölder’s inequality as

$$\begin{aligned} \int _{\Omega } |u_n-u| |\nabla y_n(t)|^2 \,\mathrm {d}x\le \Vert u_n-u \Vert _{L^{\frac{q}{q-2}}(\Omega )}\Vert y_n(t) \Vert _{L^\infty (I,W^{1,q} (\Omega ))}^2\rightarrow 0, \end{aligned}$$

where we used Proposition 3.6. Then again \(y(u_n) \rightarrow y(u)\) in W.

Remark 3.8

In Assumption 3.1 (ii), the requirement \(\overline{\omega _c} \subset \Omega \) was only used in Step 2 of the proof of Proposition 3.6. It it is not necessary if instead \(y_0\in H^3(\Omega )\) is assumed.

Existence and Optimality Conditions

Deriving useful optimality conditions requires replacing the pointwise control constraints with a differentiable approximation of a cutoff function. We thus introduce the superposition operator

$$\begin{aligned} \Phi _\varepsilon : L^1(\omega _c) \rightarrow U, \qquad [\Phi _\varepsilon (u)](x) := \varphi _\varepsilon (u(x)), \end{aligned}$$

where \(\varphi _\varepsilon :{\mathbb {R}}\rightarrow [u_{\min },u_{\max }]\) is such that \(\Phi _\varepsilon \) is Lipschitz continuous from \(L^r(\omega _c)\rightarrow L^r(\omega _c)\) for every \(r\in [1,\infty ]\) and \(\varepsilon \ge 0\) and Fréchet differentiable from \(L^\infty (\omega _c)\rightarrow L^\infty (\omega _c)\) for \(\varepsilon >0\) (and thus ensuring Fréchet differentiability of the tracking term; see Lemma 4.2 below). The construction of such a \(\varphi _\varepsilon \) and the characterization of the Fréchet derivative of \(\Phi _\varepsilon \) via pointwise a.e. multiplication can be carried out in the same way as in [10, § 2.3].

We then consider for \(\varepsilon \ge 0\) the reduced, unconstrained optimization problem

$$\begin{aligned} \min _{u\in BV(\omega _c)} J_\varepsilon (u) \end{aligned}$$


$$\begin{aligned} J_\varepsilon (u) := \frac{1}{2} \Vert By({\hat{E}}\Phi _\varepsilon (u))-y_d\Vert _{{\mathcal {O}}}^2+ \alpha G(u) + \beta TV(u) \end{aligned}$$

for some \(y_d \in {\mathcal {O}}\) and \(\alpha ,\beta >0\), where \(u \mapsto y(u)\) denotes the solution mapping of (9) introduced in the previous section and \({\hat{E}}\) is the affine extension operator from \(\omega _c\) to \(\Omega \) defined in (4). We point out that the role of \(\varepsilon \) is not that of a smoothing parameter for the optimization problem, which remains nonsmooth for \(\varepsilon >0\) due to the presence of G and \(\mathrm {TV}\); it merely influences the behavior of the cutoff function near the upper and lower values of the pointwise bounds for the coefficient.

Existence of optimal controls now follows analogously to [10, Prop. 3.1].

Proposition 4.1

For every \(\varepsilon \ge 0\), there exists a global minimizer \({{\bar{u}}}\in BV(\omega _c)\cap U\) of \(J_\varepsilon \).


Since \(J_\varepsilon \) is bounded from below, there exists a minimizing sequence \(\{u_n\}_{n\in {\mathbb {N}}}\subset BV(\omega _c)\). Furthermore, we may assume without loss of generality that there exists a \(C>0\) such that

$$\begin{aligned} C\left( \Vert u_n \Vert _{L^1(\omega _c)} + \mathrm {TV}(u_n)\right) \le J_\varepsilon (u_n)\le J_\varepsilon (0) \quad \text {for all }n\in {\mathbb {N}}, \end{aligned}$$

and hence that \(\{u_n\}_{n\in {\mathbb {N}}}\) is bounded in \(BV(\omega _c)\). By the compact embedding of \(BV(\omega _c)\) into \(L^1(\omega _c)\) for any \(d\in {\mathbb {N}}\), we can thus extract a subsequence, denoted by the same symbol, converging strongly in \(L^1(\omega _c)\) to some \({{\bar{u}}}\in L^1(\omega _c)\). Due to the continuity of \(\Phi _\varepsilon \) as well as of \({\hat{E}}\), we have \({\hat{E}}\Phi _\varepsilon (u_n)\rightarrow {\hat{E}}\Phi _\varepsilon ({{\bar{u}}})\in U\) in \(L^1(\omega _c)\).

Lower semi-continuity of G and \(\mathrm {TV}\) with respect to the strong convergence in \(L^1(\omega _c)\) and the weak convergence \(y({\hat{E}}\Phi _\varepsilon (u_n))\rightharpoonup y({\hat{E}}\Phi _\varepsilon ({{\bar{u}}}))\) in \(L^2(Q)\) from Lemma 3.4 yield that

$$\begin{aligned} J_\varepsilon ({{\bar{u}}}) \le \liminf _{n\rightarrow \infty } J_\varepsilon (u_n) \le J_\varepsilon (u) \qquad \text {for all }u\in BV(\omega _c) \end{aligned}$$

and thus that \({{\bar{u}}}\in BV(\omega _c)\) is the desired minimizer.

The fact that \({{\bar{u}}}\in U\) then follows by a contraposition argument based on Stampacchia’s Lemma for BV functions and the pointwise definition of G, see [10, Prop. 3.2]. \(\square \)

The convergence of minimizers of (20) as \(\varepsilon \rightarrow 0\) can be shown along the same lines as indicated at the end of [10, § 3].

We now derive first-order optimality conditions for the solution of (20). To this end, we first show Fréchet differentiability of the tracking term

$$\begin{aligned} F_\varepsilon :L^\infty (\omega _c)\rightarrow {\mathbb {R}},\qquad F_\varepsilon (u) = \frac{1}{2}\Vert By({\hat{E}}\Phi _\varepsilon (u))-y_d \Vert _{{\mathcal {O}}}^2 \end{aligned}$$

as in [10, Lem. 4.1] by using for given \(u\in L^\infty (\omega _c)\) and \(y\in W\) the definition of the adjoint equation

$$\begin{aligned} \left\{ \begin{aligned} \int _0^T - (\partial _t p, \partial _t \varphi )_H + (u \nabla p, \nabla \varphi )_{{\mathbb {L}}^2(\Omega )} \,\mathrm {d}t&= \int _0^T (B^*(By-y_d), \varphi )_H \,\mathrm {d}t,\\ p(T)&= 0 \text { in } V, \end{aligned} \right. \end{aligned}$$

for any \(\varphi \in W\) with \(\varphi (0)=0\), which admits a unique solution \(p\in W\) by Lemma 3.1. In the following, we use the regularity of solutions to identify the derivative in \(L^\infty (\omega _c)^*\) with its representation in \(L^1(\omega _c)\), considered as a subset of \(L^\infty (\omega _c)^*\). Since the extension operator \({\hat{E}}\) is affine, we also introduce the corresponding linear extension operator \({\hat{E}}_0 = {\hat{E}}':L^2(\omega _c)\rightarrow L^2(\Omega )\).

Lemma 4.2

For every \(\varepsilon >0\), the mapping \(F_\varepsilon \) defined in (21) is Fréchet differentiable in every \(u\in L^\infty (\omega _c)\), and the Fréchet derivative is given by

$$\begin{aligned} F_\varepsilon '(u) = {\hat{E}}_0^*\left( \int _0^T\nabla y\cdot \nabla p\,\mathrm {d}t\right) \,\Phi _\varepsilon '(u) \in L^1(\omega _c), \end{aligned}$$

where \(y=y({\hat{E}}\Phi _\varepsilon (u))\) is the solution of (9), p is the solution of (22), and \({\hat{E}}_0^*:L^2(\Omega )\rightarrow L^2(\omega _c)\) is the restriction operator.

If Assumption 3.1 holds, \(F_\varepsilon '(u) \in L^{\frac{2q}{2+q}}(\omega _c)\) for the \(q>2\) given in Proposition 3.6.


We first show directional differentiability. Let \(w, h\in L^\infty (\omega _c)\) and \(\rho >0\) be arbitrary. We define \({\tilde{y}}(w):=y({\hat{\Phi }}_\varepsilon (w))\). We now insert the productive zero \(B{\tilde{y}}(w)-B{\tilde{y}}(w)\) in \(F_\varepsilon (w+\rho h)\) and expand the square to obtain

$$\begin{aligned} F_\varepsilon (w+\rho h)-F_\varepsilon (w)= & {} \frac{1}{2}\Vert B({\tilde{y}}(w+\rho h)-{\tilde{y}}(w))+(B{\tilde{y}}(w)-y_d) \Vert _{{\mathcal {O}}}^2 \nonumber \\&- \frac{1}{2}\Vert B{\tilde{y}}(w)-y_d \Vert _{{\mathcal {O}}}^2\nonumber \\= & {} \frac{1}{2}\Vert B({\tilde{y}}(w+\rho h)-{\tilde{y}}(w)) \Vert _{{\mathcal {O}}}^2\nonumber \\&+ (B({\tilde{y}}(w+\rho h)-{\tilde{y}}(w)),B{\tilde{y}}(w)-y_d)_{{\mathcal {O}}}. \end{aligned}$$

For the first term, we can use Lemma 3.3, the boundedness of B and the Lipschitz continuity of \(\Phi _\varepsilon \) to estimate

$$\begin{aligned} \frac{1}{2}\Vert B({\tilde{y}}(w+\rho h)-{\tilde{y}}(w)) \Vert _{{\mathcal {O}}}^2 \le C\rho ^2\Vert h \Vert _{L^\infty (\omega _c)}^2. \end{aligned}$$

For the second term in (24), we introduce the adjoint state p(w) and use the fact that \(\delta y:={\tilde{y}}(w+\rho h)-{\tilde{y}}(w)\in W\) with \(\delta y(0)=0\). Testing (22) with \(\varphi = \delta y\), and using (9) for \(y={\tilde{y}}(w)\) and \(y={\tilde{y}}(w+\rho h)\), each time with \(v=p(w)\), and inserting a productive zero, we find

$$\begin{aligned}&\left( \delta y,B^*(B{\tilde{y}}(w)-y_d) \right) _{L^2(Q)}\\&\quad = -\left( \partial _t p(w),\partial _t \delta y \right) _{L^2(Q)} + \left( {\hat{E}}\Phi _\varepsilon (w) \nabla p(w),\nabla \delta y \right) _{L^2(I,{\mathbb {L}}^2(\Omega ))}\\&\quad = \left( ({\hat{E}}\Phi _\varepsilon (w+\rho h)-{\hat{E}}\Phi _\varepsilon (w))\nabla {\tilde{y}}(w+\rho h),\nabla p(w) \right) _{L^2(I,{\mathbb {L}}^2(\Omega ))}. \end{aligned}$$

By Lemma 3.3 we have that \({\tilde{y}}(w+\rho h)\rightharpoonup {\tilde{y}}(w)\) in \(L^2(I,V)\) as \(\rho \rightarrow 0^+\). Moreover, since \(\varepsilon >0\) we have that \({\hat{E}}\Phi _\varepsilon \) is Frechet differentiable in \(L^\infty (\omega _c)\) with \({\hat{E}}_0\Phi _\varepsilon '(w),{\hat{E}}_0\Phi _\varepsilon '(w+\rho h)\in L^\infty (\Omega )\). Hence, dividing (24) by \(\rho >0\) and passing to the limit implies in combination with (25) that

$$\begin{aligned} \begin{aligned} F_\varepsilon '(w;h)&:= \lim _{\rho \rightarrow 0^+}\frac{1}{\rho }(F_\varepsilon (w+\rho h) - F_\varepsilon (w)) \\&= \langle h,\Phi _\varepsilon '(w){\hat{E}}_0^* \int _0^T\nabla {\tilde{y}}(w)\cdot \nabla p(w)\,\mathrm {d}t \rangle _{L^\infty (\omega _c),L^1(\omega _c)}. \end{aligned} \end{aligned}$$

Since the mapping \(h\mapsto F_\varepsilon '(w;h)\) is linear and bounded, \({\hat{E}}_0^*(\int _0^T\nabla y(w)\cdot \nabla p(w)\,\mathrm {d}t)\,\Phi _\varepsilon '(w)\) is the Gâteaux derivative of \(F_\varepsilon \) at w. Thus, \(F_\varepsilon \) is Gâteaux differentiable in \(L^\infty (\omega _c)\).

It remains to show that this is also a Fréchet derivative. From the above, we have that

$$\begin{aligned}&\left| F_\varepsilon (w+h)-F_\varepsilon (w)-F_\varepsilon '(w)h\right| \\&\quad \le C\Vert h \Vert _{L^\infty (\omega _c)}^2\\&\qquad + \left| \left( ({\hat{E}}\Phi _\varepsilon (w+h)-{\hat{E}}\Phi _\varepsilon (w))\nabla {\tilde{y}}(w+h)- {\hat{E}}_0\Phi _\varepsilon '(w)h\nabla {\tilde{y}}(w),\nabla p(w) \right) _{L^2(I,{\mathbb {L}}^2(\Omega ))}\right| \\&\quad \le C\Vert h \Vert _{L^\infty (\omega _c)}^2\\&\qquad +\left( \Vert {\hat{E}}\Phi _\varepsilon (w+h)-{\hat{E}}\Phi _\varepsilon (w)- {\hat{E}}_0\Phi _\varepsilon '(w)h \Vert _{L^\infty (\omega _c)}\Vert \nabla {\tilde{y}}(w+h) \Vert _{L^2(I,{\mathbb {L}}^2(\omega _c))}\right. \\&\qquad + \left. \Vert {\hat{E}}_0\Phi _\varepsilon '(w)\Vert _{L^\infty (\omega _c)} \Vert \nabla {\tilde{y}}(w+h)-\nabla {\tilde{y}}(w)\Vert _{L^2(I,{\mathbb {L}}^2(\Omega ))}\right) \Vert \nabla p(w)\Vert _{L^2(I,{\mathbb {L}}^2(\Omega ))}, \end{aligned}$$

and hence that

$$\begin{aligned} \frac{|F_\varepsilon (w+h)-F_\varepsilon (w)-F_\varepsilon '(w)h|}{\Vert h \Vert _{L^\infty (\omega _c)}} \rightarrow 0\quad \text {for}\quad \Vert h\Vert _{L^\infty (\omega _c)}\rightarrow 0 \end{aligned}$$

since \({\tilde{y}}(w+h) \rightarrow {\tilde{y}}(w)\) in \(L^2(I,V)\) by Lemma 3.5.

The regularity follows from \({\tilde{y}}(w),p(w)\in L^2(I,V)\) together with the properties of the norm in Bochner spaces, cf. [43, Cor. V.1]. If Assumption 3.1 holds, Proposition 3.6 yields \({\tilde{y}}(w)\in L^\infty (I,W^{1,q}(\Omega ))\) for some \(q>2\) and hence \({\hat{E}}_0^*\left( \int _0^T\nabla {\tilde{y}}(w)\cdot \nabla p(w)\,\mathrm {d}t\right) \Phi _\varepsilon '(w)\in L^2(I,L^{\frac{2q}{2+q}}(\omega _c))\). \(\square \)

We can now proceed exactly as in [10] to obtain first-order necessary optimality conditions.

Theorem 4.3

([10, Thm. 4.2]) If \(\varepsilon >0\) and Assumption 3.1 holds, every local minimizer \({{\bar{u}}}\in BV(\omega _c)\) to (20) satisfies

$$\begin{aligned} -F_\varepsilon '({{\bar{u}}})\in \alpha \,\partial G({{\bar{u}}}) + \beta \,\partial \mathrm {TV}({{\bar{u}}})\subset L^\frac{2q}{2+q}(\omega _c), \end{aligned}$$

where G and \(\mathrm {TV}\) are considered as extended real-valued convex functionals on \(L^\frac{2q}{q-2}(\omega _c)\).

Introducing explicit subgradients for the two subdifferentials, we obtain primal–dual optimality conditions.

Corollary 4.4

([10, Cor. 4.3]) For any local minimizer \({{\bar{u}}}\in BV(\omega _c)\) to (20), there exist \({{\bar{q}}} \in L^\frac{2q}{2+q}(\omega _c)\) and \({\bar{\xi }} \in L^{\frac{2q}{2+q}}(\omega _c)\) satisfying

$$\begin{aligned} \left\{ \begin{aligned} 0&= F_\varepsilon '({{\bar{u}}}) + \alpha {{\bar{q}}} + \beta {\bar{\xi }},\\ {{\bar{q}}}&\in \partial G({{\bar{u}}}),\\ {\bar{\xi }}&\in \partial \mathrm {TV}({{\bar{u}}}). \end{aligned} \right. \end{aligned}$$

These conditions can be further interpreted pointwise. First, using the characterization of Lemma 4.2, we can identify the first term in the first equation with the \(L^{1+\delta }(\omega _c)\) function given by

$$\begin{aligned} \left[ \int _0^T\nabla y(u) \cdot \nabla p(u)\,\mathrm {d}t\right] (x)\Phi _\varepsilon '({{\bar{u}}}(x)) \quad \text {for a.e. }x\in \omega _c. \end{aligned}$$

Second, using the characterization of \(\partial G\) from [10, § 2], we have that

$$\begin{aligned} {{\bar{q}}}(x) \in {\left\{ \begin{array}{ll} \{-u_m\} &{} {{\bar{u}}}(x)< u_1,\\ {[}-u_m, \frac{1}{2}(u_1+u_2)] &{} {{\bar{u}}}(x) = u_1,\\ \{\frac{1}{2}(u_i+u_{i+1})\} &{} {{\bar{u}}}(x) \in (u_i,u_{i+1}), \quad \ 1\le i<m,\\ \left[ \tfrac{1}{2}(u_{i-1}+u_{i}),\tfrac{1}{2}(u_{i}+u_{i+1})\right] &{} {{\bar{u}}}(x) = u_{i},\qquad \qquad 1\le i<m,\\ {[} \frac{1}{2}(u_{m-1}+u_m),u_m] &{} {{\bar{u}}}(x) = u_m,\\ u_m &{} {{\bar{u}}}(x) > u_m. \end{array}\right. } \end{aligned}$$

The interpretation of the final term is more delicate. Informally, \(\xi (x)\) corresponds to the mean curvature of \({{\bar{u}}}(x)\) (if \({{\bar{u}}}\) is smooth at x) or the signed normal to its jump set (if \({{\bar{u}}}\) has a jump discontinuity across a measurable curve of \(d-1\)-dimensional Hausdorff measure greater zero). This can be made more precise using the notion of the full trace from [44], see also [45].

Numerical Solution

In this section, we address the numerical solution of (20) using a stabilized space-time finite element discretization for second-order hyperbolic equations [15] and a nonlinear primal–dual proximal splitting algorithm [16,17,18]. Since we now consider a finite-dimensional optimization problem, we can include the constraint \(u \in U\) directly via the multi-bang penalty instead of enforcing it inside the state equation. In this and the following section, we will therefore omit \(\Phi _\varepsilon \) from the discretized tracking term (and, with it, \(\varepsilon \) in general) and define the multi-bang penalty with \(\mathrm {dom}\,g = [u_1,u_d]\) as in [12]; see (34) below.


We consider a mesh \({\mathcal {T}}_h\) consisting of a finite set of triangles or tetrahedra T with a mesh size h. Then we introduce the space \(D_h\subset H^1(\Omega )\cap C({\overline{\Omega }})\) of linear finite elements based on the triangulation \({\mathcal {T}}_h\). A basis of this space is given by the standard hat functions \(\varphi _i\) associated with nodes \(x_i\), \(i=1,\ldots , N_h\), of the triangulation \({\mathcal {T}}_h\). Next we discretize the time interval I uniformly by \(0=t_0< \cdots < t_{N_\tau }=T\) and grid size of \(\tau \). Similarly, we define the space \(D_\tau \subset H^1(I)\cap C({\overline{I}})\) of piecewise linear and continuous functions with respect to these grids. Furthermore we consider the hat functions \(e_i\), \(i=0,\ldots ,N_\tau \), with \(e_i(t_l)=\delta _{il}\) which form a basis of \(D_\tau \). We assume that \(\omega _c\) can be represented by the triangulation \({\mathcal {T}}_h\) exactly and introduce the space

$$\begin{aligned} D_h^c=\{u_h\in D_h \mid {\text {supp}}u_h\subseteq \overline{\omega _c}\}={\text {span}}\{\varphi _i|_{\overline{\omega _c}}\mid x_i\in \overline{\omega _c}\}. \end{aligned}$$

Moreover we introduce the space \(C_h^c\) of piecewise constant functions on the triangles in \(\omega _c\). In the following we also identify \(D_h^c\) with \({\mathbb {R}}^{N_c}\) for \(\dim D_h^c=N_c\) and \(C_h^c\) with \({\mathbb {R}}^{M_c}\) for \(\dim C_h^c=M_c\). Finally we define \(\vartheta :=(\tau ,h) >0\) and introduce the stabilization parameter \(\sigma \ge 0\).

Definition 5.1

We call \(y_{\vartheta }\in D_{\vartheta }:=D_h \otimes D_\tau \) a discrete solution of (9) if \(y_{\vartheta }\) satisfies

$$\begin{aligned}&\int _0^T-(\partial _t y_{\vartheta },\partial _t v)_{H}-(\sigma -\tfrac{1}{6})\tau ^2({\hat{E}}u_h\nabla \partial _t y_{\vartheta },\nabla \partial _tv)_{{\mathbb {L}}^2(\Omega )}+ ({\hat{E}}u_h\nabla y_{\vartheta },\nabla v)_{{\mathbb {L}}^2(\Omega )}\,\mathrm {d}t\nonumber \\&\quad = (y_1,v(0))_{H} + \int _0^T(f,v)_{H}\,\mathrm {d}t\end{aligned}$$

for all \(v \in D_{\vartheta }\) with \(v(T)=0\) and initial condition \(y_{\vartheta }(0)=S_0 y_0\) defined via

$$\begin{aligned} (S_0 y_0,\varphi )_{H}=(y_0,\varphi )_{H}\quad \forall \varphi \in D_h. \end{aligned}$$

This is a space-time finite element discretization with piecewise linear elements in space and in time. The additional \(\sigma \)-term in (28) serves as a stabilization term, which vanishes for \(\vartheta \rightarrow 0\) and is connected to the error term in the trapezoidal rule for the time integral of the third bilinear form in (28). The stability depends significantly on the value of \(\sigma \) with the method being more stable for larger \(\sigma \); e.g., for \(\sigma \ge 1/4\), the method is unconditionally stable and convergent while for \(0\le \sigma <1/4\), a CFL-like condition has to be satisfied to ensure stability as well as convergence, see [15, Thms. 2.1, 3.1] for homogeneous Dirichlet boundary conditions. At the same time, (28) can be formulated as the following time-stepping scheme: Set \(y_h^0=S_0y_0\) and

$$\begin{aligned}&\left( \frac{y_h^1-y_h^0}{\tau },\varphi \right) _{L^2(\Omega )}+\tau ({\hat{E}}u_h\nabla (\sigma y_h^1+(\tfrac{1}{2}-\sigma )y_h^0),\nabla \varphi )_{{\mathbb {L}}^2(\Omega )}\nonumber \\&\qquad =(y_1,\varphi )_{L^2(\Omega )}+\left( \int _0^{t_1}fe_0\,\mathrm {d}t,\varphi \right) _{L^2(\Omega )}, \end{aligned}$$
$$\begin{aligned}&\left( \frac{y_h^{i+1}-2y_h^i+y_h^{i-1}}{\tau },\varphi \right) _{L^2(\Omega )}+\tau \left( {\hat{E}}u_h\nabla (\sigma y_h^{i+1}+(1-2\sigma )y_h^i+\sigma y_h^{i-1}),\nabla \varphi \right) _{{\mathbb {L}}^2(\Omega )}\nonumber \\&\qquad =\left( \int _{t_{i-1}}^{t_{i+1}}fe_i\,\mathrm {d}t,\varphi \right) _{L^2(\Omega )},\quad 1\le i\le N_\tau -1, \end{aligned}$$

for all \(\varphi \in D_h\). For \(\sigma =1/4\), this method is equivalent to the implicit, unconditionally stable, and convergent Crank–Nicolson scheme, while for \(\sigma =0\), the method is explicit if the spatial mass matrix is lumped. The main benefit in our context is that this is an adjoint-consistent discretization and therefore can be used to obtain a conforming discretization of (6) in a straight-forward manner.

Next we introduce the discrete control-to-observation operator \(S_\vartheta :U\cap D_h^c\rightarrow {\mathcal {O}}\) defined by \(u\mapsto By_\vartheta \) where \(y_\vartheta \) is the solution of (28) for the coefficient \(u\in U\cap D_h^c\). Let \(\delta >0\). Then the implicit function theorem implies that \(S_\vartheta \) is Fréchet differentiable on the open subset

$$\begin{aligned} \left\{ u \in L^\infty (\omega _c):-\inf _{x\in \omega _c} {\hat{u}}(x)+u_{\min }< u(x) < u_{\max }+\delta \quad \text {for a.e.}~x\in \omega _c\right\} \cap D_h^c. \end{aligned}$$

This set contains \(U\cap D_h^c\). The implicit function theorem is applicable since the following linearized discrete state equation is well-posed in the variable \(\delta y\in D_\vartheta \) for every \(\delta u_h\in D_h^c\):

$$\begin{aligned}&\int _0^T-(\partial _t \delta y_{\vartheta },\partial _tv)_{H}-(\sigma -\tfrac{1}{6})\tau ^2({\hat{E}}u_h\nabla \partial _t \delta y_{\vartheta },\nabla \partial _tv)_{{\mathbb {L}}^2(\Omega )}+ ({\hat{E}}u_h\nabla \delta y_{\vartheta },\nabla v)_{{\mathbb {L}}^2(\Omega )}\,\mathrm {d}t\nonumber \\&\qquad =\int _0^T(\sigma -\tfrac{1}{6})\tau ^2({\hat{E}}\delta u_h\nabla \partial _t y_{\vartheta },\nabla \partial _tv)_{{\mathbb {L}}^2(\Omega )}- ({\hat{E}}\delta u_h\nabla y_{\vartheta },\nabla v)_{{\mathbb {L}}^2(\Omega )}\,\mathrm {d}t\end{aligned}$$

for all \(v \in D_{\vartheta }\) with \(v(T)=0\) and initial condition \(\delta y_{\vartheta }(0)=0\) as well as \(y_\vartheta =S_\vartheta (u)\). Thus the derivative of \(S_\vartheta \) at u is given by \(S'(u_h):D_h^c\rightarrow {\mathcal {O}},~\delta u_h\mapsto B\delta y_\vartheta \) where \(\delta y_\vartheta \) solves (30) for \(\delta u_h\). Its adjoint (with respect to the \(L^2(Q)\) and \({\mathcal {O}}\) inner product) is given by

$$\begin{aligned} S'(u_h)^*:{\mathcal {O}}\rightarrow C_h^c\, , \qquad o\mapsto \left. \left( \int _0^T(\tfrac{1}{6}-\sigma )\tau ^2\nabla \partial _t y_{\vartheta }\cdot \nabla \partial _tp_\vartheta +\nabla y_{\vartheta }\cdot \nabla p_\vartheta \,\mathrm {d}t\right) \right| _{\omega _c} \end{aligned}$$

where \(y_\vartheta =S_\vartheta (u)\) and \(p_{\vartheta }\) solves the discrete adjoint equation

$$\begin{aligned}&\int _0^T-(\partial _t v,\partial _tp_\vartheta )_{H}-(\sigma -\tfrac{1}{6})\tau ^2({\hat{E}}u_h\nabla \partial _t v,\nabla \partial _tp_\vartheta )_{{\mathbb {L}}^2(\Omega )}+ ({\hat{E}}u_h\nabla v,\nabla p_\vartheta )_{{\mathbb {L}}^2(\Omega )}\,\mathrm {d}t\nonumber \\&\quad = \int _0^T(B^*o,v)_{H}\,\mathrm {d}t\end{aligned}$$

for all \(v \in D_{\vartheta }\) with \(v(0)=0\) and initial condition \(p_{\vartheta }(T)=0\), which can be formulated as a time-stepping scheme similar to (29).

We now introduce the variables \(y_h\) and \(p_h\) defined by

$$\begin{aligned} y_\vartheta (t,x)=\sum _{i=0}^{N_\tau }y_h^i(x)e_i(t),\quad p_\vartheta (t,x)=\sum _{i=1}^{N_\tau }p_h^i(x)e_i(t). \end{aligned}$$

Thus \((S_\vartheta '(u_h))^*o\) with \(o\in {\mathcal {O}}\) has the representation

$$\begin{aligned} (S_\vartheta '(u_h))^*o= \nabla y_h^\top K\nabla p_h, \end{aligned}$$

where K is given by

$$\begin{aligned} K:=(\tfrac{1}{6}-\sigma )\tau ^2A_\tau +M_\tau =\tau \begin{pmatrix} \tfrac{1}{2}-\sigma &{} \sigma &{}&{}\\ \sigma &{} 1-2\sigma &{} \sigma &{}\\ &{}\ddots &{}\ddots &{}\ddots \\ &{}&{}&{}\qquad \sigma \\ &{}&{}\sigma &{}\tfrac{1}{2}-\sigma \end{pmatrix} \end{aligned}$$

with the temporal mass matrix \(M_\tau \) and stiffness matrix \(A_\tau \). For \(\sigma =0\), K is a diagonal matrix.

We now address the discretization of the control costs in the optimization (6). Since a function \(u_h\in D_h^c\) is an element of \(H^1(\omega _c)\) and thus its weak derivatives are piecewise constant on the triangulation of \(\omega _c\), we have

$$\begin{aligned} TV(u_h)=\Vert \nabla u_h\Vert _{L^1(\omega _c)}=\sum _{K\in {\mathcal {T}}_h\cap \omega _c}|(\nabla u_h)|_K|_2. \end{aligned}$$

We furthermore approximate the integral in definition of G by the trapezoidal rule to obtain the discrete multi-bang penalty \(G_h:D_h^c \rightarrow {\overline{{\mathbb {R}}}}\). Using these definitions, we obtain the fully discrete optimization problem

$$\begin{aligned} \min _{u_h\in D_h^c\cap U}\frac{1}{2} \Vert S_\vartheta (u_h)-y_d\Vert _{{\mathcal {O}}}^2+ \alpha G_h(u_h) + \beta TV(u_h). \end{aligned}$$

Note that although (33) is discrete, it is still formulate in function spaces. To apply a minimization algorithm, we now reformulate it in terms of the coefficient vectors for the finite-dimensional functions. First, \(D_h^c\cap U\) can be identified with the set

$$\begin{aligned} U_h=\{\mathbf{u }\in {\mathbb {R}}^{N_c}|~0\le \mathbf{u }_i\le u_{\max }~\forall i=1,\ldots ,N_c\} \end{aligned}$$

through \(u_h=\sum _{x_i\in \overline{\omega _c}}\mathbf{u }_ie_i\). Next we introduce the finite-dimensional subspace \({\mathcal {O}}_h=B(D_\vartheta )\) of \({\mathcal {O}}\) and the discrete control-to-observation on \(U_h\) by \(\mathbf{S }_\vartheta :{\mathbb {R}}^{N_c}\rightarrow {\mathbb {R}}^{N_o}\) with \(N_o=\dim ({\mathcal {O}}_h)\) defined by \(S_\vartheta \). Moreover we define the matrix \(M_{{\mathcal {O}}}\in {\mathbb {R}}^{N_o\times N_o}\) by the mapping \((o_1,o_2)\mapsto (o_1,o_2)_{{\mathcal {O}}}\) for \(o_1,o_2\in {\mathcal {O}}_h\). Thus the inner product and the norm of \({\mathcal {O}}\) in \({\mathcal {O}}_h\) can be identified with \((\mathbf{o }_1,\mathbf{o }_2)_{{\mathcal {O}}_h}=\mathbf{o }_1^\top M_{{\mathcal {O}}}\mathbf{o }_2\) and \(\Vert \mathbf{o }\Vert _{{\mathcal {O}}_h}=((\mathbf{o },\mathbf{o })_{{\mathcal {O}}_h})^{1/2}\) for \(\mathbf{o },~\mathbf{o }_1,~\mathbf{o }_2\in {\mathbb {R}}^{N_o}\). We denote the orthogonal projection onto \({\mathcal {O}}_h\) by \(\pi _{{\mathcal {O}}}\). The operator B restricted to \(D_\vartheta \) can be identified with a matrix \(B_h\in {\mathbb {R}}^{N_o\times N_{\vartheta }}\) for \(N_{\vartheta }=\dim (D_\vartheta )\). Thus \(B^*\) can be identified with \(B_h^\top \). With these identifications, the adjoint operator \(S_\vartheta '(u_h)^*\) for \(u_h\in U\cap D_h^c\) acting on \({\mathcal {O}}_h\) can similarly identified with \(\mathbf{S }_\vartheta '(\mathbf{u })^*:{\mathbb {R}}^{N_o}\rightarrow {\mathbb {R}}^{M_c}\) for some \(\mathbf{u }\in U_h\). Moreover we define the matrix \(A_h\in {\mathbb {R}}^{dM_c\times N_c}\) representing the bilinear form \((\nabla u_h,\xi _h)_{L^2(\Omega )^d}\) for \(u_h\in D_h^c\) and \(\xi _h\in (C_h^c)^d\). Thus we have

$$\begin{aligned} TV(u_h)=\sum _{K\in {\mathcal {T}}_h\cap \omega _c}|(\nabla u_h)|_K|_2=\sum _{K\in {\mathcal {T}}_h\cap \omega _c}|(A_h\mathbf{u })|_K|_2 =: \Vert |A_h \mathbf{u }|_2 \Vert _1=:\mathbf{TV} _h (\mathbf{u }). \end{aligned}$$

Finally, the trapezoidal rule in the definition of \(G_h\) can be expressed in the form of a mass lumping scheme, i.e.,

$$\begin{aligned} G_h(u_h)=\sum _{x_i\in \overline{\omega _c}}d_ig(\mathbf{u }_i)=: \mathbf{G }_h(\mathbf{u }) \end{aligned}$$


$$\begin{aligned} g(t) = {\left\{ \begin{array}{ll} \infty &{} t< u_1,\\ \frac{1}{2} \left( (u_{i}+u_{i+1})t - u_iu_{i+1}\right) &{} t \in [u_i,u_{i+1}],\quad 1\le i <m,\\ \infty &{} t > u_m, \end{array}\right. } \end{aligned}$$

is the scalar multi-bang penalty including the box constraints from [12] and \(d_i=\int _{\omega _c}\varphi _i\,\mathrm {d}x\) are the diagonal entries of the lumped mass matrix, see [46,47,48,49].

Using these notations, we can write (33) equivalently in the form

$$\begin{aligned} \min _{\mathbf{u }\in U_h}\frac{1}{2} \Vert \mathbf{S }_\vartheta (\mathbf{u })-\pi _{{\mathcal {O}}}y_d\Vert _{{\mathcal {O}}_h}^2+ \alpha \mathbf{G }_h(\mathbf{u }) + \beta \mathbf{TV} _h(\mathbf{u }). \end{aligned}$$

Primal–Dual Proximal Splitting

To solve (35), we extend the approach in [31] by applying the nonlinear primal–dual proximal splitting method from [16,17,18] together with a lifting trick. For this purpose, we write (35) (omitting the bold notation and the subscripts denoting vectors and discretizations from now on and assuming that \(y_d\in {\mathcal {O}}_h\)) as

$$\begin{aligned} \min _{u\in U} \frac{1}{2} \Vert S(u) - y_d \Vert _{{\mathcal {O}}}^2 + \beta \Vert |A_hu|_2 \Vert _1 + \alpha G(u). \end{aligned}$$


$$\begin{aligned}&{\mathcal {F}}: {\mathbb {R}}^{N_o} \times {\mathbb {R}}^{2M_c} \rightarrow {\mathbb {R}},&(y,\psi )\mapsto \frac{1}{2}\Vert y-y_d \Vert _{{\mathcal {O}}}^2 + \beta \Vert |\psi |_2 \Vert _1,\\&K:U\rightarrow ({\mathbb {R}}^{N_o} \times {\mathbb {R}}^{2M_c}),&u\mapsto (S(u),A_h u), \end{aligned}$$

we can apply the nonlinear primal–dual proximal splitting algorithm

$$\begin{aligned} \left\{ \begin{aligned} u^{k+1}&= \mathrm {prox}_{\gamma _G \alpha G_h}\left( u^k-\gamma _G K'(u^k)^* \xi ^k\right) \\ {\hat{u}}^{k+1}&= 2 u^{k+1} - u^k\\ \xi ^{k+1}&= \mathrm {prox}_{\gamma _F {\mathcal {F}}^*}\left( \xi ^k + \gamma _F K({\hat{u}}^{k+1})\right) \end{aligned} \right. \end{aligned}$$

for step sizes \(\gamma _F,\gamma _G>0\) satisfying \(\gamma _F\gamma _G\Vert K'(u)^* \Vert <1\). Convergence can be guaranteed under a second-order type condition for K and possibly further restrictions on the step sizes, whose (very technical) verification is outside the scope of this work. Instead, we restrict the discussion here on deriving the explicit form of (36) in the present setting.

First, we endow \(({\mathbb {R}}^{N_o}\times {\mathbb {R}}^{2M_c})\) with the sum of the inner product induced by \(M_{\mathcal {O}}\) (for \({\mathbb {R}}^{N_o}\)) and the Euclidean inner product (for \({\mathbb {R}}^{2M_c}\)). With respect to this inner product, we obtain the adjoint Fréchet derivative

$$\begin{aligned} K'(u)^*(r,\xi ) = S'(u)^*(r) + A_h^T\psi , \end{aligned}$$

where \(S'(u)^*\) is the fully discrete operator corresponding to (32) with right-hand side \(r\in {\mathcal {O}}\) for the adjoint equation.

The proximal point mapping for the (scaled) multi-bang penalty can be obtained by straightforward calculation based on a case differentiation in the definition of the subdifferential, see [31, Prop. 3.6]; for the sake of completeness, we give the short derivation here in full. By the definition of the proximal mapping, \(w=\mathrm {prox}_{\gamma g}(v) = ({{\,\mathrm{\mathrm {Id}}\,}}+ \gamma \partial g)^{-1}(v)\) holds for any \(v\in {\mathbb {R}}\) if and only if \(v \in \{w\} + \gamma \partial g(w)\). Recalling from [13, § 2] that

$$\begin{aligned} \partial g(v) = {\left\{ \begin{array}{ll} (-\infty ,\tfrac{1}{2}(u_1+u_2)] &{} v = u_1,\\ \{\tfrac{1}{2}(u_{i}+u_{i+1})\} &{} v \in (u_{i},u_{i+1}), \quad 1 \le i< m,\\ {[}\tfrac{1}{2}(u_{i-1}+u_i), \tfrac{1}{2}(u_i+u_{i+1}] &{} v = u_i, \quad 1<i<m,\\ {[}\tfrac{1}{2}(u_{m-1}+u_m),\infty ) &{} v = u_m,\\ \emptyset &{} \text {otherwise}, \end{array}\right. } \end{aligned}$$

we now distinguish the following cases for w:

  1. (i)

    \(w=u_1\): In this case,

    $$\begin{aligned} v \in \{w\} + \gamma \left( -\infty ,\tfrac{1}{2}(u_1+u_2)\right] = \left( -\infty ,(1+\tfrac{\gamma }{2}) u_1 + \tfrac{\gamma }{2} u_2\right] . \end{aligned}$$
  2. (ii)

    \(w\in (u_i,u_{i+1})\) for \(1\le i< m\): In this case,

    $$\begin{aligned} v \in \{w\} + \gamma \{\tfrac{1}{2}(u_i + u_{i+1})\}, \end{aligned}$$

    which first can be solved for w to yield

    $$\begin{aligned} w = v - \tfrac{\gamma }{2} (u_i+u_{i+1}); \end{aligned}$$

    inserting this into \(w\in (u_i,u_{i+1})\) and simplifying then gives

    $$\begin{aligned} v \in \left( (1+\tfrac{\gamma }{2})u_i + \tfrac{\gamma }{2} u_{i+1}, \tfrac{\gamma }{2} u_i + (1+\tfrac{\gamma }{2})u_{i+1}\right) . \end{aligned}$$
  3. (iii)

    \(w = u_i\), \(1<i<m\): Proceeding as in the first case, we obtain

    $$\begin{aligned} v \in \left[ \tfrac{\gamma }{2} u_{i-1},(1+\tfrac{\gamma }{2})u_i,(1+\tfrac{\gamma }{2}) u_i + \tfrac{\gamma }{2} u_{i+1}\right] . \end{aligned}$$
  4. (iv)

    \(w=u_m\): Similarly, this implies that

    $$\begin{aligned} v \in \left[ \tfrac{\gamma }{2} u_{m-1},(1+\tfrac{\gamma }{2})u_m,\infty \right) . \end{aligned}$$
Fig. 1

Illustration of the proximal point mapping \(\mathrm {prox}_{\gamma g}\) of the scalar multi-bang penalty g for \((u_1,u_2,u_3)\) = (0, 1, 2) and \(\gamma =0.2\)

Since this is a complete and disjoint case distinction for \(v\in {\mathbb {R}}\), we obtain the proximal mapping for the scalar penalty g, see Fig. 1. By a standard argument, the proximal point mapping for \(G_h\) is thus given componentwise by

$$\begin{aligned}&{[}\mathrm {prox}_{\gamma \alpha G_h}(v)]_j \\&\quad ={\left\{ \begin{array}{ll} v_j-\frac{\alpha \gamma }{2} (u_{i} + u_{i-1}) &{} \text {if } v_j \in \left( \left( 1+\tfrac{\alpha \gamma }{2}\right) u_{i-1}+\tfrac{\alpha \gamma }{2} u_{i},\left( 1+\tfrac{\alpha \gamma }{2}\right) u_{i}+\tfrac{\alpha \gamma }{2} u_{i-1}\right) ,\\ u_i &{}\text {if } v_j\in \left[ \left( 1+\tfrac{\alpha \gamma }{2}\right) u_{i}+\tfrac{\alpha \gamma }{2} u_{i-1},\left( 1+\tfrac{\alpha \gamma }{2}\right) u_{i}+\tfrac{\alpha \gamma }{2} u_{i+1}\right] , \end{array}\right. } \end{aligned}$$

where we have set \(u_{0}=-\infty \) and \(u_{m+1}=\infty \) to avoid the need for further case distinctions. (Note that we compute the proximal mapping with respect to the inner product induced by the lumped mass matrix such that the weight \(d_i\) cancels.)

Finally, for \({\mathcal {F}}\), we first compute the Fenchel conjugate on \({\mathbb {R}}^{N_o}\times {\mathbb {R}}^{2M_c}\) (with respect to the same inner product as above) as

$$\begin{aligned} {\mathcal {F}}^* (r,\psi ) = \frac{1}{2}\Vert r \Vert _{{\mathcal {O}}}^2 + (r,y^d)_{{\mathcal {O}}} + \delta _{\{|\psi |_2\le \beta \}}(\psi ) \end{aligned}$$

to obtain the proximal point mapping (again, with respect to this inner product)

$$\begin{aligned} \mathrm {prox}_{\gamma {\mathcal {F}}^*}(r,\psi ) = \begin{pmatrix} \frac{1}{1+\gamma } (r - \gamma y^d)\\ \mathrm {proj}_{\{|\psi |_2\le \beta \}}(\psi ) \end{pmatrix}, \end{aligned}$$

where the projection can be computed elementwise for each \(K\in {\mathcal {T}}\cap \omega _c\) as

$$\begin{aligned}{}[ \mathrm {proj}_{\{|\psi |_2\le \beta \}}(\psi )]_K = \frac{\beta \psi }{\max \{\beta ,|\psi |_2\}}. \end{aligned}$$

With these, (36) becomes the following explicit algorithm:

$$\begin{aligned} u^{k+1}&= \mathrm {prox}_{\gamma _G\alpha G_h} \left( u^k-\gamma _G S'(u^k)^*r^k - \gamma _G A_h^T\psi ^k\right) ,\\ {\hat{u}}^{k+1}&= 2 u^{k+1} - u^k,\\ y^{k+1}&= S({\hat{u}}^{k+1}),\\ r^{k+1}&= \frac{1}{1+\gamma _F} \left( r^k + \gamma _F(y^{k+1}- y^d)\right) ,\\ q^{k+1}&= \psi ^k + \gamma _F A_h{{\bar{u}}}^{k+1},\\ \psi ^{k+1}&= \mathrm {proj}_{\{|\psi |_2\le \beta \}}(\psi ). \end{aligned}$$

Note that this requires two solutions of the forward wave equation (as well as one solution of the adjoint equation) in each iteration, since \(y^{k+1}\) is based on the extrapolated vector \({{\bar{u}}}^{k+1}\), while the state vector y required for the computation of \(S'(u^{k+1})^*r^{k+1}\) in the following iteration is based on the original update \(u^{k+1}\).

The iteration is terminated based on the residual norm in an equivalent reformulation of the optimality conditions for (33). Combining the approach of Sect. 4 with standard results from convex analysis (see, e.g, [50, 51]), any local minimizer \({{\bar{u}}}\) of (33) together with the corresponding Lagrange multiplier \({\bar{\psi }}\) and the residual \({{\bar{r}}} := S({{\bar{u}}})-y_d\) can be shown to satisfy

$$\begin{aligned} \left\{ \begin{aligned} {{\bar{u}}}&= \mathrm {prox}_{\gamma _G\alpha G_h} \left( {{\bar{u}}}-\gamma _G S'({{\bar{u}}})^*{{\bar{r}}} - \gamma _G A_h^T{\bar{\psi }}\right) ,\\ {{\bar{r}}}&= \frac{1}{1+\gamma _F} \left( {{\bar{r}}} + \gamma _F(S({{\bar{u}}}) - y_d)\right) ,\\ {\bar{\psi }}&= \mathrm {proj}_{\{|\psi |_2\le \beta \}}({\bar{\psi }}). \end{aligned} \right. \end{aligned}$$

For the first equation, which holds in \(U_h\), we measure the residual in the discrete norm induced by the lumped mass matrix as in the definition of \(G_h\). The second equation holds in \({\mathcal {O}}_h\), and hence we measure the residual in the norm induced by the corresponding mass matrix \(M_{\mathcal {O}}\). Finally, the last equation holds in \({\mathbb {R}}^{2N_c}\) so we use the standard Euclidean norm. The iteration is terminated once the sum of these residuals drops below a given tolerance. For the implementation, note that the residual in the first equation for \((u^k,r^k,\psi ^k)\) reduces to \(u^{k}-u^{k+1}\). On the other hand, the residual in the second equation requires an additional solution of the state equation since here S is applied to \(u^k\) instead of the extrapolated \({{\bar{u}}}^k\). In practice, we thus do not evaluate the stopping criterion in every iteration.

Numerical Examples

We now illustrate the above presented approach with two numerical examples. The first is a transmission problem (where waves produced by external forcing pass through the control domain before being observed) loosely motivated by acoustic tomography. The second is a reflection problem (where only reflected, not transmitted, waves are observed) that more closely models seismic inversion. The implementation in Python (using DOLFIN [52, 53], which is part of the open-source computing platform FEniCS [54, 55]) used to generate the following results can be downloaded from https://github.com/clason/tvwavecontrol.

A Model Acoustic Tomography Problem

Fig. 2

transmission example, exact coefficient \(u_e\)

For the first example, we take \(\Omega = (-1,1)\times (-1,2)\) and \(T=3\) and define the control and observation domains

$$\begin{aligned} \omega _c := \{ (x_1,x_2)\in \Omega \mid x_2 \in (0,1) \},\qquad \omega _o := \{ (x_1,x_2)\in \Omega \mid x_2 \in (1,2) \}; \end{aligned}$$

correspondingly, the observation operator is taken as the restriction operator \(By := y|_{{\mathcal {O}}}\) to the observation space \({\mathcal {O}}:= (0,T)\times \omega _o\). The initial conditions are chosen as \((y_0,y_1) = (0,0)\), thus satisfying Assumption 3.1. We now aim to recover a piecewise constant coefficient \(u_e\) with \(u_e(x) \in \{1.0,1.1,1.2,1.3,1.4\}\) almost everywhere; see Fig. 2. Accordingly, we set \({\hat{u}} \equiv 1.0\) and \(u_i = (i-1)/10\), \(i=1,\dots 5\) from noisy observations of the state in \({\mathcal {O}}\). These observations are generated using a source term f that is constructed as a linear combination of point sources which act as Ricker wavelets in time, i.e.,

$$\begin{aligned} f(x,t):= & {} \sum _{i=-9}^9 (\delta _{(i/10,-9/10)}(x) \\&+\,\,\delta _{(0.05+i/10,-8/10)}(x)) 2(1-2(5\pi (t-0.1))^2)e^{-(5\pi (t-0.1))^2}. \end{aligned}$$

(The number and location of source points as well as the amplitude and frequency of the wavelet are chosen such as to obtain a complex enough wave pattern to recover the lateral and depth-wise variations in the coefficient.) The discretization is performed using 64 nodes in each space direction and 128 nodes in time, corresponding to \(h\approx 0.056\) and \(\tau \approx 0.23\). The stabilization constant is set to \(\sigma = 1/4\). The discretized exact solution is then perturbed componentwise by \(10\%\) relative Gaussian noise, i.e., we take

$$\begin{aligned} y_d = B(y(u)) + 0.1\Vert B(y(u)) \Vert _{\infty }\xi , \end{aligned}$$

where \(\xi \) is a vector of independent normally distributed random components with mean 0 and variance 1.

We now compute the reconstruction using the algorithm described in Sect. 5.2, comparing the effects of the total variation and the multi-bang penalty by taking \(\alpha \in \{0,10^{-5}\}\) and \(\beta \in \{0,10^{-4}\}\). In each case, we set the step sizes to \(\gamma _F = 10^{-1}\) and \(\gamma _G = 10^3\) and terminate when the residual norms (evaluated every 10 iterations) drop below \(10^{-6}\). Again, these parameters are chosen to achieve a reasonable reconstruction in as few iterations as possible. (A proper parameter choice rule depending on the noise level and the discretization is left for future work.) The results can be seen in Fig. 3. The case of pure multi-bang regularization (\(\alpha = 10^{-5}\) and \(\beta = 0\), 3680 iterations); see Fig. 3a) shows that indeed \(u(x)\in \{u_1,\dots ,u_5\}\) almost everywhere; however, there is a clear lack of regularity of the reconstruction, which is not surprising as the original function-space problem is not well-posed for \(\beta =0\). In contrast, the reconstruction case of pure \(\mathrm {TV}\) regularization (\(\alpha = 0\) and \(\beta = 10^{-4}\), 1100 iterations; see Fig. 3b) shows a much more regular reconstruction that is constant on large regions; however, these constants are not necessarily from the admissible set \(\{u_1,\dots ,u_5\}\). Finally, combining both multi-bang and total variation regularization (\(\alpha = 10^{-5}\) and \(\beta = 10^{-4}\), 600 iterations; see Fig. 3c) allows recovering more admissible values at the price of penalizing the magnitude of the coefficient value, which prevents the largest value \(u_5=0.4\) from being attained. It is also noteworthy that in this case the tolerance for the residual norm is reached after significantly fewer iterations.

Fig. 3

transmission example, effect of multi-bang penalty (\(\alpha \)) and total variation penalty (\(\beta \)) on the reconstruction

To illustrate the effects of variation of the desired values \(u_i\) on the reconstruction, we recompute the last example with the same parameters \(\alpha ,\beta \) but 10% increased values, i.e., \(u_i=1.1(i-1)/10\), \(i=1,\dots ,5\). The results are shown in Fig. 4, where we repeat the exact coefficient from Fig. 2 with adjusted labels in Fig. 4a for better comparison. As can be seen from Fig. 4b, the reconstruction is similar to that for \(\alpha =0\). In particular, the total variation penalty prevents the misspecified desired values from being enforced strongly. This demonstrates that while misspecified values clearly do not have the same positive influence on the reconstruction, they at least do not have a negative influence.

Fig. 4

transmission example, effect of variation of \(u_i\) on reconstruction

A Model Seismic Inverse Problem

We next consider an example which is inspired from seismic tomography. We assume that the data is given in the form of a time series of mean values of the reflected waves y over certain spatial regions \(O_i\). Thus we define the observation space \({\mathcal {O}}=L^2(I)^m\) for \(m\in {\mathbb {N}}\) and the observation operator

$$\begin{aligned} B:L^2(Q)\rightarrow {\mathcal {O}},\qquad y\mapsto \left( \frac{1}{|O_i|}\int _{O_i}y(\cdot ,x)\,\mathrm {d}x\right) _{i=1}^m, \end{aligned}$$

where the \(O_i\subset \Omega \) are the m spatial observation patches. Furthermore we assume that seismic sources are given by s point sources located on the surface \(\Gamma _s\subset \partial \Omega \) whose magnitudes are time dependent and follow a Ricker wavelet of the form

$$\begin{aligned} f_k(t)=a_k(1-2 \pi ^2 h_k^2 (t-t_k)^2) e^{-\pi ^2 h_k^2 (t-t_k)^2} \end{aligned}$$

with \(h,a,t\in {\mathbb {R}}^s\). This leads to the modified state equation

$$\begin{aligned} \left\{ \begin{aligned} \partial _{tt} y - {\mathrm {div}}(u\nabla y)&= 0&\text { in } Q, \\ {\partial _\nu y}&= \sum _{k=1}^sf_k\delta _{x_k}&\text { on } (0,T) \times \Gamma _s, \\ {\partial _\nu y}&= 0&\text { on } (0,T) \times \partial \Omega \setminus \Gamma _s, \\ y(0)&= 0,\quad {\partial _t} y(0) = 0,&\text { on }\Omega \end{aligned} \right. \end{aligned}$$

with \((x_k)_{k=1}^s\subset \Gamma _s\), \((f_k)_{k=1}^s\subset L^2(I)\), and \(\delta _{x_k}\) the Dirac measure supported on \(x_k\). In our concrete example, we chose \(\Omega =(-1,1)^2\), \(\Gamma _s=(-1,1)\times \{1\}\), and \(T=3\). We set \(\Omega _c=(-1,1)\times (-1,0.7)\). The observation patches are chosen as

$$\begin{aligned} O_i= & {} (o_i,o_i+0.2)\times (0.8,1) \quad \text { with }\quad o_i\in \{-1,-0.8,-0.6,\\&-0.4,-0.2,0,0.2,0.4,0.6,0.8\}. \end{aligned}$$

The sources are located at \(x=(-1+k\cdot 0.1,1)\) with \(k=0,\ldots ,20\). The parameters of the Ricker wavelet are set to \(a_k=2\), \(h_k=5\) and \(t_k=0.1\). The offset \({\hat{u}}\) has the constant value 1. Finally, the exact velocity model is given by

$$\begin{aligned} u_e={\left\{ \begin{array}{ll} 3&{}x\in (0.4,0.6)\times (0.1,0.4),\\ 2&{}x\in (-0.8,-0.5)\times (0.2,0.6),\\ 1&{}x\in (-0.2,0.2)\times (0.3,0.5),\\ 0&{}\text {else,} \end{array}\right. } \end{aligned}$$

for the constant reference coefficient \({\hat{u}} \equiv 1\), cf. Fig. 5a.

The recorded data for our experiments are generated by solving the state equation with the exact velocity model \(u_e\) resulting in the exact state \(y_e\). Then we set \(y_d=By_e+\delta n\) with \(\delta \in [0,1]\). The function \(n\in L^\infty (I)^m\) is a disturbance which models measurement errors and exterior influences. In our case we use a function of the form

$$\begin{aligned} n_k(t):=\eta _k r_k :=\eta _k\sum _{i=1}^M\frac{m_{i,k}}{i}\cos (4\pi t-s_{i,k}\pi ), \end{aligned}$$

where \(M\in {\mathbb {N}}\), \(\eta _k=\frac{\Vert (By_e)_k\Vert _{L^\infty (I)}}{\Vert r_k\Vert _{L^\infty (I)}}\), and \(m_{i,k},s_{i,k}\) are uniform random numbers in [0, 1]. Here we take \(M=10\).

For the discretization, we take a tensorial-based triangular mesh with \(N_h=129^2\), \(N_\tau =129\), and \(\sigma =1/4\). The relative noise level is \(\delta =0.05\). An appropriate regularization parameter is given by \(\beta = 10^{-4}\); for simplicity, we set \(\alpha =0\). The iteration is initialized with \(u_0=0\) and the stepsizes are again chosen as \(\gamma _F=10^{-1}\) and \(\gamma _G=10^3\). The iteration is stopped if the absolute residuum is smaller than \(10^{-4}\); in this experiment, this was reached after 1068 iterations. Figure 6 shows the exact and noisy observations on \(O_1\), \(O_5\) and \(O_{10}\). At the onset, we note two high spikes (a negative and a positive one) which are caused by the source wave initiated on boundary points \(\Gamma _s\). The remaining oscillations are caused by the reflection waves originating from the discontinuities of \(u_e\) and from the reflecting boundary; only these carry information about the coefficient, which makes the reconstruction challenging. The results are shown in Fig. 5b, where each color map is scaled individually to show more details. We observe that the positions of the discontinuities in \(u_e\) that are close to the observation patches are well approximated in \({{\bar{u}}}\) and that the corresponding interfaces are quite sharp. However, the approximation quality of the discontinuities becomes worse farther away from the observation region. This is caused by the fact that reflected waves from lower sections of the discontinuities are more dispersed than the reflected waves from the upper sections of the discontinuities.

Fig. 5

reflection example: results for \(\alpha =0\), \(\beta =10^{-4}\), 1068 iterations (note the different color bars)

Fig. 6

reflection example: exact and noisy observations on \(O_i\), \(i=1,5,10\)


We showed existence of solutions to an optimal control problem for the wave equation with the control entering into the principal part of the operator using total variation regularization and a reformulation of pointwise constraints using a cutoff function. Preferential attainment of a discrete set of control values is incorporated through a multi-bang penalty. We also derived an improved regularity result for solutions of the wave equation under additional natural assumptions on the data and the control, which (for smooth cutoff functions) allows obtaining necessary optimality conditions that can be interpreted in a suitable pointwise fashion. Finally, we demonstrated that the optimal control problem can be solved numerically using a combination of a stabilized finite element discretization and a nonlinear primal–dual proximal splitting algorithm.

This work can be extended in several directions. Besides applying the proposed approach to more realistic models of acoustic tomography or seismic imaging for practical applications, it would be worthwhile to consider the case of boundary observations of the state [56], which however may lead to an unbounded observation operator B. A further challenging goal would be deriving sufficient second-order conditions. Such conditions could then be used for obtaining discretization error estimates for the optimal controls or for showing convergence of the nonlinear primal–dual proximal splitting algorithm based on the “three-point condition” on S from [18].


  1. 1.

    Beilina, L., Clason, C.: An adaptive hybrid FEM/FDM method for an inverse scattering problem in scanning acoustic microscopy. SIAM J. Sci. Comput. 28(1), 382–402 (2006). https://doi.org/10.1137/050631252

    MathSciNet  Article  MATH  Google Scholar 

  2. 2.

    Krautkrämer, J., Krautkrämer, H.: Ultrasonic Testing of Materials, 4th edn. Springer-Verlag, Berlin Heidelberg (1990). https://doi.org/10.1007/978-3-662-10680-8

    Google Scholar 

  3. 3.

    Tarantola, A.: Inversion of seismic reflection data in the acoustic approximation. Geophysics 49(8), 1259–1266 (1984). https://doi.org/10.1190/1.1441754

    Article  Google Scholar 

  4. 4.

    Murat, F.: Contre-exemples pour divers problèmes où le contrôle intervient dans les coefficients. Ann. Mat. Pura Appl. 4(112), 49–68 (1977). https://doi.org/10.1007/BF02413475

    Article  MATH  Google Scholar 

  5. 5.

    Jiang, J.S., Kuo, K.H., Lin, C.K.: On the homogenization of second order differential equations. Taiwan. J. Math. 9(2), 215–236 (2005). https://doi.org/10.11650/twjm/1500407797

    MathSciNet  Article  MATH  Google Scholar 

  6. 6.

    Murat, F., Tartar, L.: \(H\)-convergence. Topics in the mathematical modelling of composite materials. Progr. Nonlinear Differ. Equ. Appl. 31, 21–43 (1997). https://doi.org/10.1007/978-1-4612-2032-9_3

    Article  MATH  Google Scholar 

  7. 7.

    Tartar, L.: The appearance of oscillations in optimization problems. In: Nonclassical Continuum Mechanics (Durham, 1986), London Math. Soc. Lecture Note Ser., vol. 122, pp. 129–150. Cambridge Univ. Press, Cambridge (1987). https://doi.org/10.1017/CBO9780511662911.008

  8. 8.

    Tartar, L.: Homogenization and hyperbolicity. Annali della Scuola Normale Superiore di Pisa - Classe di Scienze Ser. 4, 25(3-4), 785–805 (1997). http://www.numdam.org/item/ASNSP_1997_4_25_3-4_785_0

  9. 9.

    Tartar, L.: The General Theory of Homogenization. Lecture Notes of the Unione Matematica Italiana, vol. 7. Springer, UMI, Berlin, Bologna (2009)

    Google Scholar 

  10. 10.

    Clason, C., Kruse, F., Kunisch, K.: Total variation regularization of multi-material topology optimization. ESAIM 52(1), 275–303 (2018). https://doi.org/10.1051/m2an/2017061

    MathSciNet  Article  MATH  Google Scholar 

  11. 11.

    Clason, C., Kunisch, K.: Multi-bang control of elliptic systems. Ann. Institut Henri Poincaré (C) Anal. Non Linéaire 31(6), 1109–1130 (2014). https://doi.org/10.1016/j.anihpc.2013.08.005

    MathSciNet  Article  MATH  Google Scholar 

  12. 12.

    Clason, C., Kunisch, K.: A convex analysis approach to multi-material topology optimization. ESAIM 50(6), 1917–1936 (2016). https://doi.org/10.1051/m2an/2016012

    MathSciNet  Article  MATH  Google Scholar 

  13. 13.

    Clason, C., Do, T.B.T.: Convex regularization of discrete-valued inverse problems. In: Hofmann, B., Leitão, A., Zubelli, J. (eds.) New Trends in Parameter Identification for Mathematical Models, Trends in Mathematics, pp. 31–51. Springer, Berlin (2018). https://doi.org/10.1007/978-3-319-70824-9_2

    Google Scholar 

  14. 14.

    Gröger, K.: A \(W^{1, p}\)-estimate for solutions to mixed boundary value problems for second order elliptic differential equations. Math. Ann. 283(4), 679–687 (1989). https://doi.org/10.1007/BF01442860

    MathSciNet  Article  MATH  Google Scholar 

  15. 15.

    Zlotnik, A.A.: Convergence rate estimates of finite-element methods for second-order hyperbolic equations. In: Marchuk, G.I. (ed.) Numerical Methods and Applications, pp. 155–220. CRC, Boca Raton, FL (1994)

    Google Scholar 

  16. 16.

    Valkonen, T.: A primal-dual hybrid gradient method for non-linear operators with applications to MRI. Inverse Probl. 30(5), 055,012 (2014). https://doi.org/10.1088/0266-5611/30/5/055012

    MathSciNet  Article  Google Scholar 

  17. 17.

    Clason, C., Valkonen, T.: Primal-dual extragradient methods for nonlinear nonsmooth PDE-constrained optimization. SIAM J. Optim. 27(3), 1313–1339 (2017). https://doi.org/10.1137/16M1080859

    MathSciNet  Article  MATH  Google Scholar 

  18. 18.

    Clason, C., Mazurenko, S., Valkonen, T.: Acceleration and global convergence of a first-order primal-dual method for nonconvex problems. SIAM J. Optim. 29(1), 933–963 (2019). https://doi.org/10.1137/18M1170194

    MathSciNet  Article  MATH  Google Scholar 

  19. 19.

    Bube, K.P.: Convergence of numerical inversion methods for discontinuous impedance profiles. SIAM J. Numer. Anal. 22(5), 924–946 (1985). https://doi.org/10.1137/0722056

    MathSciNet  Article  MATH  Google Scholar 

  20. 20.

    Lavrent’ev Jr., M.M.: An inverse problem for the wave equation with a piecewise-constant coefficient. Sibirsk. Mat. Zh. 33(3), 101–111 (1992). https://doi.org/10.1007/BF00970893. 219

    MathSciNet  Article  MATH  Google Scholar 

  21. 21.

    Aktosun, T., Klaus, M., van der Mee, C.: Integral equation methods for the inverse problem with discontinuous wave speed. J. Math. Phys. 37(7), 3218–3245 (1996). https://doi.org/10.1063/1.531565

    MathSciNet  Article  MATH  Google Scholar 

  22. 22.

    Sedipkov, A.A.: A direct and an inverse problem of acoustic sounding in a stratified medium with discontinuous parameters. Sib. Zh. Ind. Mat. 17(1), 120–134 (2014)

    MathSciNet  MATH  Google Scholar 

  23. 23.

    Stolk, C.C.: On the modeling and inversion of seismic data. Ph.D. thesis, Universiteit Utrecht (2000). https://dspace.library.uu.nl/handle/1874/855

  24. 24.

    Böhm, C.: Efficient inversion methods for constrained parameter identification in full-waveform seismic tomography. Dissertation, Technische Universität München, München (2015). http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:91-diss-20150227-1232040-1-7

  25. 25.

    Goncharsky, A.V., Romanov, S.Y.: A method of solving the coefficient inverse problems of wave tomography. Comput. Math. Appl. 77(4), 967–980 (2019). https://doi.org/10.1016/j.camwa.2018.10.033

    MathSciNet  Article  MATH  Google Scholar 

  26. 26.

    Epanomeritakis, I., Akçelik, V., Ghattas, O., Bielak, J.: A Newton-CG method for large-scale three-dimensional elastic full-waveform seismic inversion. Inverse Probl. 24(3), 034,015, 26 (2008). https://doi.org/10.1088/0266-5611/24/3/034015

    MathSciNet  Article  MATH  Google Scholar 

  27. 27.

    Burstedde, C., Ghattas, O.: Algorithmic strategies for full waveform inversion: \(1\)D experiments. Geophysics 74(6), WCC37–WCC46 (2009). https://doi.org/10.1190/1.3237116

    Article  Google Scholar 

  28. 28.

    Esser, E., Guasch, L., van Leeuwen, T., Aravkin, A., Herrmann, F.: Total variation regularization strategies in full-waveform inversion. SIAM J. Imaging Sci. 11(1), 376–406 (2018). https://doi.org/10.1137/17M111328X

    MathSciNet  Article  MATH  Google Scholar 

  29. 29.

    Yong, P., Liao, W., Huang, J., Li, Z.: Total variation regularization for seismic waveform inversion using an adaptive primal dual hybrid gradient method. Inverse Probl. 34(4), 045,006 (2018). https://doi.org/10.1088/1361-6420/aaaf8e

    MathSciNet  Article  MATH  Google Scholar 

  30. 30.

    Gao, K., Huang, L.: Acoustic- and elastic-waveform inversion with total generalized p-variation regularization. Geophys. J. Int. 218(2), 933–957 (2019). https://doi.org/10.1093/gji/ggz203

    Article  Google Scholar 

  31. 31.

    Do, T.B.T.: Discrete regularization for parameter identification problems. Ph.D. thesis, Faculty of Mathematics, University of Duisburg-Essen (2019). https://doi.org/10.17185/duepublico/70265

  32. 32.

    Hante, F.M., Sager, S.: Relaxation methods for mixed-integer optimal control of partial differential equations. Comput. Optim. Appl. 55(1), 197–225 (2013). https://doi.org/10.1007/s10589-012-9518-3

    MathSciNet  Article  MATH  Google Scholar 

  33. 33.

    Ambrosio, L., Fusco, N., Pallara, D.: Functions of Bounded Variation and Free Discontinuity Problems. Oxford Mathematical Monographs. The Clarendon Press, Oxford University Press, New York (2000). https://doi.org/10.1007/978-3-0348-8974-2_2

  34. 34.

    Giusti, E.: Minimal Surfaces and Functions of Bounded Variation. Monographs in Mathematics, vol. 80. Birkhäuser Verlag, Basel (1984). https://doi.org/10.1007/978-1-4684-9486-0

  35. 35.

    Ziemer, W.P.: Weakly Differentiable Functions. Graduate Texts in Mathematics, vol. 120. Springer, New York (1989). https://doi.org/10.1007/978-1-4612-1015-3

  36. 36.

    Barbu, V., Precupanu, T.: Convexity and Optimization in Banach Spaces, fourth edn. Springer Monographs in Mathematics. Springer, Dordrecht (2012). https://doi.org/10.1007/978-94-007-2247-7

  37. 37.

    Lions, J.L., Magenes, E.: Non-homogeneous Boundary Value Problems and Applications, vol. I. Springer-Verlag, New York-Heidelberg (1972). https://doi.org/10.1007/978-3-642-65161-8

    Google Scholar 

  38. 38.

    Brezis, H.: Functional Analysis. Sobolev Spaces and Partial Differential Equations. Springer, New York (2010). https://doi.org/10.1007/978-0-387-70914-7

  39. 39.

    Grigor’yan, A.: Heat Kernel and Analysis on Manifolds. AMS/IP Studies in Advanced Mathematics, vol. 47. American Mathematical Society, Providence, RI; International Press, Boston, MA (2009). https://doi.org/10.1090/amsip/047

  40. 40.

    Wloka, J.: Partial Differential Equations. Cambridge University Press, Cambridge (1987). https://doi.org/10.1017/CBO9781139171755. Translated from the German by C. B. Thomas and M. J. Thomas

    Google Scholar 

  41. 41.

    Edwards, R.E.: Functional Analysis. Theory and Applications. Holt, Rinehart and Winston, New York-Toronto-London (1965)

  42. 42.

    Grisvard, P.: Elliptic Problems in Nonsmooth Domains. SIAM, Philadelphia, PA (2011). https://doi.org/10.1137/1.9781611972030. Reprint of the 1985 hardback edition

    Google Scholar 

  43. 43.

    Yosida, K.: Functional Analysis. Grundlehren der Mathematischen Wissenschaften, vol. 123, 6th edn. Springer, Berlin (1980). https://doi.org/10.1007/978-3-662-25762-3

  44. 44.

    Bredies, K., Holler, M.: A pointwise characterization of the subdifferential of the total variation functional (2012). MOBIS SFB-Report 2012-011

  45. 45.

    Chambolle, A., Goldman, M., Novaga, M.: Fine properties of the subdifferential for a class of one-homogeneous functionals. Adv. Calc. Var. 8(1), 31–42 (2015). https://doi.org/10.1515/acv-2012-0025

    MathSciNet  Article  MATH  Google Scholar 

  46. 46.

    Casas, E., Herzog, R., Wachsmuth, G.: Approximation of sparse controls in semilinear equations by piecewise linear functions. Numer. Math. 122(4), 645–669 (2012). https://doi.org/10.1007/s00211-012-0475-7

    MathSciNet  Article  MATH  Google Scholar 

  47. 47.

    Pieper, K.: Finite element discretization and efficient numerical solution of elliptic and parabolic sparse control problems. Dissertation, Technische Universität München, München (2015). http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:91-diss-20150420-1241413-1-4

  48. 48.

    Trautmann, C.P.: Sparse measure-valued optimal control problems governed by wave equations. Dissertation, Karl-Franzens-Universität Graz, Graz (2015). http://resolver.obvsg.at/urn:nbn:at:at-ubg:1-88846

  49. 49.

    Rösch, A., Wachsmuth, G.: Mass lumping for the optimal control of elliptic partial differential equations. SIAM J. Numer. Anal. 55(3), 1412–1436 (2017). https://doi.org/10.1137/16M1074473

    MathSciNet  Article  MATH  Google Scholar 

  50. 50.

    Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. CMS Books in Mathematics/Ouvrages de Mathématiques de la SMC. Springer, New York (2011). https://doi.org/10.1007/978-1-4419-9467-7

  51. 51.

    Clason, C.: Nonsmooth Analysis and Optimization (2017). Lecture notes

  52. 52.

    Logg, A., Wells, G.N.: DOLFIN: Automated finite element computing. ACM Trans. Math. Softw. 37(2), 1–28 (2010). https://doi.org/10.1145/1731022.1731030

    MathSciNet  Article  MATH  Google Scholar 

  53. 53.

    Logg, A., Wells, G.N., Hake, J.: DOLFIN: a C++/Python finite element library. In: Logg, A., Mardal, K.A., Wells, G.N. (eds.) Automated Solution of Differential Equations by the Finite Element Method. Springer, Berlin (2012). https://doi.org/10.1007/978-3-642-23099-8_10

    Google Scholar 

  54. 54.

    Alnæs, M.S., Blechta, J., Hake, J., Johansson, A., Kehlet, B., Logg, A., Richardson, C., Ring, J., Rognes, M.E., Wells, G.N.: The FEniCS project version 1.5. Arch. Numer. Softw. 3(100), 9–23 (2015). https://doi.org/10.11588/ans.2015.100.20553

    Article  Google Scholar 

  55. 55.

    Logg, A., Mardal, K.A., Wells, G.N. (eds.): Automated Solution of Differential Equations by the Finite Element Method. Lecture Notes in Computational Science and Engineering, vol. 84. Springer (2012). https://doi.org/10.1007/978-3-642-23099-8

  56. 56.

    Feng, X., Lenhart, S., Protopopescu, V., Rachele, L., Sutton, B.: Identification problem for the wave equation with Neumann data input and Dirichlet data observations. Nonlinear Anal. 52(7), 1777–1795 (2003). https://doi.org/10.1016/S0362-546X(02)00295-X

    MathSciNet  Article  MATH  Google Scholar 

Download references


Support by the German Science Fund (DFG) under grant CL 487/1-1 for C.C. and by the ERC advanced grant 668998 (OCLOC) under the EU’s H2020 research program for K.K. and P.T. are gratefully acknowledged.


Open Access funding enabled and organized by Projekt DEAL.

Author information



Corresponding author

Correspondence to Christian Clason.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Clason, C., Kunisch, K. & Trautmann, P. Optimal Control of the Principal Coefficient in a Scalar Wave Equation. Appl Math Optim (2020). https://doi.org/10.1007/s00245-020-09733-9

Download citation


  • Optimal control
  • Wave equation
  • Total variation regularization
  • Primal-dual splitting

Mathematics Subject Classification

  • 49A22
  • 49K20
  • 49M29