1 Introduction and informal statement of the main results

1.1 Introduction

When solving the Helmholtz equation \(\varDelta u +k^2 u=0\) with the h version of the finite-element method (where accuracy is increased by decreasing the meshwidth h while keeping the polynomial degree p constant), h must decrease faster than \(k^{-1}\) to maintain accuracy as k increases; this is the so-called “pollution effect” [4].

A thorough investigation of how quickly h must decrease with the frequency k to maintain accuracy as k increases was performed by Ihlenburg and Babuška in the mid-1990s [70, 71] on the 1-d model problem

$$\begin{aligned} u'' +k^2 u = -f \quad \text { in }(0,1), \quad u(0)=0 \quad \text { and }\quad u'(1)-\mathrm{i}k u(1)=0. \end{aligned}$$
(1.1)

An explicit expression for the discrete Green’s function for this problem is available, and Ihlenburg and Babuška used this to prove the following two sets of results:

  1.

    The h-FEM is quasi-optimal in the \(H^1\) semi-norm, with quasi-optimality constant independent of k, if \(hk^2/p\) is sufficiently small; i.e., there exist \(c, C>0\), independent of h, k, and p, such that, if \(hk^2/p\le c\), then

    $$\begin{aligned} \left\| \nabla (u- u_h)\right\| _{L^2(0,1)} \le C \min _{v_h\in {{\mathcal {H}}}_h} \left\| \nabla ( u- v_h)\right\| _{L^2(0,1)}, \end{aligned}$$

    where \({{\mathcal {H}}}_h\) is the appropriate conforming subspace of \(H^1(0,1)\) of piecewise polynomials of degree p on meshes of width h, and \(u_h\) is the Galerkin solution; see [70, Theorem 3], [69, Theorem 4.13], [71, Theorem 3.5] (when \(p=1\) this result was proved earlier in [3, Theorem 3.2]). The numerical experiments in [70, Figures 8 and 9] then indicated that, when \(p=1\), the condition “\(hk^2\) sufficiently small” for quasi-optimality is necessary.

  2.

    Under an assumption on the data f (discussed below), the relative error in the h-FEM can be made arbitrarily small by, when \(p=1\), making \(hk^{3/2}\) sufficiently small and, when \(p\ge 2\) and the data is sufficiently smooth (see [69, Remark 4.28]), making \(h^{2p}k^{2p+1}\) sufficiently small. More precisely, [70, Equation 3.25], [71, Theorem 3.7], [69, Equation 4.5.15, §4.6.4, and Theorem 4.27] prove that there exists \(C>0\), independent of h and k (but dependent on p) such that, if hk is sufficiently small, then the Galerkin solution \(u_h\) exists and

    $$\begin{aligned} \frac{\left\| u-u_h\right\| _{H^1_k(0,1)}}{\left\| u\right\| _{H^1_k(0,1)}}\le C \left( \left( \frac{hk}{p}\right) ^p + k \left( \frac{hk}{p}\right) ^{2p}\right) , \end{aligned}$$
    (1.2)

    where the weighted \(H^1\) norm \(\Vert \cdot \Vert _{H^1_k(0,1)}\) is defined by (3.2) below. The numerical experiments in [70, Figure 11] and [69, Figure 4.13] then indicated that, when \(p=1\), the condition “\(h^{2}k^{3}\) sufficiently small” is necessary for the relative error to be bounded (in agreement with the earlier numerical experiments in [8] for small k).
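As an illustration of the 1-d setting, the following is a minimal Python/NumPy sketch of the h-FEM with \(p=1\) for (1.1) on a uniform mesh; it is not the code used in [69, 70, 71]. It assumes the particular data \(f\equiv 1\), for which the exact solution is \(u(x)=\big (\cos kx-1+\mathrm{i}(1-\mathrm{e}^{\mathrm{i}k})\sin kx\big )/k^2\), and it measures the relative error in the weighted norm (3.2). All function names are hypothetical. With hk held fixed, the relative error should grow with k, which is the pollution effect discussed above.

```python
import numpy as np


def exact_solution(x, k):
    """Exact u and u' for (1.1) with f = 1: u'' + k^2 u = -1, u(0) = 0, u'(1) - i k u(1) = 0."""
    B = 1j * (1.0 - np.exp(1j * k)) / k**2
    u = (np.cos(k * x) - 1.0) / k**2 + B * np.sin(k * x)
    du = -np.sin(k * x) / k + B * k * np.cos(k * x)
    return u, du


def solve_p1(k, N):
    """Nodal values of the p = 1 Galerkin solution on a uniform mesh of N elements.

    Sesquilinear form: int u' conj(v') - k^2 int u conj(v) - i k u(1) conj(v(1));
    right-hand side:   int f conj(v) with f = 1.
    """
    h = 1.0 / N
    Ke = np.array([[1.0, -1.0], [-1.0, 1.0]]) / h    # element stiffness matrix
    Me = np.array([[2.0, 1.0], [1.0, 2.0]]) * h / 6  # element mass matrix
    A = np.zeros((N + 1, N + 1), dtype=complex)
    b = np.zeros(N + 1, dtype=complex)
    for e in range(N):
        idx = np.array([e, e + 1])
        A[np.ix_(idx, idx)] += Ke - k**2 * Me
        b[idx] += h / 2                               # integral of each hat function over the element
    A[N, N] -= 1j * k                                 # impedance condition at x = 1
    uh = np.zeros(N + 1, dtype=complex)
    uh[1:] = np.linalg.solve(A[1:, 1:], b[1:])        # u(0) = 0 imposed by dropping node 0
    return uh


def relative_error(k, N, n_gauss=4):
    """Relative error in the weighted norm (3.2), computed with Gauss quadrature per element."""
    uh = solve_p1(k, N)
    h = 1.0 / N
    gp, gw = np.polynomial.legendre.leggauss(n_gauss)
    t = (gp + 1.0) / 2.0                              # Gauss points mapped to [0, 1]
    w = gw * h / 2.0                                  # corresponding weights on one element
    err2 = norm2 = 0.0
    for e in range(N):
        x = (e + t) * h
        u, du = exact_solution(x, k)
        uh_x = (1.0 - t) * uh[e] + t * uh[e + 1]      # piecewise-linear Galerkin solution
        duh = (uh[e + 1] - uh[e]) / h
        err2 += np.sum(w * (np.abs(du - duh) ** 2 + k**2 * np.abs(u - uh_x) ** 2))
        norm2 += np.sum(w * (np.abs(du) ** 2 + k**2 * np.abs(u) ** 2))
    return np.sqrt(err2 / norm2)


if __name__ == "__main__":
    # hk = 0.2 is held fixed (N = 5k); the relative error nonetheless grows with k.
    for k in (10, 50, 200):
        print(k, relative_error(k, N=5 * k))
```

Dense linear algebra is used only for brevity; the matrix is tridiagonal, so a banded solver would be the natural choice for larger k.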

A note on terminology: following [69,70,71], we call the regime in h, k, and p where the Galerkin solution is quasi-optimal (with constant independent of k) the asymptotic regime, and the regime where it is not quasi-optimal the preasymptotic regime. For example, by the results in Points 1 and 2 above, when \(p=1\) the asymptotic regime is when \(hk^2\) is sufficiently small and the preasymptotic regime is when \(hk^2 \gg 1\).

The (asymptotic) quasi-optimality results in Point 1 above have since been generalised to Helmholtz problems in 2 and 3 dimensions (and improved in the case \(p \ge 2\)). Indeed, the fact that the h-FEM with \(p=1\) is quasi-optimal (with constant independent of k) in the full \(H^1_k\) norm when \(hk^2\) is sufficiently small was proved for the homogeneous Helmholtz equation on a bounded domain with impedance boundary conditions in [79, Proposition 8.2.7] (in the case of constant coefficients) and [61, Theorem 4.5 and Remark 4.6(ii)] (in the case of variable coefficients), and for scattering problems with variable coefficients in [50, Theorem 3]. The fact that the h-FEM for \(p \ge 2\) is quasi-optimal when \(h^pk^{p+1}\) is sufficiently small was proved for a variety of constant-coefficient Helmholtz problems in [80, Corollary 5.6], [81, Proof of Theorem 5.8], and [51, Theorem 5.1], and for a variety of problems including variable-coefficient Helmholtz problems in [25, Theorem 2.15]; the condition “\(h^pk^{p+1}\) sufficiently small” is indicated to be sharp for quasi-optimality by, e.g., the numerical experiments in [25, §4.4].

In contrast, the (preasymptotic) relative-error bound (1.2) in Point 2 above has not been obtained for any Helmholtz problem in 2 or 3 dimensions, even though numerical experiments indicate that the condition “\(h^{2p}k^{2p+1}\) sufficiently small” is necessary and sufficient for the relative error to be controllably small; see, e.g., [32, Left-hand side of Figure 3]. The closest available result is that, if \(h^{2p}k^{2p+1}\) is sufficiently small, then

$$\begin{aligned} \left\| u-u_h\right\| _{H^1_k(D)}\le C \left( (hk)^p + k (hk)^{2p}\right) \left\| f\right\| _{L^2(D)}, \end{aligned}$$
(1.3)

for the Helmholtz problem \(\varDelta u +k^2u =-f\) posed in a domain D with either impedance boundary conditions on \(\partial D\) or a perfectly matched layer (PML). Indeed, for the PML problem, (1.3) is proved for \(p=1\) in [76, Theorem 4.4 and Remark 4.5(iv)] and [51, Theorem 5.4]. For the impedance problem, (1.3) is proved for \(p=1\) in [100, Theorem 6.1], for \(p \ge 1\) in [32, Corollary 5.2] (following earlier work by [104]), and for \(p\ge 1\) for the variable-coefficient Helmholtz equation \(\nabla \cdot ({\mathsf {A}}\nabla u) +k^2 nu =-f\) in [87, §2.3] (under a nontrapping condition on \({\mathsf {A}}\) and n).

We highlight that, while [32, 51, 76] all prove results of the form (1.3), all the numerical experiments in these papers consider the relative error (either in the \(H^1\) norm [32, 76], or the weighted \(H^1\) norm (3.2) [51]), illustrating that relative error is indeed the quantity of interest in practice. An analogous situation is encountered in the preasymptotic error analyses of other Helmholtz FEMs in [14, 18, 33,34,35, 44, 101,102,103]: all these papers prove bounds on the error in terms of the data, as in (1.3), but all the numerical experiments in these papers concerning the error consider the relative error.

1.2 The main results of this paper and their novelty

The two main results are the following:

  (a)

    Theorem 4.1 proves the relative-error bound (1.2) when \(p=1\) for scattering of a plane wave by a nontrapping obstacle and/or a nontrapping inhomogeneous medium (modelled by the PDE \(\nabla \cdot ({\mathsf {A}}\nabla u) +k^2 nu =0\) with variable \({\mathsf {A}}\) and n) in 2 or 3 dimensions (see Definition 2.2 below for the precise definition of the boundary-value problems considered). As highlighted above, the numerical experiments in [8, 69, 70] show that “\(h^2 k^3\) sufficiently small” is necessary for the relative error of the h-FEM with \(p=1\) to be controllably small (independent of k), and so the result of Theorem 4.1 is the sharp bound to which the title of the paper refers.

  (b)

    Theorem 4.2 proves for \(p\ge 2\) a slightly weaker bound than (1.2), namely that

    $$\begin{aligned} \frac{\left\| u-u_h\right\| _{H^1_k(\varOmega _R)}}{\left\| u\right\| _{H^1_k(\varOmega _R)}}\le C \big (hk + k (hk)^{p+1} \big ) \end{aligned}$$
    (1.4)

    for scattering of a plane wave by a nontrapping obstacle in 2 or 3 dimensions, where C in (1.4) is independent of h and k but depends on p, with this dependence given explicitly in the theorem.

As discussed above, these are the first-ever frequency-explicit relative-error bounds on the Helmholtz h-FEM in 2 or 3 dimensions. We recall the interest (highlighted at the end of the previous subsection) from [14, 18, 32,33,34,35, 44, 51, 76, 100,101,102,103,104] in proving such bounds.

An additional novelty of Theorem 4.1 is that it applies to the variable-coefficient Helmholtz equation, and all the constants in the relative-error bound are explicit, not only in k and h, but also in the coefficients \({\mathsf {A}}\) and n. The only other coefficient-explicit, preasymptotic FEM error bound on the variable-coefficient Helmholtz equation in the literature appears in [87, Theorem 2.39], where the bound (1.3) is proved for the interior impedance problem when \(h^{2p}k^{2p+1}\) is sufficiently small and \({\mathsf {A}}\) and n are nontrapping. The only other coefficient-explicit FEM error bounds for the Helmholtz equation with variable \({\mathsf {A}}\) and n are in [50, 61]. Both prove quasi-optimality under the condition “\(hk^2\) sufficiently small” when \(p=1\), with [61, Theorems 4.2 and 4.5] proving this result for the interior impedance problem and [50, Theorem 3] proving this result for scattering by a nontrapping Dirichlet obstacle.

Our two main results, Theorems 4.1 and 4.2, are proved for a particular class of Helmholtz problems, namely those corresponding to scattering of a plane wave, and not for the equation \(\varDelta u +k^2u =-f\) with general \(f\in L^2\). We highlight that, for this latter class of problems, it is unreasonable to expect a relative-error bound such as (1.2) to hold, and thus the best one can do is prove bounds for a particular class of realistic data (as we do here). For example, consider the 1-d problem (1.1) with

$$\begin{aligned} f(x) := -\big [ \exp (\mathrm{i}k^n x) \chi (x)\big ]'' - k^2\big [ \exp (\mathrm{i}k^n x) \chi (x)\big ], \end{aligned}$$
(1.5)

where \(\chi \) has compact support in (0, 1). The solution to (1.1) is then \(u(x) = \exp (\mathrm{i}k^n x)\chi (x)\), which oscillates on a scale of \(k^{-n}\), i.e., a smaller scale than \(k^{-1}\) when \(n>1\). The finite-element method with, say, \(p=1\) and \(hk^{3/2}\) small (and independent of k) will therefore not resolve this solution, and hence a bound such as (1.2) does not hold. This example is nevertheless consistent with the previous results recalled in §1.1 since (i) the assumptions on the solution u in [70, First equation in §3.4] and [71, Definition 3.2] exclude such data f, and (ii) with f given by (1.5) and \(n>1\), \(\Vert f\Vert _{L^2(0,1)} \sim k^{2n}\) and \(\Vert u\Vert _{H^1_k(0,1)}\sim k^n\), so that \(\Vert f\Vert _{L^2(0,1)} \gg \Vert u\Vert _{H^1_k(0,1)}\), and the error estimate (1.3) holds in this case because, although the absolute error on the left-hand side of (1.3) is large, the right-hand side of (1.3) is larger.
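The scalings in (ii) can be checked numerically. The sketch below assumes one particular cutoff, \(\chi (x)=\sin ^4(2\pi (x-1/4))\) on [1/4, 3/4] and zero elsewhere (any smooth \(\chi \) with compact support in (0, 1) would serve), and uses that, since \(\chi \) is real, \(|f|^2 = \big ((k^{2n}-k^2)\chi -\chi ''\big )^2+4k^{2n}(\chi ')^2\) and \(|u'|^2+k^2|u|^2=(k^{2n}+k^2)\chi ^2+(\chi ')^2\). The oscillatory factor therefore drops out, and a grid that only resolves \(\chi \) suffices. The function names are hypothetical; for \(n>1\) the estimated log-log slopes should be close to 2n and n, respectively.

```python
import numpy as np


def cutoff(x):
    """Hypothetical choice chi(x) = sin^4(2 pi (x - 1/4)) on [1/4, 3/4], zero elsewhere.
    Returns chi, chi', chi''; chi is C^3 with compact support in (0, 1)."""
    w = 2.0 * np.pi
    s, c = np.sin(w * (x - 0.25)), np.cos(w * (x - 0.25))
    inside = (x >= 0.25) & (x <= 0.75)
    chi = np.where(inside, s**4, 0.0)
    dchi = np.where(inside, 4.0 * s**3 * c * w, 0.0)
    d2chi = np.where(inside, (12.0 * s**2 * c**2 - 4.0 * s**4) * w**2, 0.0)
    return chi, dchi, d2chi


def norms(k, n, m=200001):
    """Return (||f||_{L^2(0,1)}, ||u||_{H^1_k(0,1)}) for f in (1.5) and u = exp(i k^n x) chi(x)."""
    x = np.linspace(0.0, 1.0, m)
    dx = x[1] - x[0]
    chi, dchi, d2chi = cutoff(x)
    K = float(k) ** n
    # The factor exp(i k^n x) has modulus one and drops out of both integrands.
    f2 = ((K**2 - k**2) * chi - d2chi) ** 2 + 4.0 * K**2 * dchi**2
    u2 = (K**2 + k**2) * chi**2 + dchi**2
    # Both integrands vanish at x = 0 and x = 1, so the trapezoidal rule is a plain sum.
    return np.sqrt(np.sum(f2) * dx), np.sqrt(np.sum(u2) * dx)


def loglog_slopes(n, k1=100.0, k2=200.0):
    """Estimated exponents a, b in ||f|| ~ k^a and ||u||_{H^1_k} ~ k^b."""
    f1, u1 = norms(k1, n)
    f2, u2 = norms(k2, n)
    r = np.log(k2 / k1)
    return np.log(f2 / f1) / r, np.log(u2 / u1) / r


if __name__ == "__main__":
    print(loglog_slopes(2))
```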

1.3 Discussion of these results in the context of using semiclassical analysis in the numerical analysis of the Helmholtz equation

In the last \(\sim \)10 years, there has been growing interest in using results about the k-explicit analysis of the Helmholtz equation from semiclassical analysis (a branch of microlocal analysis) to design and analyse numerical methods for the Helmholtz equation. The activity has so far occurred in, broadly speaking, five different directions:

  1.

    The use of the results in [83] (on the rigorous \(k\rightarrow \infty \) asymptotics of the solution of the Helmholtz equation in the exterior of a smooth convex obstacle with strictly positive curvature) to design and analyse k-dependent approximation spaces for integral-equation formulations [2, 31, 36, 38, 39, 53, 74, 75].

  2.

    The use of the results in [83], along with those in [72] on scattering from several convex obstacles, to analyse algorithms for multiple scattering problems [1, 11, 37, 40].

  3.

    The use of bounds on the Helmholtz solution operator (also known as resolvent estimates) due to [86, 99] (with the latter using the propagation of singularities results in [82]) to prove k-explicit bounds on both inverses of boundary-integral operators and the inf-sup constant of the domain-based variational formulation [7, 22, 23, 91], and also to analyse preconditioning strategies [52].

  4.

    The use of identities introduced in [86] to prove coercivity of boundary-integral operators [94] and to introduce new coercive formulations of Helmholtz problems [30, 55, 56, 85, 93].

  5.

    The use of bounds on the restriction of quasimodes of the Laplacian to hypersurfaces from [17, 27, 64, 95,96,97] to prove sharp k-explicit bounds on boundary integral operators [48, 63, Appendix A], [45, 49], with these bounds then used to prove sharp k-explicit bounds on the number of iterations when GMRES is applied to boundary-integral equations [47].

The results of the present paper include a sixth direction. Namely, a key ingredient in our proofs of Theorems 4.1 and 4.2 (indeed, the ingredient that allows one to obtain a relative-error bound instead of a bound in terms of the data, such as (1.3)) is a result describing the oscillatory behaviour of the solution of the plane-wave scattering problem, which we prove using semiclassical defect measures. These measures describe where the mass in phase space of a Helmholtz solution is concentrated in the high-frequency limit (see the discussion in §9.1 below), and were introduced in [57, 77]; see [15] for more discussion on the history of defect measures.

2 Formulation of the problem

Assumption 2.1

(Assumptions on the domain and coefficients)

  (i)

    \(\varOmega _- \subset {\mathbb {R}}^d, d=2,3,\) is a bounded open Lipschitz set such that its open complement \(\varOmega _+:= {\mathbb {R}}^d\setminus \overline{\varOmega _-}\) is connected.

  (ii)

    \({\mathsf {A}}\in C^{0,1}(\varOmega _+ , {\mathsf {SPD}})\) (where \({\mathsf {SPD}}\) is the set of \(d\times d\) real, symmetric, positive-definite matrices) is such that \(\mathrm{supp}({\mathsf {I}}- {\mathsf {A}})\) is compact in \({\mathbb {R}}^d\) and there exist \(0<A_{\min }\le A_{\max }<\infty \) such that, for all \(\varvec{\xi }\in {\mathbb {R}}^d\),

    $$\begin{aligned} A_{\min }|\varvec{\xi }|^2 \le \varvec{\xi }^T\big ({\mathsf {A}}({\mathbf {x}})\varvec{\xi }\big ) \le A_{\max }|\varvec{\xi }|^2 \quad \text { for almost every }{\mathbf {x}}\in \varOmega _+. \end{aligned}$$
    (2.1)
  (iii)

    \(n\in L^\infty (\varOmega _+,{\mathbb {R}})\) is such that \(\mathrm{supp}(1-n)\) is compact in \({\mathbb {R}}^d\) and there exist \(0<n_{\min }\le n_{\max }<\infty \) such that

    $$\begin{aligned} n_{\min } \le n({\mathbf {x}})\le n_{\max }\quad \text { for almost every } {\mathbf {x}}\in \varOmega _+. \end{aligned}$$
    (2.2)

Figure 1 shows a schematic of \(\varOmega _-\) and the supports of \({\mathsf {I}}-{\mathsf {A}}\) and \(1-n\). Let the scatterer \(\varOmega _{\mathrm{sc}}\) be defined by \(\varOmega _{\mathrm{sc}}:= \varOmega _-\cup \mathrm{supp}({\mathsf {I}}- {\mathsf {A}}) \cup \mathrm{supp}(1-n)\) (i.e., the union of the shaded areas in Fig. 1). Given \(R>0\) such that \(\varOmega _{\mathrm{sc}}\subset B_R\), where \(B_R\) denotes the ball of radius R about the origin, let \(\varOmega _R:= \varOmega _+\cap B_R\). Let \(\varGamma _R:= \partial B_R\) and let \(\varGamma :=\partial \varOmega _-\). Let \({\mathbf {n}}\) denote the outward-pointing unit normal vector field on both \(\varGamma \) and \(\varGamma _R\). We denote by \(\partial _{{\mathbf {n}}}\) the corresponding Neumann trace on \(\varGamma \) or \({\varGamma _R}\) and \(\partial _{{\mathbf {n}},{\mathsf {A}}}\) the corresponding conormal-derivative trace. We denote by \(\gamma u\) the Dirichlet trace on \(\varGamma \) or \({\varGamma _R}\).

Fig. 1

A schematic of \(\varOmega _-\), the supports of \({\mathsf {I}}-{\mathsf {A}}\) and \(1-n\), and \(B_R\)

Definition 2.2

(Helmholtz plane-wave scattering problem) Given \(k>0\) and \({\mathbf {a}}\in {\mathbb {R}}^d\) with \(|{\mathbf {a}}|=1\), let \(u^I({\mathbf {x}}) := \mathrm{e}^{\mathrm{i}k {\mathbf {x}}\cdot {\mathbf {a}}}.\) Given \(\varOmega _-\), \({\mathsf {A}}\), and n, as in Assumption 2.1, we say \(u\in H^1_{\mathrm{{loc}}}(\varOmega _+)\) satisfies the Helmholtz plane-wave scattering problem if

$$\begin{aligned} \nabla \cdot ({\mathsf {A}}\nabla u)+k^2 n u =0 \quad \text { in }\varOmega _+,\quad \text {either}\quad \gamma u =0 \quad \text { or }\quad \partial _{{\mathbf {n}}, {\mathsf {A}}} u =0 \quad \text { on }\varGamma , \end{aligned}$$
(2.3)

and \(u^S:= u-u^I\) satisfies the Sommerfeld radiation condition

$$\begin{aligned} \frac{\partial u^S}{\partial r}({\mathbf {x}}) - \mathrm{i}k u^S({\mathbf {x}}) = o \left( \frac{1}{r^{(d-1)/2}}\right) \end{aligned}$$
(2.4)

as \(r:= |{\mathbf {x}}|\rightarrow \infty \), uniformly in \({\widehat{{\mathbf {x}}}}:= {\mathbf {x}}/r\).

We call a solution of the Helmholtz equation satisfying the Sommerfeld radiation condition (2.4) an outgoing solution (so, in Definition 2.2, \(u^S\) is outgoing).

Define \(\mathrm{DtN}_k: H^{1/2}(\varGamma _R) \rightarrow H^{-1/2}(\varGamma _R)\) to be the Dirichlet-to-Neumann map for the equation \(\varDelta u+k^2 u=0\) posed in the exterior of \(B_R\) with the Sommerfeld radiation condition (2.4). When \(\varGamma _R= \partial B_R\), for some \(R>0\), the definition of \(\mathrm{DtN}_k\) in terms of Hankel functions and polar coordinates (when \(d=2\))/spherical polar coordinates (when \(d=3\)) is given in, e.g., [80, Equations 3.7 and 3.10]. Let

$$\begin{aligned} H_{0,D}^1(\varOmega _R):= \big \{ v\in H^1(\varOmega _R) : \gamma v=0 \text { on }\varGamma \big \}. \end{aligned}$$

When Dirichlet boundary conditions are prescribed in (2.3), let

$$\begin{aligned} {{\mathcal {H}}}:= H_{0,D}^1(\varOmega _R); \end{aligned}$$
(2.5)

when Neumann boundary conditions are prescribed, let

$$\begin{aligned} {{\mathcal {H}}}:= H^1(\varOmega _R). \end{aligned}$$
(2.6)

Lemma 2.3

(Variational formulation of the Helmholtz plane-wave scattering problem) With \(u^I\), \(\varOmega _-\), \({\mathsf {A}}\), n, \(\varOmega _R\), and \({{\mathcal {H}}}\) as above, define \({\widetilde{u}}\in {{\mathcal {H}}}\) as the solution of the variational problem

$$\begin{aligned} \text { find } {\widetilde{u}}\in {{\mathcal {H}}}\text { such that }\quad a({\widetilde{u}},v)=F(v) \quad \text { for all }v\in {{\mathcal {H}}}, \end{aligned}$$
(2.7)

where

$$\begin{aligned} a({\widetilde{u}},v)&:= \int _{\varOmega _R} \Big (({\mathsf {A}}\nabla {\widetilde{u}})\cdot \overline{\nabla v} - k^2 n {\widetilde{u}}{\overline{v}}\Big ) - \big \langle \mathrm{DtN}_k(\gamma {\widetilde{u}}),\gamma v\big \rangle _{{\varGamma _R}},\quad \text { and }\quad \nonumber \\ F(v)&:= \int _{\varGamma _R}\left( \partial _{{\mathbf {n}}}u^I - \mathrm{DtN}_k(\gamma u^I)\right) \overline{\gamma v}, \end{aligned}$$
(2.8)

where \(\langle \cdot ,\cdot \rangle _{{\varGamma _R}}\) denotes the duality pairing on \({\varGamma _R}\) that is linear in the first argument and antilinear in the second. Then \({\widetilde{u}}= u|_{\varOmega _R}\), where u is the solution of the Helmholtz plane-wave scattering problem of Definition 2.2.

For a proof of Lemma 2.3, see, e.g., [60, Lemma 3.3]. From here on we denote the solution of the variational problem (2.7) by u, so that u satisfies

$$\begin{aligned} \quad a(u,v)=F(v) \quad \text { for all }v\in {{\mathcal {H}}}. \end{aligned}$$
(2.9)

Lemma 2.4

The solution of the Helmholtz plane-wave scattering problem of Definition 2.2 exists and is unique.

Proof

Uniqueness follows from the unique continuation principle; see [60, §1], [61, §2] and the references therein. Since \(a(\cdot ,\cdot )\) satisfies a Gårding inequality (see (10.6) below), Fredholm theory then gives existence. \(\square \)

The h finite-element method. Let \({{\mathcal {T}}}_h\) be a family of triangulations of \(\varOmega _R\) (in the sense of, e.g., [28, Page 61]) that is shape regular (see, e.g., [12, Definition 4.4.13], [28, Page 128]). When Neumann boundary conditions are prescribed in (2.3), let

$$\begin{aligned} {{\mathcal {H}}}_{h}:= \{v \in C(\overline{\varOmega _R}) : v|_K \text { is a polynomial of degree at most } p \text { for each } K \in {{\mathcal {T}}}_{h}\}; \end{aligned}$$
(2.10)

when Dirichlet boundary conditions are prescribed we impose the additional condition that elements of \({{\mathcal {H}}}_{h}\) are zero on \(\varGamma \); in both cases we then have \({{\mathcal {H}}}_{h}\subset {{\mathcal {H}}}\). The main results, Theorems 4.1 and 4.2 below, require \(\varGamma \) to be at least \(C^{1,1}\). For such \(\varOmega _R\) it is not possible to fit \(\partial \varOmega _R\) exactly with simplicial elements (i.e. when each element of \({{\mathcal {T}}}_{h}\) is a simplex), and fitting \(\partial \varOmega _R\) with isoparametric elements (see, e.g., [28, Chapter VI]) or curved elements (see, e.g., [9]) is impractical. Some analysis of the non-conforming error is therefore necessary, but since this is very standard (see, e.g., [12, Chapter 10]), we ignore this issue here.

The second main result, Theorem 4.2 (for \(p \ge 2\) and analytic \(\varGamma \)), requires the triangulation \({{\mathcal {T}}}_h\) to be quasi-uniform in the particular sense of [81, Assumption 5.1]. Triangulations satisfying this assumption can be constructed by refining a fixed triangulation that has analytic element maps; see [81, Remark 5.2].

The finite-element method is the Galerkin method applied to the variational problem (2.7), i.e.,

$$\begin{aligned} \text { find } u_h\in {{\mathcal {H}}}_h\text { such that }\,\, a(u_h,v_h)=F(v_h) \,\, \text { for all }v_h\in {{\mathcal {H}}}_h. \end{aligned}$$
(2.11)

Since \({{\mathcal {H}}}_h\subset {{\mathcal {H}}}\), we may set \(v=v_h\) in (2.9); subtracting (2.11) from the result gives the Galerkin orthogonality

$$\begin{aligned} a(u-u_h,v_h) =0 \quad \text { for all }v_h\in {{\mathcal {H}}}_h. \end{aligned}$$
(2.12)

3 Definitions of quantities involved in the statement of the main results

Throughout the paper we assume that \(R\ge R_0>0\) for some fixed \(R_0>0\) and \(k\ge k_0\) for some fixed \(k_0>0\). For simplicity we assume throughout that

$$\begin{aligned} k_0 R_0\ge 1 \quad \text { and }\quad hk\le 1. \end{aligned}$$
(3.1)

Given a bounded open set D, we define the weighted \(H^1\) norm \(\Vert \cdot \Vert _{H^1_k(D)}\) by

$$\begin{aligned} \Vert u\Vert ^2_{H^1_k(D)} :=\left\| \nabla u\right\| _{L^2(D)}^2 + k^2 \left\| u\right\| _{L^2(D)}^2. \end{aligned}$$
(3.2)

We now define quantities \({C_{\mathrm{DtN}}}_j, j=1,2, {C_{\mathrm{sol}}}, C_{\mathrm{{osc}}}, {C_{\mathrm{PF}}}, {C_{H^2}}, {C_{\mathrm{int}}},\) and \(C_{\mathrm{MS}}\) that appear in the main results (Theorems 4.1 and 4.2). All of these are dimensionless quantities, independent of k, h, and p, but dependent on one or more of \({\mathsf {A}}\), n, \(\varOmega _-\) (indicated below).

\({C_{\mathrm{DtN}}}_j\), \(j=1,2\). By [80, Lemma 3.3], there exist \({C_{\mathrm{DtN}}}_j= {C_{\mathrm{DtN}}}_j(k_0 R_0)\), \(j=1,2,\) such that

$$\begin{aligned} \big |\big \langle \mathrm{DtN}_k(\gamma u), \gamma v\big \rangle _{\varGamma _R}\big | \le {C_{\mathrm{DtN}}}_1 \left\| u\right\| _{H^1_k(\varOmega _R)} \left\| v\right\| _{H^1_k(\varOmega _R)} \end{aligned}$$
(3.3)

for all \(u,v \in H^1(\varOmega _R)\) and for all \(k\ge k_0\), and

$$\begin{aligned} - \mathfrak {R}\big \langle \mathrm{DtN}_k\phi ,\phi \big \rangle _{{\varGamma _R}} \ge {C_{\mathrm{DtN}}}_2 R^{-1}\left\| \phi \right\| ^2_{L^2({\varGamma _R})} \end{aligned}$$
(3.4)

for all \(\phi \in H^{1/2}({\varGamma _R})\) and for all \(k\ge k_0.\)

\({C_{\mathrm{sol}}}\). We assume that \({\mathsf {A}}\), n, and \(\varOmega _-\) are nontrapping in the sense that there exists \({C_{\mathrm{sol}}}={C_{\mathrm{sol}}}({\mathsf {A}},n, {\varOmega _-}, R,k_0)\) such that, given \(f\in L^2(\varOmega _R)\), the solution of the boundary value problem (BVP)

$$\begin{aligned}&\nabla \cdot ({\mathsf {A}}\nabla v)+k^2 n v =-f \,\,\text { in }\varOmega _+,\qquad \text {either}\,\, \gamma v =0 \,\,\text { or }\,\, \partial _{{\mathbf {n}}, {\mathsf {A}}} v =0 \,\,\text { on }\varGamma , \end{aligned}$$

together with the Sommerfeld radiation condition (2.4) (with \(u^S\) replaced by v), satisfies the bound

$$\begin{aligned} \left\| v\right\| _{H^1_k(\varOmega _R)} \le {C_{\mathrm{sol}}}R\left\| f\right\| _{L^2({\varOmega _+})} \quad \text { for all }k\ge k_0; \end{aligned}$$
(3.5)

observe that the factor R on the right-hand side makes \({C_{\mathrm{sol}}}\) dimensionless. (Remark 4.5 discusses the situation where this nontrapping assumption is removed and \({C_{\mathrm{sol}}}\) depends on k.) This assumption holds if the obstacle \(\varOmega _-\) and the coefficients \({\mathsf {A}}\) and n are nontrapping in the sense that all billiard trajectories (or, more precisely, Melrose–Sjöstrand generalized bicharacteristics [68, Section 24.3]) starting in an exterior neighbourhood of \(\varOmega _-\) and evolving according to the Hamiltonian flow defined by the symbol of (2.3) escape from that neighbourhood after some uniform time. For this flow to be well-defined, \(\varGamma \) must be \(C^\infty \), and \({\mathsf {A}}\) and n must be globally \(C^{1,1}\) and \(C^\infty \) in a neighbourhood of \(\varGamma \); note that the flow may in general be set-valued rather than unique in cases where the boundary is permitted to be infinite-order flat. Assuming the uniqueness of the flow, an explicit expression for \({C_{\mathrm{sol}}}\) in terms of \({\mathsf {A}}, n, {\varOmega _-},\) and R is then given in [50, Theorems 1 and 2, and Equation 6.32]. However, the bound (3.5) can be established in situations with much less smoothness; indeed, [60, Theorems 2.5, 2.7, and 2.19] establishes (3.5) for a Dirichlet \(C^0\) star-shaped obstacle and \(L^\infty \) \({\mathsf {A}}\) and n satisfying certain monotonicity assumptions. Furthermore, our arguments in the rest of the paper do not need the flow to be well-defined on \(\varOmega _{\mathrm{sc}}:= \varOmega _-\cup \mathrm{supp}({\mathsf {I}}- {\mathsf {A}}) \cup \mathrm{supp}(1-n)\); they only require that the bound (3.5) holds. We can therefore define nontrapping in this weaker sense, and work with scatterers of much lower smoothness than in standard microlocal-analysis settings.

\(C_{\mathrm{{osc}}}\). By Theorem 9.1 below, if \({\mathsf {A}}\), n, and \(\varOmega _-\) are nontrapping, then there exists \(C_{\mathrm{{osc}}}=C_{\mathrm{{osc}}}({\mathsf {A}}, n, \varOmega _-)\) (‘osc’ standing for ‘oscillation’) such that, for u a solution to the Helmholtz plane-wave scattering problem of Definition 2.2,

$$\begin{aligned} \vert u\vert _{H^2(\varOmega _R)} \le C_{\mathrm{{osc}}}k\left\| u\right\| _{H^1_k(\varOmega _R)}, \end{aligned}$$
(3.6)

where \(\vert \cdot \vert _{H^2(\varOmega _R)}\) denotes the \(H^2\) semi-norm; i.e., \(\vert u\vert ^2_{H^2(\varOmega _R)}:= \sum _{|\alpha |=2}\int _{\varOmega _R}|\partial ^\alpha u|^2\).

\({C_{\mathrm{PF}}}\). By [12, §5.3], [98, Corollary A.15], there exists \({C_{\mathrm{PF}}}= {C_{\mathrm{PF}}}(\varOmega _-)\) (‘PF’ standing for ‘Poincaré–Friedrichs’) such that

$$\begin{aligned} R^{-2} \left\| v\right\| ^2_{L^2(\varOmega _R)} \le {C_{\mathrm{PF}}}\left( R^{-1}\left\| \gamma v\right\| ^2_{L^2({\varGamma _R})} + \left\| \nabla v\right\| ^2_{L^2(\varOmega _R)}\right) \end{aligned}$$
(3.7)

for all \(v\in H^1(\varOmega _R)\).

\({C_{H^2}}\). By Theorem 6.1 below, there exists \({C_{H^2}}={C_{H^2}}({\mathsf {A}}, \varOmega _-)\) such that, if \(f\in L^2(\varOmega _{R})\) and \(v\in H^1(\varOmega _{R})\) satisfy

$$\begin{aligned}&\nabla \cdot ({\mathsf {A}}\nabla v) = -f \,\, \text { in }\varOmega _{R},\qquad \partial _{\mathbf {n}}v = \mathrm{DtN}_k(\gamma v)\text { on }{\varGamma _R}, \text { and }\end{aligned}$$
(3.8a)
$$\begin{aligned}&\qquad \text { either } \,\,\gamma v =0 \,\,\text { or }\,\, \partial _{\mathbf {n}}v =0 \quad \text { on }\varGamma , \end{aligned}$$
(3.8b)

then

$$\begin{aligned} \big |v\big |_{H^2(\varOmega _R)} \le {C_{H^2}}\left( \left\| f\right\| _{L^2(\varOmega _R)} + R^{-1}\left\| \nabla v\right\| _{L^2(\varOmega _R)} +R^{-2} \left\| v\right\| _{L^2(\varOmega _R)}\right) . \end{aligned}$$
(3.9)

The key point in (3.9) is that, although v in (3.8) depends on k via the boundary condition on \({\varGamma _R}\), \({C_{H^2}}\) is independent of k.

\({C_{\mathrm{int}}}\). By, e.g., [12, Equation 4.4.28], [90, Theorem 4.1], the nodal interpolant \(I_h: C(\overline{\varOmega _R}) \rightarrow {{\mathcal {H}}}_h\) is well-defined for functions in \(H^2(\varOmega _R)\) (since \(H^2(\varOmega _R)\subset C(\overline{\varOmega _R})\) for \(d=2,3\)) and satisfies

$$\begin{aligned} \left\| v- I_hv\right\| _{L^2(\varOmega _R)} + h\left\| \nabla (v- I_hv)\right\| _{L^2(\varOmega _R)} \le {C_{\mathrm{int}}}h^2\vert v\vert _{H^2(\varOmega _R)} , \end{aligned}$$
(3.10)

for all \(v\in H^2(\varOmega _R)\), for some \({C_{\mathrm{int}}}\) that depends only on the shape-regularity constant of the mesh. As a consequence of (3.10), the definition of \(\Vert \cdot \Vert _{H^1_k(\varOmega _R)}\) (3.2), and the assumption that \(hk\le 1\) (3.1), we have

$$\begin{aligned} \left\| v-I_h v\right\| _{H^1_k(\varOmega _R)}\le \sqrt{2} {C_{\mathrm{int}}}h |v|_{H^2(\varOmega _R)}. \end{aligned}$$
(3.11)

\(C_{\mathrm{MS}}\). By [81, Lemma 3.4 and Proposition 5.3], there exists \(C_{\mathrm{MS}}= C_{\mathrm{MS}}(\varOmega _-)\) (‘MS’ standing for ‘Melenk–Sauter’) such that, if \(\varGamma \) is analytic, \({\mathsf {A}}={\mathsf {I}}\), \(n=1\), and \(\varOmega _+\) is nontrapping, then the bound (8.6) below holds.

In §1.2 we recalled that the only other frequency- and coefficient-explicit FEM error bounds for the variable-coefficient Helmholtz equation appear in [61, Theorems 4.2 and 4.5], [50, Theorem 3], and [87, Theorem 2.39]. We note here that the constants in these bounds are expressed in terms of analogous quantities to those defined above.

4 Statement and discussion of the main results

4.1 The main results

The first theorem holds for any \(p \ge 1\), but is most relevant in the case \(p=1\).

Theorem 4.1

Let u be the solution of the Helmholtz plane-wave scattering problem (Definition 2.2). Assume that both Assumption 2.1 and (3.1) hold, \(\varOmega _-\) is \(C^{1,1}\), and \({\mathsf {A}},\) n,  and \(\varOmega _-\) are nontrapping. If \(p\ge 1\) and

$$\begin{aligned} h^2 k^3 \le C_1, \end{aligned}$$
(4.1)

then the Galerkin solution \(u_h\) to the variational problem (2.11) exists, is unique, and satisfies the bound

$$\begin{aligned} \frac{\left\| u-u_h\right\| _{H^1_k(\varOmega _R)} }{ \left\| u\right\| _{H^1_k(\varOmega _R)} } \le C_2 hk + C_3 h^2 k^3, \end{aligned}$$
(4.2)

where

$$\begin{aligned} C_1 :&= \frac{1}{4 (A_{\max } + {C_{\mathrm{DtN}}}_1)n_{\max }({C_{H^2}})^2({C_{\mathrm{int}}})^2{C_{\mathrm{sol}}}R } \left( n_{\max } + \frac{1}{k_0 R_0 {C_{\mathrm{sol}}}} + 2\right) ^{-1}\\&\quad \times \bigg ( 1 + \frac{\sqrt{2}}{ \min \big \{ {C_{\mathrm{DtN}}}_2 ({C_{\mathrm{PF}}})^{-1} \, ,\, A_{\min }(1+{C_{\mathrm{PF}}})^{-1}\big \} }\bigg )^{-1},\\ C_2&:= \frac{\sqrt{2} {C_{\mathrm{int}}}C_{\mathrm{{osc}}}}{A_{\min }}\big (\max \big \{A_{\max }, n_{\max }\big \} + {C_{\mathrm{DtN}}}_1\big ), \end{aligned}$$

and

$$\begin{aligned} C_3&:= \frac{4\sqrt{2}}{\sqrt{A_{\min }}} \big ( A_{\max } + {C_{\mathrm{DtN}}}_1\big )({C_{\mathrm{int}}})^2 {C_{H^2}}{C_{\mathrm{sol}}}RC_{\mathrm{{osc}}}\sqrt{n_{\max }+ A_{\min }}\\&\quad \times \left( n_{\max } + \frac{1}{k_0R_0 {C_{\mathrm{sol}}}} + 2\right) . \end{aligned}$$
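For readers who want to evaluate Theorem 4.1 for specific coefficients, the following Python sketch transcribes the formulas for \(C_1\), \(C_2\), \(C_3\) and the bound (4.2). It assumes that numerical values of the quantities defined in §3 are already available (computing them is a separate problem and is not attempted here); the function names are hypothetical.

```python
import math


def theorem41_constants(A_min, A_max, n_max, C_DtN1, C_DtN2, C_PF, C_H2,
                        C_int, C_sol, C_osc, R, k0R0):
    """Transcription of the constants C_1, C_2, C_3 of Theorem 4.1."""
    common = n_max + 1.0 / (k0R0 * C_sol) + 2.0          # factor shared by C_1 and C_3
    min_term = min(C_DtN2 / C_PF, A_min / (1.0 + C_PF))   # the minimum appearing in C_1
    C1 = (1.0 / (4.0 * (A_max + C_DtN1) * n_max * C_H2**2 * C_int**2 * C_sol * R)
          / common
          / (1.0 + math.sqrt(2.0) / min_term))
    C2 = math.sqrt(2.0) * C_int * C_osc / A_min * (max(A_max, n_max) + C_DtN1)
    C3 = (4.0 * math.sqrt(2.0) / math.sqrt(A_min) * (A_max + C_DtN1) * C_int**2
          * C_H2 * C_sol * R * C_osc * math.sqrt(n_max + A_min) * common)
    return C1, C2, C3


def theorem41_bound(h, k, consts):
    """Right-hand side of (4.2), or None if hk <= 1 from (3.1) or the condition (4.1) fails."""
    C1, C2, C3 = consts
    if h * k > 1.0 or h**2 * k**3 > C1:
        return None
    return C2 * h * k + C3 * h**2 * k**3
```

Such a transcription makes the dimensional bookkeeping visible: \(C_1\) scales like \(R^{-1}\) and \(C_3\) like R, so that (4.1) and (4.2) are statements about the dimensionless quantity \(h^2k^3R\).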

Theorem 4.2

Let u be the solution of the Helmholtz plane-wave scattering problem (Definition 2.2). Assume that both Assumption 2.1 and (3.1) hold, \({\mathsf {A}}={\mathsf {I}}\), \(n=1\), \(\varOmega _-\) is a nontrapping Dirichlet obstacle, \(\varGamma \) is analytic, and the triangulation \({{\mathcal {T}}}_h\) in the definition of \({{\mathcal {H}}}_h\) (2.10) satisfies the quasi-uniformity assumption [81, Assumption 5.1]. If

$$\begin{aligned} \frac{(hk)^2}{p} + {C_{\mathrm{sol}}}R \frac{k(hk)^{p+1}}{p^p} \le {\widetilde{C}}_1 \end{aligned}$$
(4.3)

then the Galerkin solution \(u_h\) to the variational problem (2.11) exists, is unique, and satisfies the bound

$$\begin{aligned} \frac{ \left\| u-u_h\right\| _{H^1_k(\varOmega _R)} }{ \left\| u\right\| _{H^1_k(\varOmega _R)} } \le \left( {\widetilde{C}}_2+ \frac{{\widetilde{C}}_3 C_{\mathrm{MS}}}{p}\right) hk + {\widetilde{C}}_3 C_{\mathrm{MS}}{C_{\mathrm{sol}}}R \,\frac{k(hk)^{p+1}}{p^p}, \end{aligned}$$
(4.4)

where

$$\begin{aligned} {\widetilde{C}}_1:= & {} \frac{1}{ 2\sqrt{2}(1+ {C_{\mathrm{DtN}}}_1) {C_{H^2}}C_{\mathrm{MS}}}\bigg ( 1 + \frac{\sqrt{2}}{ \min \big \{ {C_{\mathrm{DtN}}}_2 ({C_{\mathrm{PF}}})^{-1} ,(1+{C_{\mathrm{PF}}})^{-1}\big \} }\bigg )^{-1},\\ {\widetilde{C}}_2:= & {} \sqrt{2} {C_{\mathrm{cont}}}{C_{\mathrm{int}}}C_{\mathrm{{osc}}},\quad \text { and }\quad {\widetilde{C}}_3:= 4\big (1 + {C_{\mathrm{DtN}}}_1\big ) {C_{\mathrm{int}}}C_{\mathrm{{osc}}}. \end{aligned}$$

Observe that (i) the condition (4.3) is satisfied if \(h^{p+1}k^{p+2}\) is sufficiently small, and (ii) the bound (4.4) is of the form (1.4).

The result of Theorem 4.2 might appear not to be a high-order result, since the lowest-order terms in (4.3) and (4.4) are \(h^2\) and h, respectively. Nevertheless, for fixed p, if h is chosen so that \(k(hk)^{p+1}\) equals a sufficiently small constant (so that (4.3) is satisfied for all sufficiently large k), then

$$\begin{aligned} h \sim k^{-1-1/(p+1)} \quad \text { so }\quad hk \sim k^{-1/(p+1)} \ll 1 \quad \text { as }k\rightarrow \infty , \end{aligned}$$

and the dominant term on the right-hand side of (4.4) is that involving \(k(hk)^{p+1}\). We highlight that Theorem 4.2, along with the previous work discussed in §1.1, shows that high-order methods suffer less from the pollution effect than low-order methods.
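The scaling above can be checked numerically. The sketch below uses a hypothetical helper name and an arbitrary constant c. It chooses h so that \(k(hk)^{p+1}=c\) exactly. It then shows that hk decays like \(k^{-1/(p+1)}\), so the term proportional to hk in (4.4) becomes negligible next to the term involving \(k(hk)^{p+1}\).

```python
def scaled_meshwidth(k, p, c):
    """Return (h, hk, k*(hk)**(p+1)) for h chosen so that k*(hk)**(p+1) = c,
    i.e. h = c**(1/(p+1)) * k**(-1 - 1/(p+1))."""
    h = c ** (1.0 / (p + 1)) * k ** (-1.0 - 1.0 / (p + 1))
    hk = h * k
    return h, hk, k * hk ** (p + 1)

for p in (1, 2, 4):
    for k in (1e2, 1e4, 1e6):
        h, hk, pollution_term = scaled_meshwidth(k, p, c=0.1)
        print(f"p={p} k={k:.0e} h={h:.2e} hk={hk:.2e} k(hk)^(p+1)={pollution_term:.2e}")
```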

4.2 How the main results are proved

Theorems 4.1 and 4.2 are proved using the so-called elliptic-projection argument (also called the modified duality argument), which has previously been used to prove the bound (1.3) on the Galerkin error in terms of the data. We first make some remarks about the history of this argument, and then outline our new contributions.

Recall that the classic duality argument, coming out of ideas introduced in [89], proves quasi-optimality of the Helmholtz FEM, and was used in, e.g., [3, 24, 25, 50, 51, 61, 70, 79, 80, 81, 88]. The elliptic-projection argument is a modification of this argument that allows one to prove results in the preasymptotic regime (as opposed to the asymptotic regime). The initial ideas were introduced in the Helmholtz context in [42, 43] for interior-penalty discontinuous Galerkin methods, and then further developed for the standard FEM and continuous interior-penalty methods in [100, 104]. The argument has subsequently been used in [6, 24, 32, 51, 76, 101] (see, e.g., the literature review in [87, §2.3]).

We note that [43, 100] also used an error-splitting argument (called “stability-error iterative improvement" in these papers), and that error-splitting ideas were also used in [32], together with the idea of using discrete Sobolev norms in the duality argument. We do not use these ideas in this paper; one expects that they could be used to improve the p dependence in Theorem 4.2, but see [87, Remark 2.48] for a discussion of the challenges in doing this.

Our three new contributions to the elliptic-projection argument are (i) a rigorous proof, using semiclassical defect measures, of the bound (3.6) describing the oscillatory behaviour of the solution of the plane-wave scattering problem (see Theorem 9.1 below), (ii) the proof of \(H^2\) regularity, with constant independent of k, of the solution of Poisson’s equation with the boundary condition \(\partial _{\mathbf {n}}v= \mathrm{DtN}_k(\gamma v)\) (see (3.9) and Theorem 6.1), and (iii) determining how all the constants in the elliptic-projection argument depend on \({\mathsf {A}}, n, \varOmega _-,\) and R.

Regarding (i): oscillatory behaviour of Helmholtz solutions similar to (3.6) has been assumed in many analyses of finite- and boundary-element methods; see, e.g., [70, First equation in §3.4], [71, Definition 3.2], [13, Definition 4.6], [5, Definition 3.5], [30, Assumption 3.4]. However, to our knowledge, the only existing rigorous results proving such behaviour are [59, Theorems 1.1 and 1.2] and [47, Theorem 1.11(c)]. These results concern the Neumann trace of the solution of the Helmholtz plane-wave scattering problem with \({\mathsf {A}}={\mathsf {I}}\) and \(n=1\), and are then used in [59] and [47] to analyse boundary-element methods applied to this problem. Like (3.6), these results are obtained using semiclassical-analysis techniques.

Regarding (ii): the analogous result (\(H^2\) regularity with constant independent of k) for Poisson’s equation with the impedance boundary condition \(\partial _{\mathbf {n}}v=\mathrm{i}k \gamma v\) is central to the elliptic-projection argument for the Helmholtz equation with impedance boundary conditions. This result was explicitly assumed in [43, Lemma 4.3], implicitly assumed in [6, 24, 100, 104], and recently proved in [26]. Our proof of (3.9) adapts the arguments of [26] (which in turn use results from [62]) to deal with the operator \(\mathrm{DtN}_k\), instead of \(\mathrm{i}k\), in the boundary condition, and makes these arguments explicit in \({\mathsf {A}}\).

Regarding (iii): while the standard duality argument applied to the Helmholtz equation discussed above has recently been made explicit in \({\mathsf {A}}, n,\) and \(\varOmega _-\) in [50, 61] (as discussed in §1.2), the only places in the literature where the elliptic-projection argument is made explicit in \({\mathsf {A}}, n,\) and \(\varOmega _-\) are the present paper and [87, §2.3], leading to the coefficient-explicit preasymptotic error bounds on the Helmholtz FEM at high frequency in Theorem 4.1 and [87, Theorem 2.39]. One area in which we expect these results to be applied is in the analysis of uncertainty quantification (UQ) algorithms for the high-frequency Helmholtz equation with random coefficients, as discussed in the following remark.

Remark 4.3

(The importance of coefficient-explicit FEM results for Helmholtz UQ) To analyse UQ algorithms that use the standard Helmholtz variational formulation, one needs to understand how existence and uniqueness of the Galerkin solution is affected by the randomness in the coefficients. One therefore needs coefficient-explicit existence and uniqueness results for the Galerkin solution for the Helmholtz equation with variable (deterministic) coefficients (such as in Theorem 4.1 and [87, Theorem 2.39]); this issue is highlighted (but not fully analysed) in the analysis of Monte Carlo and Multi-level Monte Carlo methods in [87, Chapter 5]; see [87, Assumption 5.1 and Remark 5.2].

The only other analyses of UQ algorithms for the high-frequency Helmholtz equation with random coefficients in the literature are [41, 54] (concerning Monte Carlo and Quasi-Monte Carlo methods, respectively). Because of the issue described in the previous paragraph, these papers use formulations of the Helmholtz equation for which existence and uniqueness of the Galerkin solution is established for all k, h, and p, and for a class of (deterministic) coefficients ([41] uses the interior-penalty discontinuous-Galerkin method of [42, 43], and [54] uses the coercive formulation of [56]). This then ensures that the Galerkin solution exists and is unique for all realisations of the random coefficients; see the discussion at the beginning of [41, §4].

4.3 Why does Theorem 4.2 not cover scattering by an inhomogeneous medium?

In both the elliptic-projection argument and the standard duality argument, a key role is played by the quantity \(\eta ({{\mathcal {H}}}_h)\) defined by (8.3) below, which describes how well solutions of the (adjoint of the) Helmholtz equation can be approximated in \({{\mathcal {H}}}_h\).

In the case \(p=1\) we estimate \(\eta ({{\mathcal {H}}}_h)\) using \(H^2\) regularity of the solution (which holds when \({\mathsf {A}}\) and \(\varOmega _-\) satisfy the assumptions of Theorem 4.1), leading to the bound (8.5) below. When \(p\ge 1\), \({\mathsf {A}}={\mathsf {I}}\), \(n=1\), \(\varOmega _-\) is a Dirichlet obstacle, and \(\varGamma \) is analytic, [81] proved the bound (8.6) on \(\eta ({{\mathcal {H}}}_h)\), and we use this result to prove Theorem 4.2. The bound (8.6) was proved via a judicious splitting of the solution [81, Theorem 4.20] into an analytic but oscillating part, and an \(H^2\) part that behaves “well" for large frequencies, and this splitting is only available for the exterior Dirichlet problem with \({\mathsf {A}}={\mathsf {I}}\) and \(n=1\).

We highlight that an alternative splitting procedure valid for Helmholtz problems with variable coefficients was recently developed in [25], leading to an alternative proof of the bound (8.6) on \(\eta ({{\mathcal {H}}}_h)\) [25, Lemma 2.13]. However, this alternative procedure requires that \(\mathrm{DtN}_k\) be approximated by \(\mathrm{i}k\) on \({\varGamma _R}\). Indeed, in [25, Proof of Lemma 2.13] the solution is expanded in powers of k, i.e. \(u = \sum _{j=0}^{\infty } k^j u_j\), and then on \({\varGamma _R}\) one has \(\partial _{\mathbf {n}}u_{j+1} = \mathrm{i}\gamma u_j\); this relationship between \(u_{j+1}\) and \(u_j\) on \({\varGamma _R}\) no longer holds if \(\mathrm{DtN}_k\) is not approximated by \(\mathrm{i}k\).
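The recursion on \({\varGamma _R}\) comes from matching powers of k. Formally, assuming the expansion may be differentiated and its coefficients compared term by term, substituting \(u=\sum _{j} k^j u_j\) into \(\partial _{\mathbf {n}}u = \mathrm{i}k\gamma u\) gives

$$\begin{aligned} \sum _{j=0}^{\infty } k^j \partial _{\mathbf {n}}u_j = \mathrm{i}k \sum _{j=0}^{\infty } k^j \gamma u_j = \sum _{j=1}^{\infty } k^{j}\, \mathrm{i}\gamma u_{j-1}, \quad \text { so }\quad \partial _{\mathbf {n}}u_0 = 0 \quad \text { and }\quad \partial _{\mathbf {n}}u_{j+1} = \mathrm{i}\gamma u_j, \ j\ge 0. \end{aligned}$$

This matching uses the fact that the boundary operator \(\mathrm{i}k\) is linear in k. That structure is lost when \(\mathrm{i}k\) is replaced by \(\mathrm{DtN}_k\).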

4.4 Approximating \(\mathrm{DtN}_k\)

Implementing the operator \(\mathrm{DtN}_k\) is computationally expensive, and so in practice one approximates this operator either by imposing an absorbing boundary condition on \({\varGamma _R}\) or by using a PML. In this paper we follow the precedent established in [80, 81] of first assuming, when proving new results about the FEM for exterior Helmholtz problems, that \(\mathrm{DtN}_k\) is realised exactly. We remark, however, that if the two key ingredients in §4.2 (a proof of the oscillatory behaviour (3.6), and \(H^2\)-regularity, independent of k, of a Poisson problem) can be established when \(\mathrm{DtN}_k\) is replaced by an absorbing boundary condition on \({\varGamma _R}\), then the result of Theorem 4.1 carries over to this case. When an impedance boundary condition (i.e. the simplest absorbing boundary condition) is imposed on \({\varGamma _R}\), the necessary Poisson \(H^2\)-regularity result is proved in [26], but we discuss in Remark 9.9 below the difficulties in proving (3.6) in this case.

4.5 Removing the nontrapping assumption

The only place in the proofs of Theorems 4.1 and 4.2 where the nontrapping assumption (i.e. the fact that \({C_{\mathrm{sol}}}\) in (3.5) is independent of k) is used is in the proof of the bound (3.6) (in Theorem 9.1 below). We sketch in Remark 9.10 below how (3.6) can be proved in the trapping case (i.e. when \({C_{\mathrm{sol}}}\) depends on k); the rest of the proofs of Theorems 4.1 and 4.2 then go through as before. In the case of Theorem 4.1, the requirement for the relative error to be bounded independently of k would then be that \(h^2 k^3 {C_{\mathrm{sol}}}\) be sufficiently small. Under the strongest form of trapping, \({C_{\mathrm{sol}}}\) can grow exponentially along a sequence of wavenumbers k [10, §2.5], but it is bounded polynomially in k if a set of frequencies of arbitrarily small measure is excluded [73, Theorem 1.1]. However, it is not clear how sharp the requirement “\(h^2 k^3 {C_{\mathrm{sol}}}\) sufficiently small" for the relative error to be bounded is in these cases.

5 Outline of the proof

As highlighted in §4.2, one of the novelties of this paper is that it makes the elliptic-projection argument explicit in the coefficients \({\mathsf {A}}\) and n. However, this explicitness means that many of the expressions in the proofs are complicated (just as the expressions in Theorems 4.1 and 4.2 are). In this section, therefore, we give an outline of the proof, keeping track of the dependence on k, h, and p, but ignoring the dependence on \({\mathsf {A}}, n, \varOmega _-\), and R. We use the notation \(a\lesssim b\) when \(a\le Cb\) with C independent of k, h, and p, but dependent on \({\mathsf {A}},n,\varOmega _-,\) and R.

As in the standard duality argument coming out of ideas introduced in [89] and then formalised in [88], our starting point is the fact that, since \(a(\cdot ,\cdot )\) satisfies the Gårding inequality (10.6), Galerkin orthogonality (2.12) and continuity of \(a(\cdot ,\cdot )\) (10.4) imply that, for any \(v_h\in {{\mathcal {H}}}_h\),

$$\begin{aligned}&A_{\min } \left\| u-u_h\right\| ^2_{H^1_k(\varOmega _R)} \le \mathfrak {R}a(u-u_h, u-v_h) + k^2 \big (n_{\max } + A_{\min }\big )\left\| u-u_h\right\| ^2_{L^2(\varOmega _R)}\nonumber \\&\quad \le {C_{\mathrm{cont}}}\left\| u-u_h\right\| _{H^1_k(\varOmega _R)} \left\| u-v_h\right\| _{H^1_k(\varOmega _R)} + k^2 \big (n_{\max } + A_{\min }\big )\left\| u-u_h\right\| ^2_{L^2(\varOmega _R)}.\nonumber \\ \end{aligned}$$
(5.1)

Recall (from, e.g., [88, Theorem 2.5], [80, Theorem 4.3], [92, Theorem 6.32]) that the standard duality argument (related to the Aubin-Nitsche trick) shows that

$$\begin{aligned} \left\| u-u_h\right\| _{L^2(\varOmega _R)} \le {C_{\mathrm{cont}}}\eta ({{\mathcal {H}}}_h) \left\| u-u_h\right\| _{H^1_k(\varOmega _R)}, \end{aligned}$$
(5.2)

where \(\eta ({{\mathcal {H}}}_h)\), defined by (8.3) below, describes how well solutions of the adjoint problem are approximated in the space \({{\mathcal {H}}}_h\). Inputting (5.2) into (5.1), one obtains quasi-optimality, with constant independent of k, if \(k \eta ({{\mathcal {H}}}_h)\) is sufficiently small. Lemma 8.2 below shows that \(\eta ({{\mathcal {H}}}_h)\lesssim h+(hk)^p\), and thus the condition “\(k \eta ({{\mathcal {H}}}_h)\) sufficiently small" is satisfied if \(h^pk^{p+1}\) is sufficiently small.

In contrast, the elliptic-projection argument, which we follow, shows that

$$\begin{aligned} \left\| u-u_h\right\| _{L^2(\varOmega _R)} \lesssim \eta ({{\mathcal {H}}}_h) \left\| u-w_h\right\| _{H^1_k(\varOmega _R)} \quad \text { for all }w_h \in {{\mathcal {H}}}_h, \end{aligned}$$
(5.3)

provided that \(hk^2 \eta ({{\mathcal {H}}}_h)\) is sufficiently small (see Lemma 10.1 below). Observe that (5.3) is a stronger bound than (5.2), since \(w_h\) on the right-hand side of (5.3) is arbitrary. The proof of (5.3) in our setting of the plane-wave scattering problem requires the new Poisson \(H^2\)-regularity bound (3.9), which we prove in Theorem 6.1 below.
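Ignoring constants and taking \(\eta ({{\mathcal {H}}}_h)\sim h+(hk)^p\) as in Lemma 8.2, one can compare how fast h must decrease under the two smallness conditions. \(k\eta \) expands to \(hk + h^pk^{p+1}\), and \(hk^2\eta \) expands to \((hk)^2 + h^{p+1}k^{p+2}\). The Python sketch below uses hypothetical helper names. For each list of monomials \(h^a k^b\), it computes the smallest exponent s for which all of them stay bounded when \(h = k^{-s}\). A smaller s means a coarser mesh is allowed. For \(p=1\) it gives \(s=2\) and \(s=3/2\), i.e. the conditions “\(hk^2\) small" and “\(hk^{3/2}\) small" discussed in §1.1.

```python
from fractions import Fraction

def critical_exponent(monomials):
    """Given monomials h^a k^b as pairs (a, b) with a > 0, return the smallest s
    such that all of them stay bounded as k -> infinity when h = k^(-s).
    Since h^a k^b = k^(b - a*s), we need s >= b/a for every pair."""
    return max(Fraction(b, a) for a, b in monomials)

def meshwidth_exponents(p):
    """Exponents for the two smallness conditions, assuming
    eta(H_h) ~ h + (hk)^p and ignoring all constants."""
    standard = critical_exponent([(1, 1), (p, p + 1)])      # k * eta
    elliptic = critical_exponent([(2, 2), (p + 1, p + 2)])  # h * k^2 * eta
    return standard, elliptic

for p in range(1, 5):
    s_std, s_ep = meshwidth_exponents(p)
    print(f"p={p}: k*eta small needs h ~ k^(-{s_std}); "
          f"h*k^2*eta small needs h ~ k^(-{s_ep})")
```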

Inputting (5.3) into (5.1), choosing \(w_h= v_h\), and using the inequality

$$\begin{aligned} 2\alpha \beta \le \varepsilon \alpha ^2 + \varepsilon ^{-1} \beta ^2 \quad \text { for all }\alpha ,\beta ,\varepsilon >0, \end{aligned}$$
(5.4)

on the first term on the right-hand side of (5.1), we obtain that, if \(hk^2 \eta ({{\mathcal {H}}}_h)\) is sufficiently small, then, for any \(v_h\in {{\mathcal {H}}}_h\),

$$\begin{aligned} \left\| u-u_h\right\| ^2_{H^1_k(\varOmega _R)}&\lesssim \big (1 + k^2 (\eta ({{\mathcal {H}}}_h))^2 \big )\left\| u-v_h\right\| ^2_{H^1_k(\varOmega _R)}; \end{aligned}$$
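To make the absorption step explicit, write \(e:=\Vert u-u_h\Vert _{H^1_k(\varOmega _R)}\) and \(d:=\Vert u-v_h\Vert _{H^1_k(\varOmega _R)}\). Let \(C_{\mathrm{EP}}\) denote the implied constant in (5.3); this name is introduced here only for illustration. Then (5.1) and (5.3) give the first inequality below. Applying (5.4) with \(\alpha =e\), \(\beta =d\), and \(\varepsilon = A_{\min }/{C_{\mathrm{cont}}}\) gives the second:

$$\begin{aligned} A_{\min } e^2&\le {C_{\mathrm{cont}}}\, e\, d + k^2 \big (n_{\max } + A_{\min }\big ) C_{\mathrm{EP}}^2 \big (\eta ({{\mathcal {H}}}_h)\big )^2 d^2,\\ {C_{\mathrm{cont}}}\, e\, d&\le \frac{A_{\min }}{2} e^2 + \frac{({C_{\mathrm{cont}}})^2}{2A_{\min }} d^2. \end{aligned}$$

Absorbing \(\frac{A_{\min }}{2}e^2\) into the left-hand side then yields the quasi-optimality bound with its constants written out:

$$\begin{aligned} e^2 \le \bigg ( \frac{({C_{\mathrm{cont}}})^2}{(A_{\min })^2} + \frac{2 k^2 (n_{\max } + A_{\min }) C_{\mathrm{EP}}^2}{A_{\min }}\big (\eta ({{\mathcal {H}}}_h)\big )^2 \bigg ) d^2. \end{aligned}$$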

i.e. quasi-optimality. Assuming \(H^2\) regularity of the solution, and using (3.11), we obtain that, if \(hk^2 \eta ({{\mathcal {H}}}_h)\) is sufficiently small, then

$$\begin{aligned} \left\| u-u_h\right\| _{H^1_k(\varOmega _R)}^2 \lesssim \big (1 + k^2 (\eta ({{\mathcal {H}}}_h))^2 \big )h^2 |u|^2_{H^2(\varOmega _R)}. \end{aligned}$$
(5.5)

In the standard elliptic-projection argument (see, e.g., [24, §5.5]) applied to the PDE \(\varDelta u + k^2 u=- f\), an \(H^2\)-regularity bound and the nontrapping bound (3.5) are combined to give \(|u|_{H^2(\varOmega _R)}\lesssim k \Vert f\Vert _{L^2(\varOmega _R)}\), and combining this with both (5.5) and the bound \(\eta ({{\mathcal {H}}}_h) \lesssim hk\) (see (8.5) below) proves the bound (1.3) with \(p=1\) on the Galerkin error in terms of the data when \(h^2k^3\) is sufficiently small.

In contrast, in this paper we prove, using semiclassical defect measures, that the solution to the plane-wave scattering problem satisfies (3.6), i.e. \(|u|_{H^2(\varOmega _R)}\lesssim k \left\| u\right\| _{H^1_k(\varOmega _R)}\) (see Theorem 9.1 below), and, using this in (5.5) along with the bounds on \(\eta ({{\mathcal {H}}}_h)\) in Lemma 8.2, we obtain the relative-error bounds (4.2) and (4.4).

In summary, once one has proved the bound (3.6) (which we do via semiclassical analysis) and the Poisson \(H^2\)-regularity bound (3.9) (which we do using results from [62] and properties of \(\mathrm{DtN}_k\)), if one ignores the technicalities of making the argument explicit in \({\mathsf {A}}\), n, \(\varOmega _-\), and R, then the proof of a preasymptotic relative-error bound follows via a straightforward modification of the elliptic-projection argument. Given the large and sustained interest (reviewed in §1.1) in preasymptotic relative-error bounds for the Helmholtz FEM, we believe this fact illustrates the advantage of approaching the numerical analysis of the Helmholtz equation from a perspective encompassing both numerical-analysis and semiclassical-analysis techniques.

6 Proof of the Poisson \(H^2\)-regularity result (3.9)

Theorem 6.1

With \({\mathsf {A}}\), \(\varOmega _-\), \(\varGamma \), and \(\varOmega _R\) as in §2, let \(v\in H^1(\varOmega _R)\) be the solution of the Poisson boundary value problem (3.8). If \(\varGamma \) is \(C^{1,1}\), then \(v\in H^2(\varOmega _R)\) and the bound (3.9) holds.

We follow the recent proof of the related regularity result [26, Theorem 3.1] (where \(\mathrm{DtN}_k\) is replaced by \(\mathrm{i}k\), \({\mathsf {A}}={\mathsf {I}}\), and \(\varOmega _-=\emptyset \)) and start by recalling results from [62].

Lemma 6.2

Let D be a bounded, convex, open subset of \({\mathbb {R}}^{d}\) with \(C^{2}\) boundary. Then, for all \({\mathbf {v}}\in H^1(D; {\mathbb {C}}^d)\),

$$\begin{aligned} \int _{D}\bigg (|\nabla \cdot {\mathbf {v}}|^{2}-\sum _{i,j=1}^{d}\frac{\partial v_i}{\partial x_j}\overline{\frac{\partial v_j}{\partial x_i}}\,\bigg ) \ge -2\mathfrak {R}\big \langle (\gamma {\mathbf {v}})_{T},\nabla _{T}(\gamma {\mathbf {v}}\cdot {\mathbf {n}})\big \rangle _{\partial D}, \end{aligned}$$
(6.1)

where \(\nabla _T\) is the surface gradient on \(\partial D\) and \((\gamma {\mathbf {v}})_T:= \gamma {\mathbf {v}}- {\mathbf {n}}(\gamma {\mathbf {v}}\cdot {\mathbf {n}})\) is the tangential component of \(\gamma {\mathbf {v}}\).

Proof

The result with \({\mathbf {v}}\) real follows from [62, Theorem 3.1.1.1] and the fact that the second fundamental form of \(\partial D\) (defined in, e.g., [62, §3.1.1]) is non-positive (see [62, Proof of Theorem 3.1.2.3]). The result with \({\mathbf {v}}\) complex follows in a straightforward way by repeating the proof of [62, Theorem 3.1.1.1] for complex \({\mathbf {v}}\). \(\square \)

Lemma 6.3

([62, Lemma 3.1.3.4]) If \({\mathsf {A}}\in C^{0,1}(D,{\mathsf {SPD}})\) satisfies (2.1) (with \(\varOmega _+\) replaced by D), then, for all \(v\in H^2(D)\),

$$\begin{aligned} (A_{\min })^{2}\sum _{i,j=1}^{d}\left| \frac{\partial ^{2}v}{\partial x_{i}\partial x_{j}}\right| ^{2}\le \sum _{i,j,\ell ,m=1}^{d}A_{i \ell }A_{j m}\frac{\partial ^{2}v}{\partial x_{j}\partial x_{\ell }}\frac{\partial ^{2}{\overline{v}}}{\partial x_{i}\partial x_{m}}. \end{aligned}$$
(6.2)
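Inequality (6.2) is pointwise. At each x, with \(\mathsf H\) the Hessian of v and \(\mathsf H\) real, its right-hand side equals \(\mathrm{tr}({\mathsf A}\mathsf H{\mathsf A}\mathsf H) = \Vert {\mathsf A}^{1/2}\mathsf H{\mathsf A}^{1/2}\Vert _F^2\). This is at least \(\lambda _{\min }({\mathsf A})^2\Vert \mathsf H\Vert _F^2\). The Python sketch below checks this on random real \(2\times 2\) matrices. It assumes that (2.1) means \(A_{\min }\) bounds the eigenvalues of \({\mathsf A}\) from below, and it takes \(A_{\min }\) to be the smallest eigenvalue. Restricting to \(d=2\) and real Hessians is a simplification for illustration, and the helper names are hypothetical.

```python
import math
import random

def sides_of_6_2(A, H):
    """Return ((A_min)^2 * sum_ij H_ij^2, sum_{i,j,l,m} A_il A_jm H_jl H_im)
    for a 2x2 SPD matrix A and a real symmetric 2x2 matrix H, with A_min
    taken to be the smallest eigenvalue of A."""
    (a, b), (_, c) = A
    lam_min = 0.5 * (a + c) - math.sqrt(0.25 * (a - c) ** 2 + b ** 2)
    lhs = lam_min ** 2 * sum(H[i][j] ** 2 for i in range(2) for j in range(2))
    rhs = sum(A[i][ell] * A[j][m] * H[j][ell] * H[i][m]
              for i in range(2) for j in range(2)
              for ell in range(2) for m in range(2))
    return lhs, rhs

def random_instance(rng):
    # a, c in [0.5, 2] and |b| <= 0.4 give ac - b^2 > 0, so A is SPD.
    a, c, b = rng.uniform(0.5, 2.0), rng.uniform(0.5, 2.0), rng.uniform(-0.4, 0.4)
    h11, h12, h22 = (rng.uniform(-1.0, 1.0) for _ in range(3))
    return [[a, b], [b, c]], [[h11, h12], [h12, h22]]

rng = random.Random(0)
violations = 0
for _ in range(10_000):
    lhs, rhs = sides_of_6_2(*random_instance(rng))
    violations += lhs > rhs * (1 + 1e-12) + 1e-12  # allow for rounding
print("violations:", violations)
```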

As a first step to proving Theorem 6.1, we prove it in the case when \(\varOmega _-=\emptyset \).

Lemma 6.4

Let \({\mathsf {A}}\in C^{0,1}(B_R,{\mathsf {SPD}})\) satisfy (2.1) (with \(\varOmega _+\) replaced by \(B_R\)) and be such that \(\mathrm{supp}({\mathsf {I}}-{\mathsf {A}}) \subset \subset B_R\). Given \(f\in L^{2}(B_{R})\), let \(v\in H^{1}(B_{R})\) be the solution of

$$\begin{aligned} \nabla \cdot ({\mathsf {A}}\nabla v) = -f \,\, \text { in }B_{R},\qquad \partial _{\mathbf {n}}v = \mathrm{DtN}_k(\gamma v)\text { on }{\varGamma _R}. \end{aligned}$$
(6.3)

Then \(v\in H^2(B_R)\) and

$$\begin{aligned} \vert v\vert _{H^{2}(B_R)} ^2&\le \, \frac{2}{(A_{\min })^ 2} \bigg [\Vert f\Vert _{L^{2}(B_R)} ^2 + \bigg (d ^4 \Vert \nabla {\mathsf {A}}\Vert _{L^{\infty }(B_R)} ^2\\&\quad +\,\frac{2}{(A_{\min })^2}d^8\Vert {\mathsf {A}}\Vert _{L^{\infty }(B_R)}^{2} \Vert \nabla {\mathsf {A}}\Vert _{L^{\infty }(B_R)}^{2} \bigg )\Vert \nabla v \Vert _{L^2(B_R)} ^ 2 \bigg ], \end{aligned}$$

where \(\nabla {\mathsf {A}}\) denotes the derivative of \({\mathsf {A}}\).

Proof

Let \(w\in H^1_{\mathrm{loc}}({\mathbb {R}}^d)\) be the outgoing solution of the following transmission problem

$$\begin{aligned} \nabla \cdot ({\mathsf {A}}\nabla w ) =- f \quad \text { in }B_R,\qquad \varDelta w + k^2 w =0 \quad&\text { in }{\mathbb {R}}^d\setminus \overline{B_R},\\ \gamma w_+ = \gamma w_- \,\,\text { and }\,\, \partial _{\mathbf {n}}w_+ = \partial _{\mathbf {n}}w_- \quad&\text { on }{\varGamma _R}, \end{aligned}$$

where \(w_-:= w|_{B_R}\) and \(w_+ := w|_{{\mathbb {R}}^d\setminus B_R}\). (Note that it is important here that \({\mathsf {A}}={\mathsf {I}}\) in a neighbourhood of \({\varGamma _R}\), so that \(\partial _{{\mathbf {n}},{\mathsf {A}}} w_-= \partial _{\mathbf {n}}w_-\).) By the definition of the operator \(\mathrm{DtN}_k\), \(w_-= v\). Since \({\varGamma _R}\) is \(C^2\), the regularity result [29, Theorem 5.2.1 and §5.4b] implies that \(w_- \in H^2(B_R)\) and \(w_+ \in H^2_{\mathrm{loc}}({\mathbb {R}}^d\setminus \overline{B_R})\); therefore \(v\in H^2(B_R)\).

Since \(v \in H^2(B_R)\) and \({\mathsf {A}}\) is Lipschitz, \({\mathsf {A}}\nabla v \in H^1(B_R)\) and we can apply Lemma 6.2 with \({\mathbf {v}}:={\mathsf {A}}\nabla v\). Since \({\mathsf {A}}={\mathsf {I}}\) near \({\varGamma _R}\), \({\mathbf {v}}= \nabla v\) near \({\varGamma _R}\) and so the right-hand side of (6.1) becomes

$$\begin{aligned} -2\mathfrak {R}\big \langle \nabla _T (\gamma v), \nabla _T (\partial _{{\mathbf {n}}} v) \big \rangle _{\varGamma _R} =-2 \mathfrak {R}\big \langle \nabla _T (\gamma v), \nabla _T (\mathrm{DtN}_k(\gamma v))\big \rangle _{\varGamma _R}, \end{aligned}$$

where we have used the boundary condition in (6.3).

Now, \(\mathrm{DtN}_k\) and \(\nabla _T\) commute on \({\varGamma _R}\); this can be seen either by rotation invariance, or by using the definition of \(\mathrm{DtN}_k\) and \(\nabla _T\) in terms of Fourier series on \({\varGamma _R}\). Therefore, the inequality (3.4) implies that the right-hand side of (6.1) is non-negative, hence

$$\begin{aligned} \sum _{i,j,\ell ,m=1}^{d}\int _{B_R}\frac{\partial }{\partial x_{j}}\left( A_{i\ell }\frac{\partial v}{\partial x_{\ell }}\right) \frac{\partial }{\partial x_{i}}\left( A_{j m}\frac{\partial {\overline{v}}}{\partial x_{m}}\right) \le \Vert f\Vert _{L^{2}(B_R)}^2. \end{aligned}$$
(6.4)
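For instance, when \(d=2\) the commutation of \(\mathrm{DtN}_k\) and \(\nabla _T\) can be seen directly from Fourier series. Assume the outgoing convention consistent with the impedance condition \(\partial _{\mathbf {n}}v=\mathrm{i}k\gamma v\), so that outgoing solutions are built from the Hankel functions \(H^{(1)}_m\). Write \(\phi (\theta ) = \sum _m \widehat{\phi }_m e^{\mathrm{i}m\theta }\) on \({\varGamma _R}\) and \(\nabla _T\phi = R^{-1}(\partial _\theta \phi )\,\widehat{\mathbf {e}}_\theta \). Then both \(\mathrm{DtN}_k\) and the scalar tangential derivative are Fourier multipliers:

$$\begin{aligned} \mathrm{DtN}_k \phi = \sum _{m\in {\mathbb {Z}}} \frac{k H_m^{(1)\prime }(kR)}{H_m^{(1)}(kR)}\, \widehat{\phi }_m e^{\mathrm{i}m\theta }, \qquad \frac{1}{R}\partial _\theta \phi = \sum _{m\in {\mathbb {Z}}} \frac{\mathrm{i}m}{R}\, \widehat{\phi }_m e^{\mathrm{i}m\theta }. \end{aligned}$$

Each operator multiplies the mode \(e^{\mathrm{i}m\theta }\) by a scalar depending only on m, so the two operators commute.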

The left-hand side of (6.4) equals

$$\begin{aligned} \sum _{i,j,\ell ,m=1}^d\int _{B_R}A_{i\ell }A_{j m}\frac{\partial ^{2}v}{\partial x_{j}\partial x_{\ell }}\frac{\partial ^{2}{\overline{v}}}{\partial x_{i}\partial x_{m}}+\sum _{i,j,\ell ,m=1}^d\int _{B_R}R_{i,j,\ell ,m}, \end{aligned}$$
(6.5)

where

$$\begin{aligned} R_{i,j,\ell ,m}&=\frac{\partial A_{i\ell }}{\partial x_{j}}\frac{\partial v}{\partial x_{\ell }}A_{j m}\frac{\partial ^{2}{\overline{v}}}{\partial x_{i}\partial x_{m}} +A_{i\ell }\frac{\partial ^{2}v}{\partial x_{j}\partial x_{\ell }}\frac{\partial A_{j m}}{\partial x_{i}}\frac{\partial {\overline{v}}}{\partial x_{m}}\\&\quad +\frac{\partial A_{i\ell }}{\partial x_{j}}\frac{\partial v}{\partial x_{\ell }}\frac{\partial A_{j m}}{\partial x_{i}}\frac{\partial {\overline{v}}}{\partial x_{m}}\\&=:R_{i,j,\ell ,m}^{1}+R_{i,j,\ell ,m}^{2}+R_{i,j,\ell ,m}^{3}. \end{aligned}$$

By the Cauchy–Schwarz inequality,

$$\begin{aligned} \left| \int _{B_R}R_{i,j,\ell ,m}^{1}\right| +\left| \int _{B_R}R_{i,j,\ell ,m}^{2}\right| \le 2\Vert {\mathsf {A}}\Vert _{L^{\infty }(B_R)}\Vert \nabla {\mathsf {A}}\Vert _{L^{\infty }(B_R)}\Vert \nabla v\Vert _{L^{2}(B_R)}\vert v\vert _{H^2(B_R)} \end{aligned}$$

and

$$\begin{aligned} \left| \int _{B_R}R_{i,j,\ell ,m}^{3}\right| \le \Vert \nabla {\mathsf {A}}\Vert _{L^{\infty }(B_R)}^{2}\Vert \nabla v\Vert _{L^{2}(B_R)}^{2}. \end{aligned}$$

We therefore obtain

$$\begin{aligned} \bigg |\sum _{i,j,\ell ,m=1}^d\int _{B_R}R_{i,j,\ell ,m}\bigg |&\le 2d^{4}\Vert {\mathsf {A}}\Vert _{L^{\infty }(B_R)}\Vert \nabla {\mathsf {A}}\Vert _{L^{\infty }(B_R)}\Vert \nabla v\Vert _{L^{2}(B_R)}\vert v\vert _{H^{2}(B_R)}\\&\qquad +d^{4}\Vert \nabla {\mathsf {A}}\Vert _{L^{\infty }(B_R)}^{2}\Vert \nabla v\Vert _{L^{2}(B_R)}^{2}. \end{aligned}$$

Combining this with (6.2), (6.4), and (6.5), we obtain

$$\begin{aligned} (A_{\min })^{2}\vert v\vert _{{H}^{2}(B_R)}^{2}\le&\,\Vert f\Vert _{L^{2}(B_R)}^{2}+2d^{4}\Vert {\mathsf {A}}\Vert _{L^{\infty }(B_R)}\Vert \nabla {\mathsf {A}}\Vert _{L^{\infty }(B_R)}\Vert \nabla v\Vert _{L^{2}(B_R)}\vert v\vert _{H^{2}(B_R)}\\&+\,d^{4}\Vert \nabla {\mathsf {A}}\Vert _{L^{\infty }(B_R)}^{2}\Vert \nabla v\Vert _{L^{2}(B_R)}^{2}. \end{aligned}$$

Using (5.4) with \(\varepsilon := (A_{\min })^2/2\) on the second term on the right-hand side and rearranging, we obtain the result. \(\square \)
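The final absorption step can be sanity-checked numerically. Write the last display as \(a x^2 \le F + 2Bx + G\), with \(x = |v|_{H^2(B_R)}\), \(a=(A_{\min })^2\), and \(F = \Vert f\Vert ^2_{L^2(B_R)}\). Here \(G = d^4\Vert \nabla {\mathsf {A}}\Vert ^2_{L^\infty (B_R)}\Vert \nabla v\Vert ^2_{L^2(B_R)}\) and \(B = d^4 \Vert {\mathsf {A}}\Vert _{L^\infty (B_R)}\Vert \nabla {\mathsf {A}}\Vert _{L^\infty (B_R)}\Vert \nabla v\Vert _{L^2(B_R)}\). Then (5.4) with \(\varepsilon = a/2\) gives \(x^2 \le (2/a)(F+G+2B^2/a)\), which is exactly the bound in Lemma 6.4. The Python sketch below uses hypothetical names and random positive inputs. It compares this bound with the largest x actually permitted by the quadratic inequality.

```python
import math
import random

def absorbed_bound(a, F, G, B):
    """x^2 bound from a*x^2 <= F + 2*B*x + G, using 2Bx <= (a/2)x^2 + (2/a)B^2."""
    return (2.0 / a) * (F + G + 2.0 * B ** 2 / a)

def largest_admissible_x(a, F, G, B):
    """Largest x >= 0 with a*x^2 - 2*B*x - (F + G) <= 0 (the positive root)."""
    return (B + math.sqrt(B ** 2 + a * (F + G))) / a

rng = random.Random(42)
worst_ratio = 0.0
for _ in range(10_000):
    a, F, G, B = (rng.uniform(0.01, 10.0) for _ in range(4))
    x_max = largest_admissible_x(a, F, G, B)
    worst_ratio = max(worst_ratio, x_max ** 2 / absorbed_bound(a, F, G, B))
print(f"largest ratio x_max^2 / bound: {worst_ratio:.4f}")  # never exceeds 1
```

Squaring out the positive root shows that the ratio is at most 1, with equality only when \(F+G=0\).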

We now use Lemma 6.4 to prove Theorem 6.1.

Proof

(Proof of Theorem 6.1) Let \(0<R_0<R_1<R\) be such that \(\overline{\varOmega _-} \subset B_{R_0}\), and let \(\chi \in C^{\infty }({\mathbb {R}}^{d})\) be such that \(0\le \chi \le 1\) and

$$\begin{aligned} \chi =0 \,\,\text { in }B_{R_0} \quad \text { and }\quad \chi = 1\,\, \text { in }{\mathbb {R}}^d\setminus \overline{B_{R_1}}. \end{aligned}$$

We decompose v as

$$\begin{aligned} v=\chi v+(1-\chi ) v=:v_{1}+v_{2}. \end{aligned}$$
(6.6)

Then \(v_{1} \in H^1(B_R)\) and satisfies

$$\begin{aligned} \nabla \cdot ({\mathsf {A}}\nabla v_{1})= -\chi f +\nabla \chi \cdot ({\mathsf {A}}\nabla v) + \nabla v \cdot ({\mathsf {A}}\nabla \chi ) + v\nabla \cdot ({\mathsf {A}}\nabla \chi ) \quad \text {in }B_{R}, \end{aligned}$$

and \(\partial _{{\mathbf {n}}}v_{1}=\text {DtN}_{k}(\gamma v_{1})\) on \({\varGamma _R}\). Lemma 6.4 implies that \(v_1 \in H^2(B_R)\) and that there exists \(C_4= C_4({\mathsf {A}}, d, \chi )>0\) such that

$$\begin{aligned} |v_1|_{H^2(\varOmega _R)}\le C_4\left( \left\| f\right\| _{L^2(\varOmega _R)} + R^{-1}\left\| \nabla v\right\| _{L^2(\varOmega _R)} + R^{-2}\left\| v\right\| _{L^2(\varOmega _R)}\right) , \end{aligned}$$
(6.7)

where (i) we have used the fact that \(\nabla \chi =0\) in a neighbourhood of \(\varOmega _-\) to write all the norms as norms over \(\varOmega _R\), and (ii) we have inserted the inverse powers of R on the right-hand side to keep \(C_4\) a dimensionless quantity. On the other hand, \(v_{2}\) satisfies

$$\begin{aligned} \nabla \cdot ({\mathsf {A}}\nabla v_{2})=-(1-\chi )f -\nabla \chi \cdot ({\mathsf {A}}\nabla v) - \nabla v \cdot ({\mathsf {A}}\nabla \chi ) - v\nabla \cdot ({\mathsf {A}}\nabla \chi ) \quad \text {in }B_{R}, \end{aligned}$$

\(v_2=0\) in \(B_R\setminus B_{R_1}\), and either \(\gamma v_2=0\) or \(\partial _{\mathbf {n}}v_2=0\) on \(\varGamma \).
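Both right-hand sides follow from the product rule \(\nabla \cdot (\psi \,\mathbf {F}) = \nabla \psi \cdot \mathbf {F} + \psi \,\nabla \cdot \mathbf {F}\) together with \(\nabla \cdot ({\mathsf {A}}\nabla v) = -f\):

$$\begin{aligned} \nabla \cdot ({\mathsf {A}}\nabla (\chi v))&= \nabla \cdot \big (\chi \,{\mathsf {A}}\nabla v + v\,{\mathsf {A}}\nabla \chi \big ) = \chi \,\nabla \cdot ({\mathsf {A}}\nabla v) + \nabla \chi \cdot ({\mathsf {A}}\nabla v) + \nabla v \cdot ({\mathsf {A}}\nabla \chi ) + v\,\nabla \cdot ({\mathsf {A}}\nabla \chi )\\&= -\chi f + \nabla \chi \cdot ({\mathsf {A}}\nabla v) + \nabla v \cdot ({\mathsf {A}}\nabla \chi ) + v\,\nabla \cdot ({\mathsf {A}}\nabla \chi ). \end{aligned}$$

The equation for \(v_2 = v - v_1\) then follows by subtracting this identity from \(\nabla \cdot ({\mathsf {A}}\nabla v) = -f\).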

Since \({\mathsf {A}}\) is Lipschitz, \(A_{\min }>0\), and both \(\varGamma \) and \({\varGamma _R}\) are \(C^{1,1}\), [62, Theorems 2.3.3.2, 2.4.2.5, and 2.4.2.7] imply that, if \(w\in H^1(\varOmega _R)\), \(\nabla \cdot ({\mathsf {A}}\nabla w)\in L^2(\varOmega _R)\), and either \(\gamma w= 0\) or \(\partial _{\mathbf {n}}w=0\) on \(\partial \varOmega _R\), then \(w\in H^2(\varOmega _R)\) and there exists \(C_5=C_5({\mathsf {A}}, \varOmega _-, d, R)>0\) such that

$$\begin{aligned} \vert w\vert _{H^{2}(\varOmega _R)}&\le C_5\left( \left\| \nabla \cdot ({\mathsf {A}}\nabla w)-w\right\| _{L^{2}(\varOmega _R)}+R^{-1}\left\| \nabla w\right\| _{L^2(\varOmega _R)}\right. \\&\quad \left. + R^{-2}\left\| w\right\| _{L^2(\varOmega _R)} \right) . \end{aligned}$$

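Applying this with \(w=v_2\) (which satisfies the required boundary conditions on \(\partial \varOmega _R\), since \(v_2=0\) near \({\varGamma _R}\)), we obtain that there exists \(C_6=C_6({\mathsf {A}}, \varOmega _-, d, R, \chi )>0\) such that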

$$\begin{aligned} |v_2|_{H^2(\varOmega _R)}\le C_6\left( \left\| f\right\| _{L^2(\varOmega _R)} + R^{-1}\left\| \nabla v\right\| _{L^2(\varOmega _R)} +R^{-2}\left\| v\right\| _{L^2(\varOmega _R)}\right) , \end{aligned}$$
(6.8)

and the bound (3.9) follows from combining (6.7) and (6.8) using (6.6). \(\square \)

7 The elliptic projection and associated results

Define the sesquilinear form \(a_\star (\cdot ,\cdot )\) by

$$\begin{aligned} a_\star (u,v) := \int _{\varOmega _R} {\mathsf {A}}\nabla u \cdot \overline{\nabla v} - \big \langle \mathrm{DtN}_k\gamma u, \gamma v \big \rangle _{{\varGamma _R}}. \end{aligned}$$
(7.1)

Recall from (2.5) and (2.6) that \({{\mathcal {H}}}\) equals either \(H_{0,D}^1(\varOmega _R)\) (with Dirichlet conditions in (2.3)) or \(H^1(\varOmega _R)\) (with Neumann conditions).

Lemma 7.1

(Continuity and coercivity of \(a_\star (\cdot ,\cdot )\)) For all \(u, v \in {{\mathcal {H}}}\),

$$\begin{aligned} \big |a_\star (u,v)\big |\le {C_{\mathrm{cont}}}_\star \left\| u\right\| _{H^1_k(\varOmega _R)} \left\| v\right\| _{H^1_k(\varOmega _R)} \,\,\text { and }\,\,\mathfrak {R}a_\star (v,v) \ge {C_{\mathrm{coer}}}_\star \left\| v\right\| ^2_{H^1_R(\varOmega _R)}, \end{aligned}$$
(7.2)

where

$$\begin{aligned} {C_{\mathrm{cont}}}_\star := A_{\max } + {C_{\mathrm{DtN}}}_1, \quad {C_{\mathrm{coer}}}_\star := \min \big \{ {C_{\mathrm{DtN}}}_2 ({C_{\mathrm{PF}}})^{-1} \, ,\, A_{\min }(1+{C_{\mathrm{PF}}})^{-1}\big \}, \end{aligned}$$

and

$$\begin{aligned} \left\| v\right\| ^2_{H^1_R(\varOmega _R)}:= \left\| \nabla v\right\| ^2_{L^2(\varOmega _R)} + \frac{1}{R^2} \left\| v\right\| ^2_{L^2(\varOmega _R)}. \end{aligned}$$
(7.3)

Proof

The first inequality in (7.2) follows from the inequality (3.3) and the Cauchy–Schwarz inequality. The second inequality in (7.2) follows from (3.4) and (3.7). \(\square \)

As a consequence of Lemma 7.1, we have

$$\begin{aligned} {C_{\mathrm{coer}}}_\star \left\| v\right\| ^2_{H^1_R(\varOmega _R)} \le \big |a_\star (v,v)\big | \le {C_{\mathrm{cont}}}_\star \left\| v\right\| ^2_{H^1_k(\varOmega _R)} \quad \text { for all }v\in {{\mathcal {H}}}, \end{aligned}$$
(7.4)

and we then define the new norm on \({{\mathcal {H}}}\),

$$\begin{aligned} \left\| v\right\| _\star := \sqrt{a_\star (v,v)}. \end{aligned}$$

Lemma 7.2

(Bounds on the solution of the variational problem associated with \(a_\star (\cdot ,\cdot )\)) The solution of the variational problem

$$\begin{aligned} \text { find }u \in {{\mathcal {H}}}\text { such that }a_\star (u,v) = (f,v)_{L^2(\varOmega _R)} \quad \text { for all }v\in {{\mathcal {H}}}\end{aligned}$$

satisfies

$$\begin{aligned} \left\| u\right\| _{H^1_R(\varOmega _R)} \le \frac{R}{{C_{\mathrm{coer}}}_\star } \left\| f\right\| _{L^2(\varOmega _R)} \quad \text { and }\quad \vert u\vert _{H^2(\varOmega _R)} \le {C_{H^2}}_\star \left\| f\right\| _{L^2(\varOmega _R)}, \end{aligned}$$
(7.5)

where

$$\begin{aligned} {C_{H^2}}_\star :={C_{H^2}}\left( 1 + \sqrt{2}({C_{\mathrm{coer}}}_\star )^{-1}\right) . \end{aligned}$$

Proof

Since \(a_\star (\cdot ,\cdot )\) is continuous and coercive in \({{\mathcal {H}}}\), the first bound in (7.5) follows from the Lax–Milgram theorem and the fact that

$$\begin{aligned} \sup _{v\in {{\mathcal {H}}}}\frac{\big | (f,v)_{L^2(\varOmega _R)}\big |}{\left\| v\right\| _{H^1_R(\varOmega _R)}} \le R \left\| f\right\| _{L^2(\varOmega _R)}, \end{aligned}$$

by the definition of \(\Vert \cdot \Vert _{H^1_R(\varOmega _R)}\) (7.3). The second bound in (7.5) follows from combining the first bound in (7.5) and the bound (3.9). \(\square \)
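Spelled out, assume that (3.9) has the form \(|v|_{H^2(\varOmega _R)} \le {C_{H^2}}\big (\Vert f\Vert _{L^2(\varOmega _R)} + R^{-1}\Vert \nabla v\Vert _{L^2(\varOmega _R)} + R^{-2}\Vert v\Vert _{L^2(\varOmega _R)}\big )\), as obtained by combining (6.7) and (6.8). Then the elementary inequality \(\alpha +\beta \le \sqrt{2}(\alpha ^2+\beta ^2)^{1/2}\) and the first bound in (7.5) give

$$\begin{aligned} \frac{1}{R}\left\| \nabla u\right\| _{L^2(\varOmega _R)} + \frac{1}{R^2}\left\| u\right\| _{L^2(\varOmega _R)} \le \frac{\sqrt{2}}{R}\left\| u\right\| _{H^1_R(\varOmega _R)} \le \frac{\sqrt{2}}{{C_{\mathrm{coer}}}_\star }\left\| f\right\| _{L^2(\varOmega _R)}. \end{aligned}$$

Hence \(|u|_{H^2(\varOmega _R)} \le {C_{H^2}}\big (1 + \sqrt{2}({C_{\mathrm{coer}}}_\star )^{-1}\big )\Vert f\Vert _{L^2(\varOmega _R)} = {C_{H^2}}_\star \Vert f\Vert _{L^2(\varOmega _R)}\).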

We now define the particular Galerkin projection known in the literature as the “elliptic projection" (see the discussion in §4.2).

Definition 7.3

(Elliptic projection \({{\mathcal {P}}}_h\)) Given \(u\in {{\mathcal {H}}}\), define \({{\mathcal {P}}}_h u \in {{\mathcal {H}}}_h\) by

$$\begin{aligned} a_\star (v_h, {{\mathcal {P}}}_h u) = a_\star (v_h,u)\quad \text { for all }v_h\in {{\mathcal {H}}}_h. \end{aligned}$$

Since \(a_\star (\cdot ,\cdot )\) is continuous and coercive in \({{\mathcal {H}}}\) by Lemma 7.1, the Lax–Milgram theorem implies that \({{\mathcal {P}}}_h\) is well defined. The definition of \({{\mathcal {P}}}_h\) then immediately implies the Galerkin-orthogonality property that

$$\begin{aligned} a_\star (v_h ,u-{{\mathcal {P}}}_h u) = 0 \quad \text { for all }v_h\in {{\mathcal {H}}}_h. \end{aligned}$$
(7.6)

Lemma 7.4

(Approximation properties of \({{\mathcal {P}}}_h\)) The elliptic projection \({{\mathcal {P}}}_h\) satisfies

$$\begin{aligned} \left\| u-{{\mathcal {P}}}_h u\right\| _\star&\le \sqrt{{C_{\mathrm{cont}}}_\star } \min _{v_h\in {{\mathcal {H}}}_h} \left\| u-v_h\right\| _{H^1_k(\varOmega _R)} \qquad \text { and }\end{aligned}$$
(7.7)
$$\begin{aligned} \left\| u-{{\mathcal {P}}}_h u\right\| _{L^2(\varOmega _R)}&\le h \sqrt{2}{C_{\mathrm{int}}}{C_{H^2}}_\star \sqrt{{C_{\mathrm{cont}}}_\star } \left\| u-{{\mathcal {P}}}_h u\right\| _\star \end{aligned}$$
(7.8)

for all \(u\in {{\mathcal {H}}}\).

Proof

By the Cauchy–Schwarz inequality \(a_\star (\cdot ,\cdot )\) is continuous in the \(\Vert \cdot \Vert _\star \) norm, and by definition, \(a_\star (\cdot ,\cdot )\) is coercive in this norm. Therefore Céa’s lemma implies that

$$\begin{aligned} \left\| u-{{\mathcal {P}}}_h u\right\| _\star \le \min _{v_h\in {{\mathcal {H}}}_h} \left\| u-v_h\right\| _{\star }, \end{aligned}$$

and (7.7) follows from the norm equivalence (7.4).

To prove (7.8) we use the standard duality argument. Given \(u\in {{\mathcal {H}}}\), let \(\xi \) be the solution of the variational problem

$$\begin{aligned} \text { find }\xi \in {{\mathcal {H}}}\text { such that }a_\star (\xi ,v)= (u-{{\mathcal {P}}}_h u,v)_{L^2(\varOmega _R)} \quad \text { for all }v\in {{\mathcal {H}}}. \end{aligned}$$
(7.9)

Then, by Galerkin orthogonality (7.6) and continuity of \(a_\star (\cdot ,\cdot )\), for all \(v_h\in {{\mathcal {H}}}_h\),

$$\begin{aligned} \left\| u-{{\mathcal {P}}}_h u\right\| ^2_{L^2(\varOmega _R)} = a_\star ( \xi ,u-{{\mathcal {P}}}_h u )&= a_\star ( \xi -v_h ,u-{{\mathcal {P}}}_h u )\nonumber \\&\le \left\| \xi -v_h\right\| _\star \left\| u-{{\mathcal {P}}}_h u\right\| _\star . \end{aligned}$$
(7.10)

By the norm equivalence (7.4), the consequence (3.11) of the definition of \({C_{\mathrm{int}}}\), the definition of \(\xi \) (7.9), and the second bound in (7.5),

$$\begin{aligned} \left\| \xi -I_h \xi \right\| _\star \!\le \!\sqrt{{C_{\mathrm{cont}}}_\star } \left\| \xi \!-\!I_h \xi \right\| _{H^1_k(\varOmega _R)}&\le \sqrt{{C_{\mathrm{cont}}}_\star }\sqrt{2}{C_{\mathrm{int}}}h |\xi |_{H^2(\varOmega _R)} \\&\le \sqrt{{C_{\mathrm{cont}}}_\star } \sqrt{2} {C_{\mathrm{int}}}h {C_{H^2}}_\star \left\| u\!-\!{{\mathcal {P}}}_h u\right\| _{L^2(\varOmega _R)}, \end{aligned}$$

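and the result (7.8) follows by taking \(v_h=I_h\xi \) in (7.10), combining the resulting inequality with this last one, and dividing by \(\left\| u-{{\mathcal {P}}}_h u\right\| _{L^2(\varOmega _R)}\). \(\square \)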
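The duality mechanism behind (7.8) can be observed numerically. The following is a minimal sketch on a hypothetical one-dimensional model, not the setting of this section: the coercive form \(a(u,v)=\int _0^1 u'v'+k^2uv\) on \((0,1)\) with homogeneous Dirichlet conditions stands in for \(a_\star (\cdot ,\cdot )\), \(\Vert w\Vert _\star ^2:=a(w,w)\), and \({{\mathcal {H}}}_h\) is the space of continuous piecewise-linear functions on a uniform mesh. The function name and the test function \(u(x)=\sin (m\pi x)\) are illustrative choices. If the sketch behaves as the duality argument suggests, the ratio of the \(L^2\) error to the energy error decreases roughly in proportion to h, which is the qualitative content of (7.8).

```python
import numpy as np


def p1_elliptic_projection_errors(N, k, m=3):
    """L^2 and energy-norm errors of the P1 elliptic projection of u(x) = sin(m*pi*x).

    Hypothetical 1-D model (not the paper's a_star): on (0, 1) with u(0) = u(1) = 0,
        a(u, v) = int_0^1 u' v' + k^2 u v,   ||w||_star^2 = a(w, w),
    and P_h u is defined by a(v_h, P_h u) = a(v_h, u) for all continuous piecewise
    linears v_h on a uniform mesh with N elements.
    """
    h = 1.0 / N
    x = np.linspace(0.0, 1.0, N + 1)
    gp, gw = np.polynomial.legendre.leggauss(5)  # 5-point Gauss rule on [-1, 1]
    u = lambda t: np.sin(m * np.pi * t)
    du = lambda t: m * np.pi * np.cos(m * np.pi * t)
    lam = (m * np.pi) ** 2 + k ** 2              # -u'' + k^2 u = lam * u

    # Matrix a(phi_j, phi_i) for the N - 1 interior hat functions (tridiagonal).
    n = N - 1
    diag = 2.0 / h + k ** 2 * 2.0 * h / 3.0
    off = -1.0 / h + k ** 2 * h / 6.0
    K = (np.diag(np.full(n, diag))
         + np.diag(np.full(n - 1, off), 1)
         + np.diag(np.full(n - 1, off), -1))

    # Right-hand side a(phi_i, u) = lam * int phi_i u, by integration by parts.
    b = np.zeros(n)
    for j in range(N):                           # element [x_j, x_{j+1}]
        t = x[j] + 0.5 * h * (gp + 1.0)          # quadrature points on the element
        w = 0.5 * h * gw                         # quadrature weights on the element
        s = (t - x[j]) / h                       # local coordinate in [0, 1]
        if j >= 1:                               # node j is interior (hat = 1 - s)
            b[j - 1] += lam * np.sum(w * (1.0 - s) * u(t))
        if j + 1 <= n:                           # node j + 1 is interior (hat = s)
            b[j] += lam * np.sum(w * s * u(t))

    c = np.concatenate(([0.0], np.linalg.solve(K, b), [0.0]))  # nodal values of P_h u

    # Errors e = u - P_h u, by the same element-wise quadrature.
    l2_sq = h1_sq = 0.0
    for j in range(N):
        t = x[j] + 0.5 * h * (gp + 1.0)
        w = 0.5 * h * gw
        s = (t - x[j]) / h
        uh = c[j] * (1.0 - s) + c[j + 1] * s
        duh = (c[j + 1] - c[j]) / h
        l2_sq += np.sum(w * (u(t) - uh) ** 2)
        h1_sq += np.sum(w * (du(t) - duh) ** 2)
    return np.sqrt(l2_sq), np.sqrt(h1_sq + k ** 2 * l2_sq)


k = 10.0
for N in (64, 128, 256):
    err_l2, err_star = p1_elliptic_projection_errors(N, k)
    print(N, err_l2 / err_star)   # ratio should roughly halve as N doubles
```

The right-hand side \(a(\phi _i,u)\) is computed as \(\lambda \int \phi _i u\) via integration by parts; this is exact for the chosen u because \(-u''+k^2u=\lambda u\), and it avoids differentiating the hat functions against u.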

8 Adjoint approximability

Definition 8.1

(Adjoint solution operator \({{\mathcal {S}}}^*\)) Given \(f\in L^2(\varOmega _R)\), let \({{\mathcal {S}}}^*f\) be defined as the solution of the variational problem

$$\begin{aligned} \text { find }{{\mathcal {S}}}^*f \in {{\mathcal {H}}}\quad \text { such that }\quad a(v, {{\mathcal {S}}}^*f) = (v,f)_{L^2(\varOmega _R)} \quad \text { for all }v\in {{\mathcal {H}}}. \end{aligned}$$
(8.1)

\({{\mathcal {S}}}^*\) is therefore the solution operator of the adjoint problem to the variational problem (2.7) with data in \(L^2(\varOmega _R)\).

Green’s second identity applied to outgoing solutions of the Helmholtz equation implies that \(\big \langle \mathrm{DtN}_k\psi , {\overline{\phi }}\big \rangle _{{\varGamma _R}} =\big \langle \mathrm{DtN}_k\phi , {\overline{\psi }}\big \rangle _{{\varGamma _R}} \) (see, e.g., [92, Lemma 6.13]); thus \(a({\overline{v}},u) = a({\overline{u}},v)\) and so the definition (8.1) implies that

$$\begin{aligned} a(\overline{{{\mathcal {S}}}^*f}, v)= ({\overline{f}},v)_{L^2(\varOmega _R)}\quad \text { for all }v\in {{\mathcal {H}}}; \end{aligned}$$
(8.2)

i.e. \({{\mathcal {S}}}^*f\) is the complex-conjugate of an outgoing Helmholtz solution.

Following [88], we define the quantity \(\eta ({{\mathcal {H}}}_h)\) by

$$\begin{aligned} \eta ({{\mathcal {H}}}_h): = \sup _{f\in L^2(\varOmega _R)} \min _{v_h\in {{\mathcal {H}}}_h} \frac{\left\| {{\mathcal {S}}}^*f- v_h\right\| _{H^1_k(\varOmega _R)}}{\left\| f\right\| _{L^2(\varOmega _R)}}; \end{aligned}$$
(8.3)

observe that this definition implies that, given \(f\in L^2(\varOmega _R)\),

$$\begin{aligned} \text { there exists } w_h \in {{\mathcal {H}}}_h \text { such that }\left\| {{\mathcal {S}}}^* f - w_h\right\| _{H^1_k(\varOmega _R)} \le \eta ({{\mathcal {H}}}_h) \left\| f\right\| _{L^2(\varOmega _R)}. \end{aligned}$$
(8.4)

Lemma 8.2

Assume that \({\mathsf {A}}, n,\) and \(\varOmega _-\) are nontrapping (and so (3.5) holds with \({C_{\mathrm{sol}}}\) independent of k).

  1. (i)

    If \(\varGamma \in C^{1,1}\) and \({\mathsf {A}}\in C^{0,1}\), then

    $$\begin{aligned} \eta ({{\mathcal {H}}}_h)\le hk \left[ \sqrt{2} {C_{\mathrm{int}}}{C_{H^2}}{C_{\mathrm{sol}}}R\left( n_{\max } + \frac{1}{k_0 R_0 {C_{\mathrm{sol}}}} + 2\right) \right] . \end{aligned}$$
    (8.5)
  2. (ii)

    If \(\varOmega _-\) is a Dirichlet obstacle (so that \({{\mathcal {H}}}= H^1_{0,D}(\varOmega _R)\)), \(\varGamma \) is analytic, \({\mathsf {A}}= {\mathsf {I}}\), \(n=1\), and the triangulation \({{\mathcal {T}}}_h\) in the definition of \({{\mathcal {H}}}_h\) (2.10) satisfies the quasi-uniformity assumption [81, Assumption 5.1], then there exists \(C_{\mathrm{MS}}= C_{\mathrm{MS}}(\varOmega _-)\) such that

    $$\begin{aligned} \eta ({{\mathcal {H}}}_h) \le C_{\mathrm{MS}}\left[ \frac{h}{p} + {C_{\mathrm{sol}}}R\left( \frac{hk}{p}\right) ^p\right] . \end{aligned}$$
    (8.6)

Proof

Part (ii) is proved in [81, Lemma 3.4 and Proposition 5.3]: see [81, Proof of Theorem 5.8], and observe that the nontrapping assumption implies that \(\alpha \) in [81] equals zero. We now prove Part (i).

By the consequence (3.11) of the definition of \({C_{\mathrm{int}}}\) (3.10), there exists \(v_h \in {{\mathcal {H}}}_h\) such that

$$\begin{aligned} \left\| {{\mathcal {S}}}^*f -v_h\right\| _{H^1_k(\varOmega _R)} \le \sqrt{2}{C_{\mathrm{int}}}h |{{\mathcal {S}}}^*f|_{H^2(\varOmega _R)} \end{aligned}$$

(indeed, we can take \(v_h = I_h({{\mathcal {S}}}^* f)\)). By (8.2), the BVP (3.8) is satisfied with \(v:= {{\mathcal {S}}}^*f\) and \({\widetilde{f}}:=f + k^2 n {{\mathcal {S}}}^*f\). Applying the bounds (3.9) and (3.5), we obtain

$$\begin{aligned} |{{\mathcal {S}}}^*f |_{H^2(\varOmega _R)}&\le {C_{H^2}}\left( k^2 n_{\max } \left\| {{\mathcal {S}}}^*f\right\| _{L^2(\varOmega _R)} + \left\| f\right\| _{L^2(\varOmega _R)}\right. \\&\quad \left. + \frac{1}{R}\left\| \nabla ({{\mathcal {S}}}^*f)\right\| _{L^2(\varOmega _R)} + \frac{1}{R^2} \left\| {{\mathcal {S}}}^* f\right\| _{L^2(\varOmega _R)}\right) \\&\le {C_{H^2}}{C_{\mathrm{sol}}}kR \left( n_{\max } + \frac{1}{k R\,{C_{\mathrm{sol}}}} + \frac{1}{kR} + \frac{1}{(kR)^2} \right) \left\| f\right\| _{L^2(\varOmega _R)}, \end{aligned}$$

and the result (8.5) follows from the assumption that \(kR\ge k_0 R_0\ge 1\) (see (3.1)). \(\square \)
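To check the constant in (8.5), the four terms in the middle line of the last display can be bounded one at a time. This sketch assumes that (3.5) supplies the two estimates \(k\Vert {{\mathcal {S}}}^*f\Vert _{L^2(\varOmega _R)}\le {C_{\mathrm{sol}}}R\Vert f\Vert _{L^2(\varOmega _R)}\) and \(\Vert \nabla ({{\mathcal {S}}}^*f)\Vert _{L^2(\varOmega _R)}\le {C_{\mathrm{sol}}}R\Vert f\Vert _{L^2(\varOmega _R)}\), which is the form needed to reach the last line of that display; all norms below are on \(L^2(\varOmega _R)\):

$$\begin{aligned} k^2 n_{\max }\Vert {{\mathcal {S}}}^*f\Vert&\le {C_{\mathrm{sol}}}kR\, n_{\max }\Vert f\Vert ,&\qquad \Vert f\Vert&= {C_{\mathrm{sol}}}kR\,\frac{1}{kR\,{C_{\mathrm{sol}}}}\Vert f\Vert ,\\ \frac{1}{R}\Vert \nabla ({{\mathcal {S}}}^*f)\Vert&\le {C_{\mathrm{sol}}}kR\,\frac{1}{kR}\Vert f\Vert ,&\qquad \frac{1}{R^2}\Vert {{\mathcal {S}}}^*f\Vert&\le {C_{\mathrm{sol}}}kR\,\frac{1}{(kR)^2}\Vert f\Vert . \end{aligned}$$

Multiplying the resulting bound on \(|{{\mathcal {S}}}^*f|_{H^2(\varOmega _R)}\) by \(\sqrt{2}{C_{\mathrm{int}}}h\) and using \(kR\ge k_0R_0\ge 1\), so that \(1/(kR\,{C_{\mathrm{sol}}})\le 1/(k_0R_0{C_{\mathrm{sol}}})\) and \(1/(kR)+1/(kR)^2\le 2\), reproduces the bracket in (8.5).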

9 Proof of the oscillatory-behaviour bound (3.6)

Theorem 9.1

If \({\mathsf {A}}, n, \) and \(\varOmega _-\) are nontrapping (in the sense that the bound (3.5) holds), then the bound (3.6) holds, i.e.,

$$\begin{aligned} \vert u\vert _{H^2(\varOmega _R)} \le C_{\mathrm{{osc}}}k\left\| u\right\| _{H^1_k(\varOmega _R)}. \end{aligned}$$
(9.1)

Lemma 9.2

To prove Theorem 9.1, it is sufficient to prove that there exist \(k_0>0\) and \({C_{\mathrm{mass}}}={C_{\mathrm{mass}}}({\mathsf {A}}, n, \varOmega _-, R)>0\) such that

$$\begin{aligned} \left\| u\right\| _{L^2(\varOmega _{R+1})} \le {C_{\mathrm{mass}}}\left\| u\right\| _{L^2(\varOmega _{R})} \quad \text { for all }k\ge k_0. \end{aligned}$$
(9.2)

Proof

We first claim that the map \(k\mapsto u\) is continuous from \([1,\infty )\) to \(H^2(\varOmega _R)\); indeed, this follows from the well-posedness of the plane-wave scattering problem of Definition 2.2, \(H^2\) regularity, and linearity. Therefore, the function \(k \mapsto \left\| u\right\| _{H^2(\varOmega _R)} \big (k \left\| u\right\| _{H^1_k(\varOmega _R)}\big )^{-1}\) is continuous on \([1,\infty )\), and hence bounded on every compact interval \([1,K]\); it is therefore sufficient to prove that the bound (9.1) (i.e., (3.6)) holds for k sufficiently large.

Let \(\chi \in C^{\infty }({\mathbb {R}}^{d})\) be such that \(0\le \chi \le 1\), \(\chi =1\) on \(\varOmega _R\) and \(\chi =0\) on \({\mathbb {R}}^d\setminus B_{R+1/2}\). Applying the \(H^2\)-regularity results [62, Theorems 2.3.3.2, 2.4.2.5, and 2.4.2.7] to \(\chi u\) (with these results valid since \({\mathsf {A}}\) is Lipschitz, \(A_{\min }>0\), both \(\varGamma \) and \({\varGamma _R}\) are \(C^{1,1}\), and either \(\gamma u=0\) or \(\partial _{{\mathbf {n}}}u=0\) on \(\varGamma \)), we obtain, in a similar way to the proof of Theorem 6.1, that there exists \(C_1= C_1({\mathsf {A}}, n, \varOmega _-, R)>0\) such that

$$\begin{aligned} |u|_{H^2(\varOmega _R)}\le C_1 k \left\| u\right\| _{H_k^1(\varOmega _{R+1})}. \end{aligned}$$

Therefore to prove (9.1) (i.e., (3.6)), it is sufficient to prove that there exists \(C_2= C_2({\mathsf {A}}, n, \varOmega _-, R)>0\) such that

$$\begin{aligned} \left\| u\right\| _{H_k^1(\varOmega _{R+1})}\le C_2 \left\| u\right\| _{H_k^1(\varOmega _{R})}. \end{aligned}$$
(9.3)

We now show that (9.3) follows from (9.2). We first claim that

$$\begin{aligned} \left\| \nabla u\right\| _{L^2(\varOmega _{R+1})} \le \sqrt{\frac{n_{\max }}{A_{\min }}} k \left\| u\right\| _{L^2(\varOmega _{R+1})} \quad \text { for all }k>0. \end{aligned}$$
(9.4)

Indeed, applying Green’s identity in \(\varOmega _{R+1}\) (which is justified by [78, Theorem 4.4] since \(u \in H^1(\varOmega _{R+1})\)) and recalling that either \(\gamma u=0\) or \(\partial _{\mathbf {n}}u=0\) on \(\varGamma \), we have that

$$\begin{aligned} \int _{\varOmega _{R+1}} ({\mathsf {A}}\nabla u )\cdot \overline{\nabla u} - k^2 n |u|^2 = \mathfrak {R}\int _{\varGamma _{R+1}} {\overline{u}} \frac{\partial u}{\partial r}. \end{aligned}$$

By (3.4), the right-hand side is \(\le 0\), and (9.4) follows using the inequalities (2.1) and (2.2). Therefore, using (9.4) and (9.2),

$$\begin{aligned} \left\| u\right\| _{H_k^1(\varOmega _{R+1})}\le \sqrt{\frac{n_{\max }}{A_{\min }}+1}\,\, k\left\| u\right\| _{L^2(\varOmega _{R+1})} \le {C_{\mathrm{mass}}}\sqrt{\frac{n_{\max }}{A_{\min }}+1}\,\, k\left\| u\right\| _{L^2(\varOmega _{R})} \end{aligned}$$

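which implies the bound (9.3) with \(C_2:={C_{\mathrm{mass}}}\sqrt{n_{\max }/A_{\min }+1}\), since \(k\left\| u\right\| _{L^2(\varOmega _{R})}\le \left\| u\right\| _{H_k^1(\varOmega _{R})}\); the result follows. \(\square \)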
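The step from the Green identity to (9.4) can be written out as follows. This sketch assumes that (2.1) is the lower bound \(({\mathsf {A}}\varvec{\zeta })\cdot \overline{\varvec{\zeta }}\ge A_{\min }|\varvec{\zeta }|^2\) for \(\varvec{\zeta }\in {\mathbb {C}}^d\) and that (2.2) is the upper bound \(n\le n_{\max }\) (both almost everywhere):

$$\begin{aligned} A_{\min }\left\| \nabla u\right\| ^2_{L^2(\varOmega _{R+1})} \le \int _{\varOmega _{R+1}} ({\mathsf {A}}\nabla u)\cdot \overline{\nabla u} \le \int _{\varOmega _{R+1}} k^2 n |u|^2 \le k^2 n_{\max }\left\| u\right\| ^2_{L^2(\varOmega _{R+1})}, \end{aligned}$$

where the middle inequality is the Green identity above with its right-hand side \(\le 0\). Dividing by \(A_{\min }\) and taking square roots gives (9.4).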

9.1 Overview of the ideas used in the rest of this section to prove (9.2)

We have therefore reduced proving the oscillatory-behaviour bound (3.6)/(9.1) to proving the bound (9.2), which we prove using defect measures. The precise definition of a defect measure is given in Theorem 9.3 below, but the idea is that the defect measure of a Helmholtz solution describes where the mass of the solution in phase space (i.e. the set of positions \({\mathbf {x}}\) and momenta \(\varvec{\xi }\)) is concentrated in the high-frequency limit. Two examples of this feature are

  1. (i)

    the defect measure of the plane wave \(u^I({\mathbf {x}}):= \exp (\mathrm{i}k {\mathbf {x}}\cdot {\mathbf {a}})\) is the product of a delta function at \(\varvec{\xi }={\mathbf {a}}\) and Lebesgue measure in \({\mathbf {x}}\) (see (9.8) below), reflecting the fact that, at high frequency (and in fact at any frequency), all the mass in phase space of the plane wave is travelling in the direction \({\mathbf {a}}\), and

  2. (ii)

    the defect measure of an outgoing solution of the Helmholtz equation is zero on the so-called “directly incoming set” (see Lemma 9.8 below), where this set is defined in (9.20) below as the set of points in phase space that don’t hit the scatterer when propagated backwards along the flow.

A key feature of the defect measure of a Helmholtz solution is that it is invariant under the Hamiltonian flow defined by the symbol of the PDE, as long as the flow doesn’t encounter the scatterer (see Theorem 9.6 below). This is analogous to results about propagation of singularities for the wave equation, where singularities travel along the trajectories of the flow (the bicharacteristics), and the projections of these trajectories onto physical space are the rays.

The main ingredients in our proof of (9.2) are Points (i) and (ii) above, invariance under the flow (away from the scatterer), and geometric arguments about the rays, using the fact that away from the scatterer the rays are straight lines and the flow has constant speed along the rays (see (9.12) below).

To conclude this overview, we direct the reader to [105, Chapter 5] for extensive discussion of defect measures in \({\mathbb {R}}^d\), to [16, 50, 84] for material on defect measures on manifolds with boundary, and to [15] for discussion on the history of defect measures.

9.2 Recap of results about defect measures

9.2.1 Symbols and quantisation

Before defining defect measures, we need to define the functions on phase space (i.e. the set of positions \({\mathbf {x}}\) and momenta \(\varvec{\xi }\)) that the defect measure can act upon by dual pairing. These functions are called symbols, defined as functions on the cotangent bundle \(T^*\varOmega _+\). Recall the definition of the cotangent bundle of \({\mathbb {R}}^d\):

$$\begin{aligned} T^*{\mathbb {R}}^d :={\mathbb {R}}^d \times ({\mathbb {R}}^d)^*; \end{aligned}$$

for our purposes, we can consider \(T^*{\mathbb {R}}^d\) as \(\{({\mathbf {x}},\varvec{\xi }) : {\mathbf {x}}\in {\mathbb {R}}^d, \varvec{\xi }\in {\mathbb {R}}^d\}\), i.e. the set of positions \({\mathbf {x}}\) and momenta \(\varvec{\xi }\). On \(T^*{\mathbb {R}}^d\), the quantisation of a symbol \(b({\mathbf {x}},\varvec{\xi }) \in C_{\mathrm{comp}}^\infty (T^*{\mathbb {R}}^d)\) is defined by

$$\begin{aligned} b\big ({\mathbf {x}}, (\mathrm{i}k)^{-1}\partial _{\mathbf {x}}\big )u({\mathbf {x}}):= \frac{k^d}{(2\pi )^d} \int _{{\mathbb {R}}^d}\int _{{\mathbb {R}}^d} \mathrm{e}^{\mathrm{i}k ({\mathbf {x}}-{\mathbf {y}})\cdot \varvec{\xi }} \,b({\mathbf {x}},\varvec{\xi }) u({\mathbf {y}}) \, \mathrm{d}{\mathbf {y}}\,\mathrm{d}\varvec{\xi }; \end{aligned}$$
(9.5)

see, e.g., [105, §4]. The same definition holds for symbols supported away from the boundary of \({\overline{\varOmega }}_+\). We omit the analogous definition near the boundary since it is more involved; see [16, §4.2] (where it involves the so-called compressed cotangent bundle of \(\varOmega _+\), \(T^*_{\mathrm{b}}\overline{\varOmega _+}\)) and [84, §1.2]. We will not, in any event, require any specifics of the measure at the boundary in proving Theorem 9.1.

9.2.2 Existence of defect measures

Theorem 9.3

(Existence of defect measures [105, Theorem 5.2], [16, §4.2]) Suppose \(\{v(k)\}_{k_0\le k<\infty }\) is a collection of functions that is uniformly locally bounded in \(L^2(\varOmega _+)\), i.e. given \(\chi \in C_{\mathrm{comp}}^\infty ({\mathbb {R}}^d)\) there exists \(C>0\), depending on \(\chi \) and \(k_0\) but independent of k, such that

$$\begin{aligned} \left\| \chi v(k)\right\| _{L^2(\varOmega _+)}\le C \quad \text { for all }k \ge k_0. \end{aligned}$$
(9.6)

Then there exists a sequence \(k_{\ell }\rightarrow \infty \) and a non-negative Radon measure \(\mu \) on \(T^*_{\mathrm{b}}\overline{\varOmega _+}\) (depending on \(k_{\ell }\)) such that, for any symbol \(b({\mathbf {x}},\varvec{\xi })\in C_{\mathrm{comp}}^\infty (T^*_{\mathrm{b}}\overline{\varOmega _+})\)

$$\begin{aligned} \big \langle b\big ({\mathbf {x}},(\mathrm{i}k_\ell )^{-1}\partial _{{\mathbf {x}}}\big )v(k_\ell ),v(k_{\ell })\big \rangle _{\varOmega _+}\longrightarrow \int b\ \mathrm{d}\mu \quad \text { as }\ell \rightarrow \infty . \end{aligned}$$
(9.7)

In the case of a plane wave \(u^I({\mathbf {x}}):= \exp (\mathrm{i}k{\mathbf {x}}\cdot {\mathbf {a}})\) with \(|{\mathbf {a}}|=1\), a direct calculation using (9.5) and the definition of the Fourier transform shows that, for all k,

$$\begin{aligned} \big \langle b\, u^I, u^I\big \rangle _{{\mathbb {R}}^d}&:= \frac{k^d}{(2\pi )^d} \int _{{\mathbb {R}}^d}\int _{{\mathbb {R}}^d}\int _{{\mathbb {R}}^d} \,\mathrm{e}^{\mathrm{i}k({\mathbf {x}}-{\mathbf {y}})\cdot \varvec{\xi }}\, \mathrm{e}^{\mathrm{i}k {\mathbf {y}}\cdot {\mathbf {a}}} \,\mathrm{e}^{-\mathrm{i}k {\mathbf {x}}\cdot {\mathbf {a}}}b({\mathbf {x}},\varvec{\xi })\,\mathrm{d}\varvec{\xi }\, \mathrm{d}{\mathbf {y}}\,\mathrm{d}{\mathbf {x}}\nonumber \\&=\int _{{\mathbb {R}}^d} b({\mathbf {x}},{\mathbf {a}})\,\mathrm{d}{\mathbf {x}}; \end{aligned}$$
(9.8)

i.e. for any sequence \(k_{\ell }\rightarrow \infty \), the corresponding defect measure of \(u^I\) is the product of Lebesgue measure in \({\mathbf {x}}\) and a delta measure at \(\varvec{\xi }={\mathbf {a}}\); we therefore speak of “the” (as opposed to “a”) defect measure of \(u^I\).
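The calculation (9.8) can be mimicked numerically in one dimension for product symbols \(b(x,\xi )=\chi (x)\psi (\xi )\): in the quantisation (9.5) the x-dependence sits outside the integral, so the operator reduces to the Fourier multiplier \(\psi ((\mathrm{i}k)^{-1}\partial _x)\) followed by multiplication by \(\chi \). The sketch below is a periodic, discretised analogue under these assumptions; the grid, the choices of \(\chi \) and \(\psi \), and the helper names are hypothetical (the Gaussian \(\chi \) is not compactly supported, but it is negligible at the ends of the periodic cell). It checks that \(\langle b\,u^I,u^I\rangle \) matches \(\int b(x,a)\,\mathrm{d}x\) for \(u^I(x)=\mathrm{e}^{\mathrm{i}kax}\).

```python
import numpy as np


def apply_product_symbol(u, x, k, chi, psi):
    """Apply b(x, (ik)^{-1} d/dx) to grid values u, for b(x, xi) = chi(x) * psi(xi).

    Discrete 1-D analogue of (9.5) on a uniform periodic grid: apply the Fourier
    multiplier psi(xi) with the FFT (xi = omega / k), then multiply by chi(x).
    """
    dx = x[1] - x[0]
    omega = 2.0 * np.pi * np.fft.fftfreq(x.size, d=dx)   # angular frequencies
    psi_u = np.fft.ifft(psi(omega / k) * np.fft.fft(u))  # psi((ik)^{-1} d/dx) u
    return chi(x) * psi_u


def pairing(v, w, dx):
    """Discrete L^2 pairing <v, w> = sum of v * conj(w) * dx."""
    return np.sum(v * np.conj(w)) * dx


N, k, a = 512, 40.0, 1.0
dx = 2.0 * np.pi / N
x = -np.pi + dx * np.arange(N)                  # periodic grid on [-pi, pi)
uI = np.exp(1j * k * a * x)                     # k * a is an integer, so uI is periodic
chi = lambda t: np.exp(-t ** 2)                 # hypothetical spatial cut-off
psi = lambda xi: np.exp(-(xi - 0.8) ** 2)       # hypothetical frequency profile

lhs = pairing(apply_product_symbol(uI, x, k, chi, psi), uI, dx)
rhs = np.sum(chi(x) * psi(a)) * dx              # discrete version of int b(x, a) dx
print(lhs, rhs)
```

Replacing `psi` by a profile concentrated away from \(\xi =a\), for instance \(\mathrm{e}^{-50(\xi +1)^2}\), makes the pairing essentially zero, which reflects that the defect measure of \(u^I\) is concentrated at \(\varvec{\xi }={\mathbf {a}}\).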

The next lemma proves that, if u is the solution of the plane-wave scattering problem and \(\chi \) is an arbitrary cut-off function, then \(\chi u\) is uniformly bounded in k (on compact subsets of \(\varOmega _+\)); existence of a defect measure of u then follows from Theorem 9.3. In the rest of this section, to emphasise the k-dependence of u, we write \(u=u(k)\).

Lemma 9.4

Let u(k) be the solution of the plane-wave scattering problem of Definition 2.2. Assume that \({\mathsf {A}}, n, \) and \(\varOmega _-\) are nontrapping. Then there exists \(C({\mathsf {A}},n,\varOmega _-, R, k_0)>0\) such that

$$\begin{aligned} \left\| u(k)\right\| _{L^2(\varOmega _R)} \le C \quad \text { for all }k \ge k_0. \end{aligned}$$
(9.9)

Proof

Let \(\chi \in C^\infty _{\mathrm{comp}}({\mathbb {R}}^d)\) be such that \(\chi =1\) in a neighbourhood of the scatterer \(\varOmega _{\mathrm{sc}}\). Let \(v:= u^S + \chi u^I\), so that \(u= (1-\chi )u^I + v\). Since \(\Vert u^I(k)\Vert _{L^2(\varOmega _R)}= |\varOmega _R|^{1/2}\) for all \(k>0\) (because \(|u^I|=1\)), the result (9.9) will follow if we prove a uniform bound on \(\Vert v(k)\Vert _{L^2(\varOmega _R)}\). The definition of v implies that v satisfies the Sommerfeld radiation condition, either \(\gamma v=0\) or \(\partial _{\mathbf {n}}v=0\) on \(\varGamma \), and, with \({{\mathcal {L}}}_{{\mathsf {A}},n}w:= \nabla \cdot ({\mathsf {A}}\nabla w ) + k^2 n w\) and \([A,B]:=AB-BA\),

$$\begin{aligned} {{\mathcal {L}}}_{{\mathsf {A}},n}v = - {{\mathcal {L}}}_{{\mathsf {A}},n}\big ((1-\chi )u^I\big ) = \big [ {{\mathcal {L}}}_{{\mathsf {A}},n}, \chi \big ]u^I - (1-\chi ) {{\mathcal {L}}}_{{\mathsf {A}},n} u^I = \big [ {{\mathcal {L}}}_{{\mathsf {A}},n}, \chi \big ]u^I, \end{aligned}$$

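since \({{\mathcal {L}}}_{{\mathsf {A}},n}u^I=0\) wherever \(1-\chi \ne 0\) (there \({\mathsf {A}}={\mathsf {I}}\) and \(n=1\), and \(u^I\) satisfies the Helmholtz equation). By explicit calculation, using the fact that \(u^I({\mathbf {x}})= \exp (\mathrm{i}k {\mathbf {x}}\cdot {\mathbf {a}})\),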

$$\begin{aligned} \left\| \big [ {{\mathcal {L}}}_{{\mathsf {A}},n}, \chi \big ]u^I \right\| _{L^2(\varOmega _R)} \le C_1 \, k, \end{aligned}$$

where \(C_1\) depends on \(\Vert {\mathsf {A}}\Vert _{L^\infty (\varOmega _R)}, \Vert \nabla {\mathsf {A}}\Vert _{L^\infty (\varOmega _R)},\) and \(\chi \), but is independent of k. The nontrapping bound (3.5) then implies that \(\Vert v(k)\Vert _{L^2(\varOmega _R)}\le C_2\) with \(C_2\) independent of k, and the result follows. \(\square \)
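The “explicit calculation” above can be spelled out as follows; this sketch assumes that \({\mathsf {A}}\) is real symmetric. For smooth w, the \(k^2 n\) terms cancel in the commutator and

$$\begin{aligned} \big [{{\mathcal {L}}}_{{\mathsf {A}},n},\chi \big ]w = \nabla \cdot \big ({\mathsf {A}}\nabla (\chi w)\big ) - \chi \nabla \cdot ({\mathsf {A}}\nabla w) = w\, \nabla \cdot ({\mathsf {A}}\nabla \chi ) + 2({\mathsf {A}}\nabla \chi )\cdot \nabla w. \end{aligned}$$

Since \(\nabla u^I=\mathrm{i}k{\mathbf {a}}u^I\), \(|u^I|=1\), and \(|{\mathbf {a}}|=1\),

$$\begin{aligned} \left\| \big [{{\mathcal {L}}}_{{\mathsf {A}},n},\chi \big ]u^I\right\| _{L^2(\varOmega _R)} \le \left\| \nabla \cdot ({\mathsf {A}}\nabla \chi )\right\| _{L^2(\varOmega _R)} + 2k\left\| {\mathsf {A}}\nabla \chi \right\| _{L^2(\varOmega _R)}, \end{aligned}$$

so for \(k\ge 1\) one admissible choice is \(C_1:=\Vert \nabla \cdot ({\mathsf {A}}\nabla \chi )\Vert _{L^2(\varOmega _R)}+2\Vert {\mathsf {A}}\nabla \chi \Vert _{L^2(\varOmega _R)}\), which depends only on \({\mathsf {A}}\), \(\nabla {\mathsf {A}}\), and the first two derivatives of \(\chi \).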

9.2.3 Support and invariance properties of defect measures

Recall that the semi-classical principal symbol of the Helmholtz equation (2.3) is given by

$$\begin{aligned} p({\mathbf {x}},\varvec{\xi }):= \sum _{i=1}^d\sum _{j=1}^{d} A_{ij}({\mathbf {x}})\xi _i \xi _j - n({\mathbf {x}}) \end{aligned}$$
(9.10)

(see, e.g., [105, Page 281]). In our arguments below we only consider points \(({\mathbf {x}},\varvec{\xi })\) in phase space at which \(p=0\); this is because of the following result.

Theorem 9.5

(Support of defect measure [105, Theorem 5.4], [16, Equation 3.17]) Suppose u(k) satisfies (9.9), and let \(\mu \) be any defect measure of u(k). Then \(\mathrm{supp}\mu \subset \{ ({\mathbf {x}},\varvec{\xi }) : p({\mathbf {x}},\varvec{\xi })= 0\}.\)

As an illustration of this, the plane wave \(u^I({\mathbf {x}}):= \exp (\mathrm{i}k{\mathbf {x}}\cdot {\mathbf {a}})\) with \(|{\mathbf {a}}|=1\) is a solution of the Helmholtz equation (2.3) with \({\mathsf {A}}={\mathsf {I}}\) and \(n=1\), and hence \(p=|\varvec{\xi }|^2-1\) in this case. By (9.8), the defect measure of \(u^I\) is the product of Lebesgue measure in \({\mathbf {x}}\) and a delta function at \(\varvec{\xi }={\mathbf {a}}\), and thus is supported in \(|\varvec{\xi }|=1\), i.e., \(p=0\), as expected from Theorem 9.5.

The final result about defect measures that we need is their invariance under the flow (away from the scatterer). This result is Theorem 9.6 below; to state it, we first need to define the flow.

Away from \(\varGamma \), and provided that A and n are both \(C^{1,1}\), the flow \(\varphi _t\) is defined as follows: given \(\rho = ({\mathbf {x}}_0,\varvec{\xi }_0)\), \(\varphi _t(\rho ):= ({\mathbf {x}}(t),\varvec{\xi }(t))\) where \(({\mathbf {x}}(t),\varvec{\xi }(t))\) is the solution of the Hamiltonian system

$$\begin{aligned} \dot{x_i}(t) = \partial _{\xi _i}p\big ({\mathbf {x}}(t), \varvec{\xi }(t) \big ), \qquad \dot{\xi _i}(t) = -\partial _{x_i}p\big ({\mathbf {x}}(t), \varvec{\xi }(t) \big ), \end{aligned}$$
(9.11)

with initial condition \(({\mathbf {x}}(0), \varvec{\xi }(0))= ({\mathbf {x}}_0, \varvec{\xi }_0)\), where the Hamiltonian equals p defined by (9.10). Near \(\varGamma \), and near places where A and n are not \(C^{1,1}\), the definition of \(\varphi _t\) is more involved, since it must account for reflection and refraction. However, we do not need this definition in what follows, since our arguments take place away from these regions; in fact, they take place away from the scatterer \(\varOmega _{\mathrm{sc}}\). Outside \(\varOmega _{\mathrm{sc}}\), \({\mathsf {A}}={\mathsf {I}}\) and \(n=1\); thus \(p({\mathbf {x}},\varvec{\xi })= |\varvec{\xi }|^2-1\). From (9.11), the flow satisfies \(\dot{x_i} =2 \xi _i\) and \(\dot{\xi _i}=0\) and is therefore given by the straight-line motion

$$\begin{aligned} {\mathbf {x}}={\mathbf {x}}_0+2t\varvec{\xi }_0, \quad \varvec{\xi }=\varvec{\xi }_0. \end{aligned}$$
(9.12)

The arguments below consider the flow with speed 2 (i.e. with \(|\varvec{\xi }_0|=1\)). This is without loss of generality, since away from \(\varOmega _{\mathrm{sc}}\) Theorem 9.5 implies that \(\mu \) is supported in \(\{|\varvec{\xi }|=1\}\).
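The Hamiltonian system (9.11) can be integrated numerically to see both the straight-line motion (9.12) outside the scatterer and the conservation of p along the flow, a standard property of Hamiltonian systems. The following sketch uses the classical fourth-order Runge–Kutta method with a fixed step. The helper names, the step count, and the smooth variable index \(n({\mathbf {x}})=1+\tfrac{1}{2}\mathrm{e}^{-|{\mathbf {x}}|^2}\) with \({\mathsf {A}}={\mathsf {I}}\) used in the second example are hypothetical illustrations; the flow near \(\varGamma \) (with reflection) is not modelled.

```python
import numpy as np


def hamiltonian_flow(x0, xi0, dp_dx, dp_dxi, t, steps=2000):
    """Approximate phi_t(x0, xi0) for dx/dt = dp/dxi, dxi/dt = -dp/dx (system (9.11))
    with the classical fourth-order Runge-Kutta method and fixed step t / steps."""
    d = len(x0)
    z = np.concatenate([np.asarray(x0, dtype=float), np.asarray(xi0, dtype=float)])

    def rhs(z):
        x, xi = z[:d], z[d:]
        return np.concatenate([dp_dxi(x, xi), -dp_dx(x, xi)])

    dt = t / steps
    for _ in range(steps):
        k1 = rhs(z)
        k2 = rhs(z + 0.5 * dt * k1)
        k3 = rhs(z + 0.5 * dt * k2)
        k4 = rhs(z + dt * k3)
        z = z + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return z[:d], z[d:]


# Outside the scatterer: A = I and n = 1, so p(x, xi) = |xi|^2 - 1.
free_dp_dx = lambda x, xi: np.zeros_like(x)
free_dp_dxi = lambda x, xi: 2.0 * xi

# Hypothetical smooth index with A = I: n(x) = 1 + 0.5 exp(-|x|^2), p = |xi|^2 - n(x).
n_var = lambda x: 1.0 + 0.5 * np.exp(-x @ x)
var_dp_dx = lambda x, xi: np.exp(-x @ x) * x     # gradient of -n(x)
var_dp_dxi = lambda x, xi: 2.0 * xi
p_var = lambda x, xi: xi @ xi - n_var(x)

x0, xi0 = np.array([-3.0, 0.5]), np.array([1.0, 0.0])
x1, xi1 = hamiltonian_flow(x0, xi0, free_dp_dx, free_dp_dxi, t=1.5)
print(x1, x0 + 2.0 * 1.5 * xi0)                  # straight-line motion (9.12)
```

With the free symbol the right-hand side is constant along trajectories, so the Runge–Kutta steps reproduce (9.12) up to rounding; with the variable index the trajectory bends, while p stays (numerically) constant.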

Both in the next result and later, we let \(\pi _{\mathbf {x}}\) denote projection in the \({\mathbf {x}}\) variables, i.e. \(\pi _{\mathbf {x}}(({\mathbf {x}},\varvec{\xi }))={\mathbf {x}}\).

Theorem 9.6

(Invariance of defect measure under the flow away from the scatterer) Suppose that u(k) satisfies (9.9), and let \(\mu \) be any defect measure of u(k). If \(A\subset T^* {\mathbb {R}}^d\) is such that \(\pi _{\mathbf {x}}(\varphi _s(A)) \cap \varOmega _{\mathrm{sc}}= \emptyset \) for s between 0 and t (i.e. the flow acting on A doesn’t hit the scatterer from time 0 to time t), then

$$\begin{aligned} \mu ( \varphi _t(A))= \mu (A). \end{aligned}$$
(9.13)

Proof

In the absence of the scatterer, invariance of the measure under the flow is the statement that, for \(b\in C_{\mathrm{comp}}^\infty (T^*{\mathbb {R}}^d)\),

$$\begin{aligned} \partial _s \bigg ( \int (b\circ \varphi _{-s})(\rho )\, \mathrm{d}\mu \bigg ) = 0 \quad \text { for all }s, \end{aligned}$$
(9.14)

and this is proved in [105, Theorem 5.4], [16, Proposition 4.4]. For this result to hold in the presence of the scatterer on a time interval \(0\le s\le t\), we need the spatial projection of the support of the integrand in (9.14) not to intersect \(\varOmega _{\mathrm{sc}}\) during this time interval, i.e., we need the condition that

$$\begin{aligned} \pi _{\mathbf {x}}\big (\mathrm{supp}(b\circ \varphi _{-s})\big ) \cap \varOmega _{\mathrm{sc}}= \emptyset \quad \text { for }0\le s\le t. \end{aligned}$$
(9.15)

Under this condition, (9.14) implies that

$$\begin{aligned} \int b(\rho )\, \mathrm{d}\mu = \int (b\circ \varphi _{-s})(\rho )\, \mathrm{d}\mu \quad \text { for all }0\le s\le t. \end{aligned}$$
(9.16)

Let \(1_A\) denote the indicator function of a set A. By approximating \(1_A\) by smooth symbols, (9.16) holds with \(b(\rho )=1_A(\rho )\), provided that the condition (9.15) holds. Since \(\varphi _{-s}(\rho ) \in A\) iff \(\rho \in \varphi _s(A)\), we have

$$\begin{aligned} \pi _{\mathbf {x}}\big (\mathrm{supp}(1_A\circ \varphi _{-s})\big )= \pi _{\mathbf {x}}\big (\mathrm{supp}(1_{\varphi _{s}(A)})\big ) = \pi _{\mathbf {x}}\big (\varphi _s(A)\big ), \end{aligned}$$

and thus (9.15) holds by the assumption in the statement of the theorem.

Therefore, (9.16) implies that, for all \(0\le s\le t\),

$$\begin{aligned} \int 1_A(\rho )\, \mathrm{d}\mu&= \int 1_A(\varphi _{-s}(\rho ))\, \mathrm{d}\mu = \int 1_{\varphi _s(A)}(\rho )\, \mathrm{d}\mu , \end{aligned}$$

i.e.

$$\begin{aligned} \mu (A) = \mu \big (\varphi _s(A)\big ) \quad \text { for all }0\le s\le t, \end{aligned}$$

which implies (9.13). \(\square \)

9.3 Proof of (9.2) using defect measures

The following lemma reduces proving the bound (9.2) to proving a statement about defect measures.

Lemma 9.7

Let \(0<R_0<R\) be such that \(\varOmega _{\mathrm{sc}}\subset \subset B_{R_0}\). If every defect measure of u is non-zero and there exists \(C_{R,R_0}>0\) such that, for every defect measure \(\mu \) of u,

$$\begin{aligned} \mu (T^*\varOmega _{R+2}) \le C_{R,R_0} \mu (T^*\varOmega _{R_0}), \end{aligned}$$
(9.17)

then the bound (9.2) holds.

Proof

We prove the contrapositive: assuming that (9.2) fails, we exhibit a defect measure of u for which (9.17) fails. If (9.2) fails, then, for any \(C_1>0\), there exists a sequence \((k_n)_{n=1}^\infty \), with \(k_n \rightarrow \infty \), such that

$$\begin{aligned} \Vert u(k_n) \Vert _{L^2(\varOmega _{R+1})} \ge C_1 \Vert u(k_n) \Vert _{L^2(\varOmega _{R})}; \end{aligned}$$
(9.18)

we choose \(C_1:= (2C_{R,R_0})^{1/2}\). By Lemma 9.4, the sequence \(\{u(k_n)\}_{n=1}^\infty \) is locally uniformly bounded and Theorem 9.3 implies that, by passing to a subsequence, there exists a defect measure \(\mu \) of u associated to the subsequence, which we again denote \(k_n\). Let \(\chi _0, \chi _1 \in C^\infty ({\mathbb {R}}^d)\) be such that \(0\le \chi _{0}, \chi _1 \le 1\), and

$$\begin{aligned} \mathrm{supp}\chi _1 \subset B_{R+2}, \quad \chi _1 = 1 \text { in } B_{R+1}, \quad \mathrm{supp}\chi _0 \subset B_{R},\quad \chi _0 = 1 \text { in } B_{R_0}. \end{aligned}$$

The bound (9.18) then implies that

$$\begin{aligned} \Vert \chi _1 u(k_n) \Vert _{L^2(\varOmega _+)} \ge (2C_{R,R_0})^{1/2} \Vert \chi _0 u(k_n) \Vert _{L^2(\varOmega _+)}. \end{aligned}$$
(9.19)

Squaring (9.19), passing to the limit \(n\rightarrow \infty \), and using the property (9.7) of defect measures (recall that \(\Vert \chi _j u\Vert ^2_{L^2(\varOmega _+)} = \langle \chi _j^2 u, u\rangle _{\varOmega _+}\) for \(j=0,1\)), we obtain that

$$\begin{aligned} \int \chi _1^2 \,\mathrm{d}\mu \ge 2C_{R,R_0} \int \chi _0^2 \,\mathrm{d}\mu . \end{aligned}$$

The definitions of \(\chi _{0}\) and \(\chi _{1}\) imply that

$$\begin{aligned} \int \chi _0^2 \, \mathrm{d}\mu \ge \int 1_{T^*\varOmega _{R_0}}\, \mathrm{d}\mu = \mu ( T^*\varOmega _{R_0}) \end{aligned}$$

and

$$\begin{aligned} \int \chi _1^2 \, \mathrm{d}\mu \le \int 1_{T^*\varOmega _{R+2}}\, \mathrm{d}\mu = \mu ( T^*\varOmega _{R+2}); \end{aligned}$$

hence

$$\begin{aligned} \mu (T^*\varOmega _{R+2}) \ge 2C_{R,R_0} \mu (T^*\varOmega _{R_0}), \end{aligned}$$

contradicting (9.17). \(\square \)

Before using Lemma 9.7 to prove (9.2), we prove a result (Lemma 9.8 below) about the structure of \(\mu \), exploiting the fact that \(u=u^I + u^S\) with \(u^S\) outgoing (in the sense that it satisfies the Sommerfeld radiation condition (2.4)). To make use of this outgoing property, we need to define appropriate notions of incoming and outgoing for elements of phase space. Let \({\mathcal {I}}\) denote the directly incoming set defined by

$$\begin{aligned} {{\mathcal {I}}}:= \bigg \{\rho \in T^{*}(\varOmega _+{\setminus \varOmega _{\mathrm{sc}}}),\text { s.t. }\pi _{\mathbf {x}}\bigg (\bigcup _{t\ge 0}\varphi _{-t}(\rho )\bigg )\cap \varOmega _{\mathrm{sc}}=\emptyset \bigg \}, \end{aligned}$$
(9.20)

where we recall that \(\pi _{\mathbf {x}}\) denotes projection in the \({\mathbf {x}}\) variables. That is, \({{\mathcal {I}}}\) consists of the points in phase space that never hit the scatterer under the backward flow. Let

$$\begin{aligned} \varGamma _+ := (T^* \varOmega _+) \backslash {{\mathcal {I}}}. \end{aligned}$$

These definitions of \({{\mathcal {I}}}\) and \(\varGamma _+\) do not require the generalized bicharacteristic flow \(\varphi _t\) to be defined in \(T^* \varOmega _{\mathrm{sc}}\), but when the flow is defined everywhere, \(\varGamma _+\) is the forward generalized bicharacteristic flowout of \(\varOmega _{\mathrm{sc}}\), that is

$$\begin{aligned} \varGamma _+= \big \{ \varphi _t (\rho ) \,\, : \,\, t\ge 0,\ \rho \in T^* \varOmega _{\mathrm{sc}}\big \} \text { when }\varphi _t\text { is defined everywhere.} \end{aligned}$$
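Away from the scatterer, membership of the directly incoming set (9.20) is a purely geometric question, because the flow is the straight-line motion (9.12). The sketch below assumes, for simplicity, that the scatterer is exactly the ball \(B_{R_{\mathrm{sc}}}\) and that \(|\varvec{\xi }|=1\); for a general scatterer, \(\varOmega _{\mathrm{sc}}\subset \overline{B_{R_{\mathrm{sc}}}}\), so missing the ball is only a sufficient condition for missing \(\varOmega _{\mathrm{sc}}\). Since \(T^*\varOmega _\rho \setminus \varGamma _+ = T^*\varOmega _\rho \cap {{\mathcal {I}}}\), combining Lemma 9.8 below with (9.8) shows that \(\mu (T^*\varOmega _\rho )\) is at least the Lebesgue measure of \(\{{\mathbf {x}}: ({\mathbf {x}},{\mathbf {a}})\in {{\mathcal {I}}},\ |{\mathbf {x}}|<\rho \}\), a lower bound of the same type as (9.25) below. The function names and the grid-based area estimate are hypothetical.

```python
import math

import numpy as np


def backward_ray_misses_ball(x, xi, R_sc):
    """True where the backward straight-line flow y(t) = x - 2 t xi, t >= 0
    (the motion (9.12) run backwards) stays outside the closed ball |y| <= R_sc.
    Accepts single points (shape (d,)) or batches (shape (..., d))."""
    x = np.asarray(x, dtype=float)
    xi = np.asarray(xi, dtype=float)
    dot = np.sum(x * xi, axis=-1)
    x2 = np.sum(x * x, axis=-1)
    xi2 = np.sum(xi * xi, axis=-1)
    # |y(t)|^2 = |x|^2 - 4 t x.xi + 4 t^2 |xi|^2 is minimised at t = x.xi / (2 |xi|^2);
    # if that t is negative, the minimum over t >= 0 is attained at t = 0.
    dist2 = np.where(dot > 0.0, x2 - dot ** 2 / xi2, x2)
    return dist2 > R_sc ** 2


def incoming_area_2d(rho, R_sc, a=(1.0, 0.0), n_grid=1200):
    """Midpoint-grid estimate of |{x : R_sc < |x| < rho, (x, a) in I}| in 2-D,
    assuming (hypothetically) that the scatterer is exactly the ball B_{R_sc}."""
    g = -rho + (np.arange(n_grid) + 0.5) * (2.0 * rho / n_grid)   # cell midpoints
    X1, X2 = np.meshgrid(g, g, indexing="ij")
    pts = np.stack([X1, X2], axis=-1)
    r2 = X1 ** 2 + X2 ** 2
    annulus = (r2 > R_sc ** 2) & (r2 < rho ** 2)
    directions = np.broadcast_to(np.asarray(a, dtype=float), pts.shape)
    keep = annulus & backward_ray_misses_ball(pts, directions, R_sc)
    return keep.sum() * (2.0 * rho / n_grid) ** 2


def incoming_area_2d_exact(rho, R_sc):
    """Closed form for a unit direction a: the annulus minus the shadow
    {x . a > 0, distance from x to the line R a is at most R_sc}."""
    shadow = (rho ** 2 * math.asin(R_sc / rho) + R_sc * math.sqrt(rho ** 2 - R_sc ** 2)
              - math.pi * R_sc ** 2 / 2.0)
    return math.pi * (rho ** 2 - R_sc ** 2) - shadow


print(incoming_area_2d(2.0, 1.0), incoming_area_2d_exact(2.0, 1.0))
```

In two dimensions with \({\mathbf {a}}=(1,0)\), the points removed from the annulus are exactly those in the “shadow” \(\{x_1>0,\ |x_2|\le R_{\mathrm{sc}}\}\), whose backward rays pass through the ball; this is what the closed form subtracts.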

The following lemma uses outgoingness of \(u^S\) to show that, given a set E in phase space, the mass of u lying over E is either in the forward flowout \(\varGamma _+\) or associated to the incident wave \(u^I\).

Lemma 9.8

For any Borel set \(E\subset T^* \varOmega \), \(\mu (E \setminus \varGamma _+) = \mu ^I (E \setminus \varGamma _+) \), where \(\mu \) is any defect measure of u, and \(\mu ^I\) is the defect measure of \(u^I\).

Proof

Let \(k_\ell \) be the sequence associated to the particular defect measure of u. By Lemma 9.4, and since \(u^I\) is trivially locally uniformly bounded, \(u^S(k_\ell )=u(k_\ell )-u^I(k_\ell )\) is uniformly locally bounded, and so there exists a subsequence \(k_{\ell _m}\) and a defect measure associated to \(u^S\), denoted by \(\mu ^S\). Then, by linearity and (9.7), \(\mu = \mu ^S+ \mu ^I\). It is therefore sufficient to prove that \(\mu ^S(E\setminus \varGamma _+)=0\). But, by the definition of \(\varGamma _+\), \(E\setminus \varGamma _+ \subset {{\mathcal {I}}}\), and \(\mu ^S({{\mathcal {I}}})=0\) by [16, Proposition 3.5], [50, Lemma 3.4], since \(u^S\) is outgoing. \(\square \)

Proof of Theorem 9.1

By Lemmas 9.2 and 9.7 it is sufficient to prove the bound (9.17) (observe that the hypothesis in Lemma 9.7 that every defect measure of u is non-zero holds by Lemma 9.8 since \(\mu ^I({{\mathcal {I}}})\ne 0\)). Let \(R_{\mathrm{sc}}:= \max _{{\mathbf {x}}\in \varOmega _{\mathrm{sc}}}|{\mathbf {x}}|\). We claim that it is sufficient to show that, for any \(\rho > R_{\mathrm{sc}}\), there exist \(\varepsilon =\varepsilon (R_{\mathrm{sc}},\rho )>0\), with \(\varepsilon (R_{\mathrm{sc}},\rho )\) an increasing function of \(\rho \), and \(C=C(\rho ,\varepsilon )>0\) such that

$$\begin{aligned} \mu (T^*(B_{\rho +\varepsilon } \setminus B_{\rho })) \le C(\rho ,\varepsilon ) \mu (T^*\varOmega _\rho ). \end{aligned}$$
(9.21)

Indeed, we now show that the bound (9.17) then follows by using (9.21) repeatedly. Let \(\varepsilon ^* := \varepsilon (R_{\mathrm{sc}}, R_0)\). Since \(\varepsilon (R_{\mathrm{sc}},\rho )\) is an increasing function of \(\rho \), \(\varepsilon (R_{\mathrm{sc}},\rho )\ge \varepsilon ^*\) for \(\rho \ge R_0\), and so (9.21) and the monotonicity of \(\mu \) imply, with \(C(\rho ):= C(\rho , \varepsilon (R_{\mathrm{sc}},\rho ))\),

$$\begin{aligned} \mu (T^*(B_{\rho +\varepsilon ^*} \setminus B_{\rho })) \le C(\rho )\, \mu (T^*\varOmega _\rho ) \quad \text { for all }\rho \ge R_0. \end{aligned}$$
(9.22)

The bound (9.17) then follows by applying (9.22) with \(\rho =R_0\), \(\rho = R_0+ \varepsilon ^*\), ..., \(\rho = R_0 + m\varepsilon ^*\), where \(m = \lceil (R+2-R_0)/\varepsilon ^* \rceil \).
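One way to write out this iteration is the following sketch, which assumes \(\varOmega _\rho =\varOmega _+\cap B_\rho \), so that for \(\rho \ge R_0\) the set \(T^*\varOmega _{\rho +\varepsilon ^*}\) is the disjoint union of \(T^*\varOmega _\rho \) and \(T^*(B_{\rho +\varepsilon ^*}\setminus B_\rho )\). Then (9.22) gives

$$\begin{aligned} \mu (T^*\varOmega _{\rho +\varepsilon ^*}) = \mu (T^*\varOmega _\rho ) + \mu \big (T^*(B_{\rho +\varepsilon ^*}\setminus B_\rho )\big ) \le \big (1+C(\rho )\big )\mu (T^*\varOmega _\rho ) \quad \text { for all }\rho \ge R_0, \end{aligned}$$

and, with \(\rho _j:=R_0+j\varepsilon ^*\) and m as above (so that \(\rho _m\ge R+2\)),

$$\begin{aligned} \mu (T^*\varOmega _{R+2}) \le \mu (T^*\varOmega _{\rho _m}) \le \prod _{j=0}^{m-1}\big (1+C(\rho _j)\big )\,\mu (T^*\varOmega _{R_0}), \end{aligned}$$

i.e. (9.17) holds with \(C_{R,R_0}:=\prod _{j=0}^{m-1}\big (1+C(\rho _j)\big )\).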

It is therefore sufficient to prove the bound (9.21); we introduce the notation that \(A:=B_{\rho +\varepsilon } \setminus B_{\rho }\), and observe that (9.21) then reads \(\mu (T^*A) \le C(\rho ,\varepsilon ) \mu (T^*\varOmega _\rho )\). We prove this bound by combining the following three inequalities:

$$\begin{aligned} \mu (T^*A) \le \mu (T^*A \cap \varGamma _+) + \mu ^I(T^*A) = \mu (T^*A \cap \varGamma _+)+ |A| \end{aligned}$$
(9.23)

(where \(|\cdot |\) denotes Lebesgue measure in \({\mathbb {R}}^d\)),

$$\begin{aligned} \mu (T^*A \cap \varGamma _+)\le \mu (T^*(B_{\rho } \setminus B_{\rho _0}))\le \mu (T^*\varOmega _\rho ), \end{aligned}$$
(9.24)

where \(\rho _0:= (\rho +R_{\mathrm{sc}})/2\), and

$$\begin{aligned} \mu (T^*\varOmega _\rho ) \ge \delta |\varOmega _\rho | \end{aligned}$$
(9.25)

for some \(\delta >0\). Indeed, using (9.23), (9.24), and (9.25), we have

$$\begin{aligned} \mu (T^*A) \le \Big (1+ {|{A}|}(\delta {|{\varOmega _\rho }|})^{-1}\Big ) \mu (T^*\varOmega _\rho ), \end{aligned}$$

which is (9.21). We prove (9.23) and (9.25) using Lemma 9.8 and the structure of \(\mu ^I\), and (9.24) using invariance of defect measures under the flow outside of \(T^* \varOmega _{\mathrm{sc}}\) (i.e. Theorem 9.6).
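To see how the constant in (9.21) depends on the geometry, here is a sketch in \(d=2\) under the purely illustrative assumption that \(\varOmega _{\mathrm{sc}}\) is the disc \(B_{R_{\mathrm{sc}}}\) and \(\varOmega _\rho = B_\rho \setminus B_{R_{\mathrm{sc}}}\); the value of \(\delta \) from (9.25) is taken as an input, and the function names are hypothetical.

```python
import math


def annulus_area(rho: float, eps: float) -> float:
    """|A| for A = B_{rho+eps} minus B_rho in R^2."""
    return math.pi * ((rho + eps) ** 2 - rho ** 2)


def omega_rho_area(rho: float, R_sc: float) -> float:
    """|Omega_rho|, assuming Omega_rho = B_rho minus the disc B_{R_sc} (illustrative)."""
    return math.pi * (rho ** 2 - R_sc ** 2)


def constant_921(rho: float, eps: float, R_sc: float, delta: float) -> float:
    """C(rho, eps) = 1 + |A| / (delta * |Omega_rho|), as obtained from (9.23)-(9.25)."""
    return 1.0 + annulus_area(rho, eps) / (delta * omega_rho_area(rho, R_sc))


# rho = 2, eps = 1, R_sc = 1, delta = 0.5: mathematically 1 + 5*pi / (0.5 * 3*pi) = 1 + 10/3
print(constant_921(2.0, 1.0, 1.0, 0.5))
```

The constant blows up as \(\delta \rightarrow 0\), which is why the positivity of \(\delta \) in (9.25) is essential.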

Proof of (9.23)

Lemma 9.8 implies that

$$\begin{aligned} \mu (T^*A) = \mu (T^*A\cap \varGamma _+) + \mu (T^*A\setminus \varGamma _+) \le \mu (T^*A \cap \varGamma _+) + \mu _I(T^*A). \end{aligned}$$

By (9.8), \(\mu _I\) is a \(\delta \)-measure on \(\varvec{\xi }={\mathbf {a}}\) times Lebesgue measure in \({\mathbf {x}}\), so \(\mu _I(T^*A) = {|{A}|}\), and (9.23) follows.

Proof of (9.24)

Recall that, for \(X\subset \subset {\mathbb {R}}^d\setminus {\overline{\varOmega _{\mathrm{sc}}}}\),

$$\begin{aligned} S^*X:= \big \{ ({\mathbf {x}},\varvec{\xi }) : {\mathbf {x}}\in X,\, \varvec{\xi }\in {\mathbb {R}}^d \text { with} \,|\varvec{\xi }|=1\big \}, \end{aligned}$$

and observe that, by Theorem 9.5, \(\mu (T^*A\cap \varGamma _+)= \mu (S^*A\cap \varGamma _+)\) and \(\mu (T^*(B_{\rho } \setminus B_{\rho _0})) = \mu (S^*(B_{\rho } \setminus B_{\rho _0}))\); we therefore only need to prove that

$$\begin{aligned} \mu (S^*A \cap \varGamma _+)\le \mu (S^*(B_{\rho } \setminus B_{\rho _0})). \end{aligned}$$
(9.26)

We first introduce some notation that allows us to bound \(\mu (S^*A \cap \varGamma _+)\) using only the invariance (9.13) of defect measures in the exterior of \(\varOmega _{\mathrm{sc}}\). Given \({\mathbf {b}}\in {\mathbb {R}} ^d\) with \(|{\mathbf {b}}|=1\) and \({\widetilde{\rho }}>R_{\mathrm{sc}}\), let \(\varOmega _{\mathrm{sc},{\widetilde{\rho }}, {\mathbf {b}}}\subset {\mathbb {R}}^d\) and \(\varLambda _{\text {sc},{\widetilde{\rho }},{\mathbf {b}}} \subset S^* \varOmega _+\) be defined by

$$\begin{aligned} \varOmega _{\mathrm{sc},{\widetilde{\rho }}, {\mathbf {b}}}:= \Big ( \bigcup _{t \ge 0} \big (\varOmega _{\mathrm{sc}}+ t {\mathbf {b}}\big ) \Big )\cap \varOmega _{{\widetilde{\rho }}} \quad \text { and }\quad \varLambda _{\text {sc},{\widetilde{\rho }},{\mathbf {b}}} := \varOmega _{\mathrm{sc},{\widetilde{\rho }}, {\mathbf {b}}}\times \{{\mathbf {b}}\}; \end{aligned}$$

i.e. \(\varOmega _{\mathrm{sc},{\widetilde{\rho }}, {\mathbf {b}}}\) equals the union of all possible translations of \(\varOmega _{\mathrm{sc}}\) in the direction \({\mathbf {b}}\), intersected with \(\varOmega _{{\widetilde{\rho }}}\), and \(\varLambda _{\text {sc},{\widetilde{\rho }},{\mathbf {b}}}\) equals these points paired with the direction \({\mathbf {b}}\). By (9.12), the spatial projections of the flow outside \(\varOmega _{\mathrm{sc}}\) are straight lines, and thus

$$\begin{aligned} \varGamma _+ \cap S^* \varOmega _{{\widetilde{\rho }}} \cap \{ \varvec{\xi }= {\mathbf {b}}\} = \Big \{ ({\mathbf {x}}, {\mathbf {b}}) \in S^* \varOmega _{{\widetilde{\rho }}} : \exists s \ge 0 \text { s.t. } {\mathbf {x}}-s{\mathbf {b}}\in \varOmega _{\mathrm{sc}}\Big \}. \end{aligned}$$

Therefore

$$\begin{aligned} \varGamma _+ \cap S^* \varOmega _{{\widetilde{\rho }}} \cap \{ \varvec{\xi }= {\mathbf {b}}\} \subset \varLambda _{\text {sc},{\widetilde{\rho }},{\mathbf {b}}}, \qquad \varGamma _+ \cap S^* \varOmega _{{\widetilde{\rho }}} \subset \bigcup _{{\mathbf {b}}\in {\mathbb {R}} ^ d, |{\mathbf {b}}|=1} \varLambda _{\text {sc},{{\widetilde{\rho }}},{\mathbf {b}}}, \end{aligned}$$
(9.27)

and thus, for any \(\varepsilon >0\),

$$\begin{aligned} S^*A \cap \varGamma _+ = S^*A \cap S^*\varOmega _{\rho +\varepsilon }\cap \varGamma _+\subset S^*A \cap \bigg ( \bigcup _{{\mathbf {b}}\in {\mathbb {R}} ^ d, |{\mathbf {b}}|=1} \varLambda _{\text {sc},\rho +\varepsilon ,{\mathbf {b}}}\bigg ). \end{aligned}$$
(9.28)
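For a concrete picture of (9.27), the following sketch decides whether \(({\mathbf {x}},{\mathbf {b}})\) lies in \(\varGamma _+ \cap \{\varvec{\xi }={\mathbf {b}}\}\) via the characterisation stated just before (9.27), i.e. whether \({\mathbf {x}}-s{\mathbf {b}}\in \varOmega _{\mathrm{sc}}\) for some \(s\ge 0\). It assumes, purely for illustration, that \(\varOmega _{\mathrm{sc}}\) is the open ball \(B_{R_{\mathrm{sc}}}\) centred at the origin and that \(|{\mathbf {x}}|>R_{\mathrm{sc}}\); the proof itself only uses the inclusion (9.27) for a general scatterer. The function name is hypothetical (and `math.hypot` with several arguments needs Python 3.8 or later).

```python
import math
from typing import Sequence


def in_gamma_plus_ball(x: Sequence[float], b: Sequence[float], R_sc: float) -> bool:
    """True iff x - s*b lies in the open ball B_{R_sc} for some s >= 0.

    Assumes |b| = 1 and |x| > R_sc (x outside the illustrative ball scatterer).
    """
    if abs(math.hypot(*b) - 1.0) > 1e-12:
        raise ValueError("b must be a unit vector")
    xb = sum(xi * bi for xi, bi in zip(x, b))
    if xb <= 0.0:
        # s = 0 minimises |x - s b| over s >= 0, and |x| > R_sc: the ray misses the ball
        return False
    # s = x.b minimises |x - s b|; the minimum is the distance from 0 to the line
    dist2 = sum(xi * xi for xi in x) - xb * xb
    return dist2 < R_sc ** 2


print(in_gamma_plus_ball((3.0, 0.0), (1.0, 0.0), 1.0),    # downstream of the ball
      in_gamma_plus_ball((-3.0, 0.0), (1.0, 0.0), 1.0))   # upstream of the ball
```

Points downstream of the ball (relative to \({\mathbf {b}}\)) belong to the set and points upstream do not, which is exactly the shadow-like region \(\varOmega _{\mathrm{sc},{\widetilde{\rho }},{\mathbf {b}}}\).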

Recall that \(\rho _0:= (\rho + R_{\mathrm{sc}})/2\). Let

$$\begin{aligned} t_0:= \frac{\rho _0-R_{\mathrm{sc}}}{4} = \frac{\rho -R_{\mathrm{sc}}}{8} \end{aligned}$$
(9.29)

and

$$\begin{aligned} \varepsilon := -\rho + \sqrt{ R_{\mathrm{sc}}^2+\left( \frac{\rho -R_{\mathrm{sc}}}{4}+ \sqrt{\rho ^2-R_{\mathrm{sc}}^2}\right) ^2}; \end{aligned}$$
(9.30)

observe that \(\varepsilon >0\) and \(\varepsilon \) is an increasing function of \(\rho \), as required in the claim preceding (9.21). We now claim that, with these definitions of \(t_0\) and \(\varepsilon \),

$$\begin{aligned} \bigcup _{0\le t \le t_0} \varphi _t \big (S^*(B_\rho \setminus B_{\rho _0})\big ) \cap \varOmega _{\mathrm{sc}}= \emptyset \end{aligned}$$
(9.31)

(i.e., the forward flowout of the annulus \(B_\rho \setminus B_{\rho _0}\) does not hit the scatterer for \(0\le t\le t_0\)) and

$$\begin{aligned} S^*A\cap \bigg ( \bigcup _{{\mathbf {b}}\in {\mathbb {R}} ^ d, |{\mathbf {b}}|=1} \varLambda _{\text {sc},\rho +\varepsilon ,{\mathbf {b}}} \bigg ) \subset \varphi _{t_0}\big (S^*(B_\rho \setminus B_{\rho _0})\big ). \end{aligned}$$
(9.32)

(Since \(S^*A \cap \varGamma _+\) is contained in the left-hand side of (9.32) by (9.28), (9.32) says that the forward flowout of \(B_\rho \setminus B_{\rho _0}\) in time \(t_0\) covers all points in \(S^*A\) that are ever reached by flowout from \(T^* \varOmega _{\mathrm{sc}}\).) Outside \(\varOmega _{\mathrm{sc}}\) the flow has speed 2 and its spatial projections are straight lines, so in time \(t\le t_0\) a point starting in \(B_\rho \setminus B_{\rho _0}\) moves a distance at most \(2t_0\). Therefore (9.31) is ensured if \(2t_0< \rho _0-R_{\mathrm{sc}}\), i.e. \(t_0< (\rho _0-R_{\mathrm{sc}})/2\), which is ensured by (9.29). \(\square \)
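The two properties of \(\varepsilon \) stated after (9.30), and the inclusion (9.31), can be sanity-checked numerically; sampling is, of course, not a proof. The sketch samples \(\varepsilon \) from (9.30) on a grid of \(\rho \) values, and samples the straight-line flow \(\varphi _t({\mathbf {x}},\varvec{\xi })=({\mathbf {x}}+2t\varvec{\xi },\varvec{\xi })\) in \(d=2\), with \(\varOmega _{\mathrm{sc}}\) replaced by the ball \(B_{R_{\mathrm{sc}}}\) that contains it (an illustrative assumption; the function names are hypothetical). Analytically, with \(h(\rho ):=(\rho -R_{\mathrm{sc}})/4+\sqrt{\rho ^2-R_{\mathrm{sc}}^2}\), positivity of \(\varepsilon \) follows from \(h(\rho )>\sqrt{\rho ^2-R_{\mathrm{sc}}^2}\), monotonicity from \(h h'>\sqrt{R_{\mathrm{sc}}^2+h^2}\), and (9.31) from \(|{\mathbf {x}}+2t\varvec{\xi }|\ge \rho _0-2t_0=R_{\mathrm{sc}}+(\rho -R_{\mathrm{sc}})/4\).

```python
import math
import random


def eps_930(rho: float, R_sc: float) -> float:
    """epsilon(R_sc, rho) as defined in (9.30); requires rho > R_sc."""
    h = (rho - R_sc) / 4 + math.sqrt(rho ** 2 - R_sc ** 2)
    return -rho + math.sqrt(R_sc ** 2 + h ** 2)


def flowout_avoids_ball(rho: float, R_sc: float, n: int = 20000, seed: int = 0) -> bool:
    """Sample phi_t(x, xi) = (x + 2 t xi, xi) in 2-d with x in B_rho minus B_rho0,
    |xi| = 1 and 0 <= t <= t0, and check that x + 2 t xi stays outside B_{R_sc}."""
    rng = random.Random(seed)
    rho0 = (rho + R_sc) / 2           # definition below (9.24)
    t0 = (rho - R_sc) / 8             # (9.29)
    for _ in range(n):
        r = rng.uniform(rho0, rho)                 # |x|
        th = rng.uniform(0.0, 2 * math.pi)         # angle of x
        ph = rng.uniform(0.0, 2 * math.pi)         # angle of xi
        t = rng.uniform(0.0, t0)
        y1 = r * math.cos(th) + 2 * t * math.cos(ph)
        y2 = r * math.sin(th) + 2 * t * math.sin(ph)
        if math.hypot(y1, y2) <= R_sc:
            return False
    return True


R_sc = 1.0
rhos = [R_sc + 0.01 * j for j in range(1, 1001)]   # rho in (1, 11]
vals = [eps_930(r, R_sc) for r in rhos]
eps_positive = all(v > 0 for v in vals)
eps_increasing = all(b > a for a, b in zip(vals, vals[1:]))
print(eps_positive, eps_increasing, flowout_avoids_ball(2.0, 1.0))
```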

Fig. 2: The lengths \(L_1\) and \(L_2\) defined by (9.34)

We now show that (9.32) holds. Since

$$\begin{aligned} ({\mathbf {x}}, {\mathbf {b}}) = ({\mathbf {x}}-2t_0 {\mathbf {b}}+ 2t_0{\mathbf {b}}, {\mathbf {b}}) = \varphi _{t_0}({\mathbf {x}}-2t_0 {\mathbf {b}}, {\mathbf {b}}), \end{aligned}$$

(9.32) follows from showing that \(({\mathbf {x}}-2t_0 {\mathbf {b}}, {\mathbf {b}}) \in S^*(B_\rho \setminus B_{\rho _0})\), i.e. \({\mathbf {x}}-2t_0 {\mathbf {b}}\in B_\rho \setminus B_{\rho _0}\), for all \(({\mathbf {x}}, {\mathbf {b}})\) belonging to the left-hand side of (9.32). For such \(({\mathbf {x}}, {\mathbf {b}})\), by definition,

$$\begin{aligned} \rho \le |{\mathbf {x}}| \le \rho + \varepsilon , \text { and }{\mathbf {x}}- s {\mathbf {b}}\in \varOmega _{\mathrm{sc}}\end{aligned}$$
(9.33)

for some \(s \ge 0\). We now claim that for such \(({\mathbf {x}},{\mathbf {b}})\),

$$\begin{aligned} {\mathbf {x}}- \ell {\mathbf {b}}\in B_\rho \setminus B_{\rho _0} \quad \text { for all }\quad L_1 < \ell \le L_2, \end{aligned}$$

where

$$\begin{aligned} L_1 := \sqrt{(\rho +\varepsilon )^2-R_{\mathrm{sc}}^2} - \sqrt{\rho ^2-R_{\mathrm{sc}}^2}, \qquad L_2 := \rho - \rho _0. \end{aligned}$$
(9.34)

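This is because, on the one hand, a ray of length \(>L_1\) starting from a point \({\mathbf {x}}\) in the direction \(-{\mathbf {b}}\), with \(({\mathbf {x}}, {\mathbf {b}})\) satisfying (9.33), automatically enters \(B_\rho \): the longest such ray that does not intersect \(B_\rho \) has length \(L_1\), the extreme case being \(|{\mathbf {x}}|=\rho +\varepsilon \) with the line through \({\mathbf {x}}\) in direction \({\mathbf {b}}\) tangent to \(\partial B_{R_{\mathrm{sc}}}\), as shown in Fig. 2. On the other hand, a ray of length \(\le L_2\) starting from such a point does not intersect \(B_{\rho _0}\), since the shortest such ray that enters \(\overline{B_{\rho _0}}\) has length \(L_2\), as shown in Fig. 2. Finally, with \(t_0\) given by (9.29) and \(\varepsilon \) given by (9.30), a direct computation gives \(L_1 = (\rho -R_{\mathrm{sc}})/4 = 2t_0\) and \(L_2 = (\rho -R_{\mathrm{sc}})/2 > 2t_0\). Since \({\mathbf {x}}\in A\) implies \(|{\mathbf {x}}|<\rho +\varepsilon \), the same geometric argument shows that the point at distance \(\ell \) from \({\mathbf {x}}\) along this ray lies in \(B_\rho \) for all \(\ell \) between \(\sqrt{|{\mathbf {x}}|^2-R_{\mathrm{sc}}^2}-\sqrt{\rho ^2-R_{\mathrm{sc}}^2}<L_1\) and \(L_2\); hence \({\mathbf {x}}-2t_0{\mathbf {b}}\in B_\rho \setminus B_{\rho _0}\), and (9.32) holds.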
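The lengths in (9.34) and the inclusion (9.32) can be checked numerically in \(d=2\), again under the illustrative assumption that \(\varOmega _{\mathrm{sc}}\) is the open disc \(B_{R_{\mathrm{sc}}}\). Directions are generated by picking a point \({\mathbf {y}}\) in the disc and setting \({\mathbf {b}}=({\mathbf {x}}-{\mathbf {y}})/|{\mathbf {x}}-{\mathbf {y}}|\), so that \({\mathbf {x}}-s{\mathbf {b}}={\mathbf {y}}\) for \(s=|{\mathbf {x}}-{\mathbf {y}}|\) and (9.33) holds. With \(\varepsilon \) exactly as in (9.30), the computed \(L_1\) agrees with \(2t_0\) up to rounding; the sampled points \({\mathbf {x}}-2t_0{\mathbf {b}}\) still land in \(B_\rho \setminus B_{\rho _0}\) because every sampled \({\mathbf {x}}\) satisfies \(|{\mathbf {x}}|<\rho +\varepsilon \). The function names are hypothetical.

```python
import math
import random


def lengths_934(rho: float, R_sc: float):
    """Return (L1, 2*t0, L2, eps, rho0) from (9.29), (9.30) and (9.34)."""
    eps = -rho + math.sqrt(R_sc ** 2 + ((rho - R_sc) / 4 + math.sqrt(rho ** 2 - R_sc ** 2)) ** 2)
    t0 = (rho - R_sc) / 8
    rho0 = (rho + R_sc) / 2
    L1 = math.sqrt((rho + eps) ** 2 - R_sc ** 2) - math.sqrt(rho ** 2 - R_sc ** 2)
    L2 = rho - rho0
    return L1, 2 * t0, L2, eps, rho0


def check_932(rho: float, R_sc: float, n: int = 20000, seed: int = 1) -> bool:
    """Sample (x, b) satisfying (9.33); test that x - 2 t0 b lies in B_rho minus B_rho0."""
    _, two_t0, _, eps, rho0 = lengths_934(rho, R_sc)
    rng = random.Random(seed)
    for _ in range(n):
        # x in A = B_{rho+eps} minus B_rho
        r = rng.uniform(rho, rho + eps)
        th = rng.uniform(0.0, 2 * math.pi)
        x1, x2 = r * math.cos(th), r * math.sin(th)
        # y uniformly distributed in the open disc of radius R_sc
        ry = R_sc * math.sqrt(rng.random())
        ty = rng.uniform(0.0, 2 * math.pi)
        y1, y2 = ry * math.cos(ty), ry * math.sin(ty)
        d = math.hypot(x1 - y1, x2 - y2)
        b1, b2 = (x1 - y1) / d, (x2 - y2) / d      # unit direction with x - d b = y
        z = math.hypot(x1 - two_t0 * b1, x2 - two_t0 * b2)
        if not (rho0 <= z < rho):
            return False
    return True


L1, two_t0, L2, _, _ = lengths_934(2.0, 1.0)
print(L1, two_t0, L2)          # L1 and 2*t0 coincide up to rounding; L2 = 2 * (2*t0)
print(check_932(2.0, 1.0), check_932(5.0, 0.3))
```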

We now prove the bound (9.26) on \(\mu (S^*A \cap \varGamma _+)\) using (9.31) and (9.32). Because of (9.31), we can use (9.13) to find that

$$\begin{aligned} \mu \big ( \varphi _{t_0} (S^*(B_{\rho } \setminus B_{\rho _0})) \big ) = \mu (S^*(B_{\rho } \setminus B_{\rho _0})); \end{aligned}$$

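using this with (9.28) and (9.32), we obtain (9.26), and thus the first inequality in (9.24); the second inequality in (9.24) holds because \(B_{\rho } \setminus B_{\rho _0}\subset \varOmega _\rho \), since \(\rho _0>R_{\mathrm{sc}}\).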

Proof of (9.25)

Using Lemma 9.8 and the structure of \(\mu _I\), we have

$$\begin{aligned} \mu (T^*\varOmega _\rho )&\ge \mu (T^*\varOmega _\rho \setminus \varGamma _+) = \mu _I(T^*\varOmega _\rho \setminus \varGamma _+) \nonumber \\&=\mu _I\big ((T^*\varOmega _\rho \setminus \varGamma _+)\cap \{ \varvec{\xi }={\mathbf {a}}\}\big ) + \mu _I\big ((T^*\varOmega _\rho \setminus \varGamma _+)\cap \{ \varvec{\xi }\ne {\mathbf {a}}\}\big )\nonumber \\&=\Big | \pi _{\mathbf {x}}\Big ( (T^*\varOmega _\rho \setminus \varGamma _+)\cap \{ \varvec{\xi }={\mathbf {a}}\} \Big )\Big |. \end{aligned}$$
(9.35)

Since

$$\begin{aligned} \pi _{\mathbf {x}}\Big ( (T^*\varOmega _\rho \setminus \varGamma _+)\cap \{ \varvec{\xi }={\mathbf {a}}\} \Big ) \cup \pi _{\mathbf {x}}\Big (( T^*\varOmega _\rho \cap \varGamma _+)\cap \{ \varvec{\xi }={\mathbf {a}}\} \Big ) \supset \varOmega _\rho , \end{aligned}$$

we obtain

$$\begin{aligned} \Big | \pi _{\mathbf {x}}\Big ( (T^*\varOmega _\rho \setminus \varGamma _+)\cap \{ \varvec{\xi }={\mathbf {a}}\} \Big )\Big |\ge |\varOmega _\rho | - \Big | \pi _{\mathbf {x}}\Big (( T^*\varOmega _\rho \cap \varGamma _+)\cap \{\varvec{\xi }={\mathbf {a}}\} \Big )\Big |. \end{aligned}$$
(9.36)

By the first inclusion in (9.27) with \({\widetilde{\rho }}=\rho \) and \({\mathbf {b}}={\mathbf {a}}\),

$$\begin{aligned} \big |\pi _{\mathbf {x}}\big ( (T^*\varOmega _\rho \cap \varGamma _+)\cap \{ \varvec{\xi }={\mathbf {a}}\} \big ) \big | \le \big |\varOmega _{\mathrm{sc}, \rho , {\mathbf {a}}}\big |, \end{aligned}$$
(9.37)

with this inequality expressing the fact that any part of the scattered wave travelling in direction \({\mathbf {a}}\) must lie in \(\varOmega _{\mathrm{sc}, \rho , {\mathbf {a}}}\). Combining (9.36) with (9.37) yields

$$\begin{aligned} \Big | \pi _{\mathbf {x}}\big ( (T^*\varOmega _\rho \setminus \varGamma _+)\cap \{ \varvec{\xi }={\mathbf {a}}\} \big )\Big |\ge |\varOmega _\rho | - |\varOmega _{\mathrm{sc}, \rho , {\mathbf {a}}}|. \end{aligned}$$
(9.38)

The set \(\varOmega _\rho \setminus \varOmega _{\mathrm{sc}, \rho , {\mathbf {a}}}\) has positive Lebesgue measure: it contains the non-empty open set of points \({\mathbf {x}}\in \varOmega _\rho \) with \({\mathbf {x}}\cdot {\mathbf {a}}< -R_{\mathrm{sc}}\), and these are not of the form \({\mathbf {y}}+t{\mathbf {a}}\) with \({\mathbf {y}}\in \varOmega _{\mathrm{sc}}\) and \(t\ge 0\), since \(({\mathbf {y}}+t{\mathbf {a}})\cdot {\mathbf {a}}\ge -R_{\mathrm{sc}}\). Hence there exists \(\delta >0\) such that \(|\varOmega _\rho | - |\varOmega _{\mathrm{sc}, \rho , {\mathbf {a}}}| \ge \delta |\varOmega _\rho |\), and thus (9.35) and (9.38) imply that (9.25) holds; the proof is complete. \(\square \)
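As an illustration of the constant \(\delta \) in (9.25), suppose (purely for illustration) that \(d=2\), \(\varOmega _{\mathrm{sc}}=B_{R_{\mathrm{sc}}}\), \(\varOmega _\rho =B_\rho \setminus \overline{B_{R_{\mathrm{sc}}}}\) and \({\mathbf {a}}=(1,0)\). Then the downstream region of \(\varOmega _\rho \) swept out by translates of the disc in direction \({\mathbf {a}}\) is the part of the annulus with \(|x_2|<R_{\mathrm{sc}}\) and \(x_1>0\); its area is \(R_{\mathrm{sc}}\sqrt{\rho ^2-R_{\mathrm{sc}}^2}+\rho ^2\arcsin (R_{\mathrm{sc}}/\rho )-\pi R_{\mathrm{sc}}^2/2\), and the best \(\delta \) the argument yields is one minus its ratio to \(|\varOmega _\rho |\). The sketch evaluates this formula and cross-checks it by Monte Carlo; all names are hypothetical.

```python
import math
import random


def shadow_area_exact(rho: float, R_sc: float) -> float:
    """Area of the downstream region for the disc scatterer and a = (1, 0)."""
    return (R_sc * math.sqrt(rho ** 2 - R_sc ** 2)
            + rho ** 2 * math.asin(R_sc / rho) - math.pi * R_sc ** 2 / 2)


def delta_exact(rho: float, R_sc: float) -> float:
    """delta = 1 - (downstream area) / |Omega_rho|."""
    return 1.0 - shadow_area_exact(rho, R_sc) / (math.pi * (rho ** 2 - R_sc ** 2))


def delta_monte_carlo(rho: float, R_sc: float, n: int = 200000, seed: int = 0) -> float:
    """Estimate the same delta by sampling points of the annulus Omega_rho."""
    rng = random.Random(seed)
    inside = shadow = 0
    for _ in range(n):
        x1, x2 = rng.uniform(-rho, rho), rng.uniform(-rho, rho)
        if R_sc < math.hypot(x1, x2) < rho:       # x in Omega_rho
            inside += 1
            if abs(x2) < R_sc and x1 > 0:         # x = y + t a with |y| < R_sc, t >= 0
                shadow += 1
    return 1.0 - shadow / inside


print(delta_exact(2.0, 1.0), delta_monte_carlo(2.0, 1.0))
```

For \(\rho =2\) and \(R_{\mathrm{sc}}=1\) the formula gives \(\delta \approx 0.76\); \(\delta \) depends on \(\rho \), which is harmless because the constant in (9.21) is allowed to depend on \(\rho \).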

Remark 9.9

(What if impedance boundary conditions are imposed on \({\varGamma _R}\)?) If the impedance boundary condition \(\partial _{\mathbf {n}}u^S - \mathrm{i}k u^S=0\) is imposed on \({\varGamma _R}\) (as an approximation of \(\mathrm{DtN}_k\)), then there are additional reflections on \({\varGamma _R}\) [84, 46, §2]; consequently \(\mu ^S\) has support on the incoming set, and Lemma 9.8 no longer holds.

Remark 9.10

(Proving Theorem 9.1 in the trapping case) In the trapping case, \(\Vert u(k)\Vert _{L^2(\varOmega _R)}\) may no longer be uniformly bounded, as it is in Lemma 9.4, since (3.5) no longer holds with \({C_{\mathrm{sol}}}\) bounded independently of k. If a subsequence of k’s exists along which \(\Vert u(k)\Vert _{L^2(\varOmega _R)}\) is uniformly bounded, we may obtain a contradiction by applying the argument above to this subsequence. Thus, we can assume, without loss of generality, that \(\Vert u(k)\Vert _{L^2(\varOmega _R)} \rightarrow \infty .\) Instead of defining defect measures of u(k), one can then define defect measures of \(u(k)/\Vert u(k)\Vert _{L^2(\varOmega _R)}\). If R is sufficiently large, then the bound in [19, Theorem 1.1] (i.e. the fact that the nontrapping cut-off resolvent estimate holds, even under trapping, if the supports of the cut-offs on both sides are sufficiently far away from the scatterer) implies that \(v(k):= u(k)/\Vert u(k)\Vert _{L^2(\varOmega _R)}\) satisfies (9.6). Any defect measure of v(k) is then immediately non-zero, since \(\mu (\chi ^2) \ge 1\) for any \(\chi \) with \(\chi \equiv 1\) on \(B_R\). Lemma 9.7 goes through as before after multiplying both sides of (9.19) by \(\Vert u(k)\Vert _{L^2(\varOmega _R)}^{-2}\). The main change needed to the rest of the proof is to take into account the fact that any defect measure of \(u^I(k)/\Vert u(k)\Vert _{L^2(\varOmega _R)}\) is zero, since \(\Vert u(k)\Vert _{L^2(\varOmega _R)}\rightarrow \infty \) along the sequence \(k_\ell \) associated with that measure. In this situation, however, the bound (9.23) becomes \(\mu (T^* A) \le \mu (T^*A \cap \varGamma _+)\); combining this with (9.24) we obtain \(\mu (T^* A) \le \mu (T^*\varOmega _\rho )\), which is the key bound (9.21) with \(C=1\), and hence the result of the theorem follows.

10 Proof of Theorems 4.1 and 4.2

Lemma 10.1

(Aubin–Nitsche analogue via elliptic projection) Assuming that the Galerkin solution \(u_h\) to the variational problem (2.11) exists, if

$$\begin{aligned} hk^2 \eta ({{\mathcal {H}}}_h) \le {{\mathcal {C}}}_1, \quad \text { where }\quad {{\mathcal {C}}}_1:= \frac{1}{2\sqrt{2} {C_{\mathrm{cont}}}_\star {C_{H^2}}_\star {C_{\mathrm{int}}}n_{\max }}, \end{aligned}$$
(10.1)

then

$$\begin{aligned} \left\| u-u_h\right\| _{L^2(\varOmega _R)} \le 2 {C_{\mathrm{cont}}}_\star \eta ({{\mathcal {H}}}_h) \left\| u-w_h\right\| _{H^1_k(\varOmega _R)} \quad \text { for all }w_h \in {{\mathcal {H}}}_h. \end{aligned}$$

Proof

Let \(\xi = {{\mathcal {S}}}^*(u-u_h)\); i.e. \(\xi \) is the solution of the variational problem

$$\begin{aligned} \text { find }\xi \in {{\mathcal {H}}}\text { such that }a(v,\xi )= (v,u-u_h)_{L^2(\varOmega _R)} \quad \text { for all }v\in {{\mathcal {H}}}. \end{aligned}$$

Then, by Galerkin orthogonality (7.6) and the definition of \(a_\star (\cdot ,\cdot )\) (7.1), for all \(v_h \in {{\mathcal {H}}}_h\),

$$\begin{aligned} \left\| u-u_h\right\| ^2_{L^2(\varOmega _R)}&= a( u-u_h, \xi ) = a( u-u_h,\xi -v_h ), \nonumber \\&= a_\star (u-u_h, \xi -v_h)_{L^2(\varOmega _R)} -k^2 (n(u-u_h), \xi -v_h)_{L^2(\varOmega _R)}. \end{aligned}$$
(10.2)

We choose \(v_h= {{\mathcal {P}}}_h\xi \), and then use (in the following order) (i) the Galerkin orthogonality (7.6), (ii) continuity of \(a_\star (\cdot ,\cdot )\), (iii) the bound (7.8), (iv) the upper bound in the norm equivalence (7.4) and the bound (7.7), and (v) the consequence (8.4) of the definition of \(\eta \) to obtain that, for all \(w_h\in {{\mathcal {H}}}_h\),

$$\begin{aligned}&\left\| u-u_h\right\| ^2_{L^2(\varOmega _R)} = a_\star (u-w_h, \xi -{{\mathcal {P}}}_h \xi )_{L^2(\varOmega _R)} -k^2 (n(u-u_h), \xi - {{\mathcal {P}}}_h\xi )_{L^2(\varOmega _R)}\nonumber \\&\le \left\| u-w_h\right\| _{\star } \left\| \xi -{{\mathcal {P}}}_h \xi \right\| _{\star } + k^2 n_{\max }\left\| u-u_h\right\| _{L^2(\varOmega _R)}\left\| \xi -{{\mathcal {P}}}_h \xi \right\| _{L^2(\varOmega _R)}\nonumber \\&\le \Big ( \left\| u-w_h\right\| _\star + hk^2 \sqrt{2}{C_{\mathrm{int}}}{C_{H^2}}_\star \sqrt{{C_{\mathrm{cont}}}_\star }n_{\max } \left\| u-u_h\right\| _{L^2(\varOmega _R)}\Big ) \left\| \xi -{{\mathcal {P}}}_h \xi \right\| _\star \nonumber \\&\le \Big ( \sqrt{{C_{\mathrm{cont}}}_\star } \left\| u-w_h\right\| _{H^1_k} + hk^2 \sqrt{2} {C_{\mathrm{int}}}{C_{H^2}}_\star \sqrt{{C_{\mathrm{cont}}}_\star }n_{\max }\left\| u-u_h\right\| _{L^2}\Big )\nonumber \\&\quad \times \sqrt{{C_{\mathrm{cont}}}_\star } \min _{v_h \in {{\mathcal {H}}}_h} \left\| \xi -v_h\right\| _{H^1_k(\varOmega _R)}\nonumber \\&\le \Big ( \sqrt{{C_{\mathrm{cont}}}_\star } \left\| u-w_h\right\| _{H^1_k} + hk^2 \sqrt{2} {C_{\mathrm{int}}}{C_{H^2}}_\star \sqrt{{C_{\mathrm{cont}}}_\star }n_{\max }\left\| u-u_h\right\| _{L^2}\Big )\nonumber \\&\quad \times \sqrt{{C_{\mathrm{cont}}}_\star }\eta ({{\mathcal {H}}}_h)\left\| u-u_h\right\| _{L^2(\varOmega _R)}; \end{aligned}$$
(10.3)

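dividing by \(\left\| u-u_h\right\| _{L^2(\varOmega _R)}\) (the case \(u=u_h\) being trivial) and using (10.1), which ensures that \(hk^2 \sqrt{2}{C_{\mathrm{int}}}{C_{H^2}}_\star {C_{\mathrm{cont}}}_\star n_{\max }\eta ({{\mathcal {H}}}_h)\le 1/2\), we can absorb the term involving \(\left\| u-u_h\right\| _{L^2(\varOmega _R)}\) on the right-hand side into the left-hand side, and the result follows. \(\square \)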

Remark 10.2

(Advantage of elliptic-projection over standard duality argument) Comparing (10.2) and (10.3) we see the advantage of the elliptic-projection argument over the standard duality argument: in (10.3), Galerkin orthogonality for \(a_\star (\cdot ,\cdot )\) has allowed us to obtain \(u-w_h\) (with \(w_h\) arbitrary) as opposed to \(u-u_h\) in the first argument of the sesquilinear form on the right-hand side, leading to the bound (5.3) instead of (5.2). The price for this is that we have an additional \(L^2\) inner product on the right-hand side of (10.3), and controlling this leads to the condition (10.1).

Recall that, by the Cauchy–Schwarz inequality and the inequality (3.3), \(a(\cdot ,\cdot )\) is continuous, i.e., for all \(u,v \in {{\mathcal {H}}}\),

$$\begin{aligned} \big |a(u,v)\big | \le {C_{\mathrm{cont}}}\left\| u\right\| _{H^1_k(\varOmega _R)} \left\| v\right\| _{H^1_k(\varOmega _R)}, \end{aligned}$$
(10.4)

where \({C_{\mathrm{cont}}}:= \max \big \{A_{\max }, n_{\max }\big \} + {C_{\mathrm{DtN}}}_1\).

Lemma 10.3

Assuming that the Galerkin solution \(u_h\) to the variational problem (2.11) exists, if (10.1) holds, then

$$\begin{aligned} \left\| u-u_h\right\| _{H^1_k(\varOmega _R)} \le \Big ({{\mathcal {C}}}_2 hk + {{\mathcal {C}}}_3 h k^2 \eta ({{\mathcal {H}}}_h)\Big ) \left\| u\right\| _{H^1_k(\varOmega _R)}, \end{aligned}$$
(10.5)

where

$$\begin{aligned} {{\mathcal {C}}}_2:= \frac{\sqrt{2} {C_{\mathrm{cont}}}{C_{\mathrm{int}}}C_{\mathrm{{osc}}}}{A_{\min }}\quad \text { and }\quad {{\mathcal {C}}}_3:= \frac{4{C_{\mathrm{cont}}}_\star {C_{\mathrm{int}}}C_{\mathrm{{osc}}}\sqrt{n_{\max }+ A_{\min }}}{\sqrt{A_{\min }}}. \end{aligned}$$

Proof

Since \(\mathrm{DtN}_k\) satisfies the inequality (3.4), and \({\mathsf {A}}\) and n satisfy the inequalities (2.1) and (2.2), \(a(\cdot ,\cdot )\) (2.8) satisfies the Gårding inequality

$$\begin{aligned} \mathfrak {R}a(v,v) \ge A_{\min } \left\| v\right\| ^2_{H^1_k(\varOmega _R)}- k^2 (n_{\max }+ A_{\min }) \left\| v\right\| ^2_{L^2(\varOmega _R)}. \end{aligned}$$
(10.6)
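For completeness, here is the one-line computation behind (10.6), written under two assumptions that are not restated in this section: that \(\left\| v\right\| ^2_{H^1_k(\varOmega _R)}=\left\| \nabla v\right\| ^2_{L^2(\varOmega _R)}+k^2\left\| v\right\| ^2_{L^2(\varOmega _R)}\), and that (2.1), (2.2) and (3.4) combine to give \(\mathfrak {R}a(v,v)\ge A_{\min }\left\| \nabla v\right\| ^2_{L^2(\varOmega _R)}-k^2n_{\max }\left\| v\right\| ^2_{L^2(\varOmega _R)}\).

$$\begin{aligned} \mathfrak {R}a(v,v)&\ge A_{\min }\left\| \nabla v\right\| ^2_{L^2(\varOmega _R)}-k^2n_{\max }\left\| v\right\| ^2_{L^2(\varOmega _R)}\\&= A_{\min }\Big (\left\| \nabla v\right\| ^2_{L^2(\varOmega _R)}+k^2\left\| v\right\| ^2_{L^2(\varOmega _R)}\Big )-k^2\big (n_{\max }+A_{\min }\big )\left\| v\right\| ^2_{L^2(\varOmega _R)}\\&= A_{\min }\left\| v\right\| ^2_{H^1_k(\varOmega _R)}-k^2\big (n_{\max }+A_{\min }\big )\left\| v\right\| ^2_{L^2(\varOmega _R)}. \end{aligned}$$

The term \(-k^2A_{\min }\left\| v\right\| ^2_{L^2(\varOmega _R)}\) added and subtracted in the middle line is what produces the combination \(n_{\max }+A_{\min }\) appearing in (10.6) and in the constant \({{\mathcal {C}}}_3\).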

Using Galerkin orthogonality (2.12) and continuity of \(a(\cdot ,\cdot )\) (10.4), we find that (5.1) holds for any \(v_h\in {{\mathcal {H}}}_h\). Using first the inequality (5.4) with \(\alpha =\Vert u-u_h\Vert _{H^1_k(\varOmega _R)}\), \(\beta ={C_{\mathrm{cont}}}\Vert u-v_h\Vert _{H^1_k(\varOmega _R)}\), \(\varepsilon = A_{\min }\), and then Lemma 10.1, we find that if (10.1) holds, then, for any \(v_h\in {{\mathcal {H}}}_h\),

$$\begin{aligned}&\frac{A_{\min }}{2}\left\| u-u_h\right\| ^2_{H^1_k(\varOmega _R)} \nonumber \\&\quad \le \frac{({C_{\mathrm{cont}}})^2}{2 A_{\min }} \left\| u-v_h\right\| ^2_{H^1_k(\varOmega _R)} + k^2 \big (n_{\max } + A_{\min }\big ) \left\| u-u_h\right\| ^2_{L^2(\varOmega _R)}\nonumber \\&\quad \le \left[ \frac{({C_{\mathrm{cont}}})^2}{2 A_{\min }} + 4k^2 \big (n_{\max } + A_{\min }\big ) ({C_{\mathrm{cont}}}_\star )^2 \big (\eta ({{\mathcal {H}}}_h)\big )^2 \right] \left\| u-v_h\right\| ^2_{H^1_k(\varOmega _R)}. \end{aligned}$$
(10.7)

By the consequence (3.11) of the definition of \({C_{\mathrm{int}}}\) and the bound (3.6)/(9.1),

$$\begin{aligned} \left\| u-I_h u\right\| _{H^1_k(\varOmega _R)} \le \sqrt{2}h {C_{\mathrm{int}}}|u|_{H^2(\varOmega _R)}\le \sqrt{2}hk {C_{\mathrm{int}}}C_{\mathrm{{osc}}}\left\| u\right\| _{H^1_k(\varOmega _R)}. \end{aligned}$$
(10.8)

Choosing \(v_h = I_h u\) in (10.7), using (10.8), taking the square root and using the inequality \(\sqrt{a^2+ b^2}\le a+b\) for all \(a,b>0\), we find the result (10.5). \(\square \)
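The passage from (10.7) to (10.5) is pure bookkeeping with constants. The sketch below redoes it numerically for random positive values of the constants, normalising \(\left\| u\right\| _{H^1_k(\varOmega _R)}=1\) and taking \(\left\| u-v_h\right\| _{H^1_k(\varOmega _R)}\) equal to the right-hand side of (10.8); the variable names (for instance `Ccs` for \({C_{\mathrm{cont}}}_\star \)) are ad hoc.

```python
import math
import random


def bound_from_107(Ccont, Ccs, Amin, nmax, Cint, Cosc, h, k, eta):
    """Solve (10.7) for ||u - u_h||_{H^1_k} with v_h = I_h u and ||u||_{H^1_k} = 1."""
    interp = math.sqrt(2) * h * k * Cint * Cosc            # right-hand side of (10.8)
    bracket = Ccont ** 2 / (2 * Amin) + 4 * k ** 2 * (nmax + Amin) * Ccs ** 2 * eta ** 2
    return math.sqrt(2 * bracket / Amin) * interp


def bound_105(Ccont, Ccs, Amin, nmax, Cint, Cosc, h, k, eta):
    """Right-hand side of (10.5) with ||u||_{H^1_k} = 1, using C_2 and C_3 from the text."""
    C2 = math.sqrt(2) * Ccont * Cint * Cosc / Amin
    C3 = 4 * Ccs * Cint * Cosc * math.sqrt(nmax + Amin) / math.sqrt(Amin)
    return C2 * h * k + C3 * h * k ** 2 * eta


rng = random.Random(0)
ok = True
for _ in range(1000):
    args = [rng.uniform(0.1, 10.0) for _ in range(9)]      # all constants positive
    lhs, rhs = bound_from_107(*args), bound_105(*args)
    ok = ok and lhs <= rhs * (1 + 1e-12)
print(ok)
```

The inequality checked here is strict whenever both terms are non-zero, reflecting the step \(\sqrt{a^2+b^2}\le a+b\) used in the proof.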

Proof

(Proof of Theorem 4.1) Under the assumption that the Galerkin solution \(u_h\) exists, the fact that the bound (4.2) holds under the condition (4.1) follows from combining Lemma 10.3 with the bound (8.5) on \(\eta \). To prove that \(u_h\) exists under the condition (4.1), recall that, since the variational problem (2.11) is equivalent to a linear system of equations in a finite-dimensional space, existence of a solution follows from uniqueness. Suppose that there exists a \({\widetilde{u}}_h\in {{\mathcal {H}}}_h\) such that \(a({\widetilde{u}}_h,v_h)=0\) for all \(v_h\in {{\mathcal {H}}}_h\); to prove uniqueness, we need to show that \({\widetilde{u}}_h=0\). Let \({\widetilde{u}}\) be such that \(a({\widetilde{u}},v)=0\) for all \(v\in {{\mathcal {H}}}\), so that \({\widetilde{u}}_h\) is the Galerkin approximation to \({\widetilde{u}}\). Repeating the argument in the first part of the proof, we see that if the condition (4.1) holds, then the bound (4.2) holds (with u replaced by \({\widetilde{u}}\) and \(u_h\) replaced by \({\widetilde{u}}_h\)). By Lemma 2.4, \({\widetilde{u}}=0\), so (4.2) implies that \({\widetilde{u}}_h=0\) and the proof is complete. \(\square \)

Proof

(Proof of Theorem 4.2) This is very similar to the proof of Theorem 4.1, except that we use the bound (8.6) on \(\eta ({{\mathcal {H}}}_h)\) instead of (8.5). \(\square \)