An optimal adaptive wavelet method for first order system least squares
Abstract
In this paper, it is shown that any well-posed 2nd order PDE can be reformulated as a well-posed first order least squares system. This system will be solved by an adaptive wavelet solver in optimal computational complexity. The applications that are considered are second order elliptic PDEs with general inhomogeneous boundary conditions, and the stationary Navier–Stokes equations.
Mathematics Subject Classification
41A25 42C40 47J25 65J15 65N12 65T60 65N30 76D05
1 Introduction
In this paper, a wavelet method is constructed for the optimal adaptive solution of stationary PDEs. We develop a general procedure to write any well-posed 2nd order PDE as a well-posed first order least squares system. The (natural) least squares formulations contain dual norms, which, however, pose no difficulties for a wavelet solver. The advantages of the first order least squares system formulation are twofold.
Firstly, regardless of the original problem, the least squares problem is symmetric and positive definite, which opens the possibility to develop an optimal adaptive solver. The obvious use of the least-squares functional as an a posteriori error estimator, however, is not known to yield a convergent method (see, however, [16] for an alternative for Poisson’s problem). As we will see, the use of the (approximate) residual in wavelet coordinates as an a posteriori error estimator does give rise to an optimal adaptive solver.
Secondly, as we will discuss in more detail in the following subsections, the optimal application of a wavelet solver to a first order system reformulation allows for a simpler, and quantitatively more efficient approximate residual evaluation than with the standard formulation of second order. Moreover, it applies equally well to semilinear equations, as e.g. the stationary Navier–Stokes equations, and it applies to wavelets that have only one vanishing moment.
The approach to apply the wavelet solver to a well-posed first order least squares system reformulation also applies to time-dependent PDEs in simultaneous space-time variational formulations, such as parabolic problems or instationary Navier–Stokes equations. With those problems, the wavelet bases consist of tensor products of temporal and spatial wavelets. Consequently, they require a different procedure for the approximate evaluation of the residual in wavelet coordinates, which will be the topic of a forthcoming work.
1.1 Adaptive wavelet schemes, and the approximate residual evaluation
Adaptive wavelet schemes can solve well-posed linear and nonlinear operator equations at the best possible rate allowed by the basis in linear complexity [7, 8, 9, 29, 31, 32, 34]. Schemes with those properties will be called optimal. The schemes can be applied to PDEs, which we study in this work, as well as to integral equations [18].
There are two kinds of adaptive wavelet schemes. One approach is to apply some convergent iterative method to the infinite system in wavelet coordinates, with decreasing tolerances for the inexact evaluations of residuals [8, 9]. These schemes rely on the application of coarsening to achieve optimal rates.
The other approach is to solve a sequence of Galerkin approximations from spans of nested sets of wavelets. The (approximate) residual in wavelet coordinates of the current approximation is used as an a posteriori error estimator to make an optimal selection of the wavelets to be added to form the next set [7]. With this scheme, that is studied in the current work, the application of coarsening can be avoided [23, 34], and it turns out to be quantitatively more efficient. This approach is restricted to PDOs whose Fréchet derivatives are symmetric and positive definite (compact perturbations can be added though, see [22]).
A key computational ingredient of both schemes is the approximate evaluation of the residual in wavelet coordinates. Let us discuss this for a linear operator equation \(A u =f\), with, for some separable Hilbert spaces \(\mathscr {H}\) and \(\mathscr {K}\), for convenience over \(\mathbb {R}\), \(f \in \mathscr {K}'\) and \(A \in \mathcal {L}\mathrm {is}(\mathscr {H},\mathscr {K}')\) (i.e., \(A \in \mathcal {L}(\mathscr {H},\mathscr {K}')\) and \(A^{-1} \in \mathcal {L}(\mathscr {K}',\mathscr {H})\)).
Equipping \(\mathscr {H}\) and \(\mathscr {K}\) with Riesz bases \(\Psi ^\mathscr {H}\), \(\Psi ^\mathscr {K}\), formally viewed as column vectors, \(A u =f\) can be equivalently written as a bi-infinite system of coupled scalar equations \(\mathbf{A} \mathbf{u}=\mathbf{f}\), where \(\mathbf{f}=f(\Psi ^\mathscr {K})\) is the infinite ‘load vector’, \(\mathbf{A}=(A \Psi ^\mathscr {H})(\Psi ^\mathscr {K})\) is the infinite ‘stiffness’ or system matrix, and \(u=\mathbf{u}^\top \Psi ^\mathscr {H}\).
Here we made use of the following notation:
Notation 1.1
For countable collections of functions \(\Sigma \) and \(\Upsilon \), we write \(g(\Sigma )=[g(\sigma )]_{\sigma \in \Sigma }\), \(M(\Sigma )(\Upsilon )=[M(\sigma )(\upsilon )]_{\upsilon \in \Upsilon , \sigma \in \Sigma }\), and \(\langle \Upsilon , \Sigma \rangle = [\langle \upsilon , \sigma \rangle ]_{\upsilon \in \Upsilon , \sigma \in \Sigma }\), assuming g, M, and \(\langle \,,\,\rangle \) are such that the expressions at the right-hand sides are well-defined.
The space of square summable vectors of reals indexed over a countable index set \(\vee \) will be denoted as \(\ell _2(\vee )\) or simply as \(\ell _2\). The norm on this space will be simply denoted as \(\Vert \cdot \Vert \).
As a consequence of \(A \in \mathcal {L}\mathrm {is}(\mathscr {H},\mathscr {K}')\), we have that \(\mathbf{A} \in \mathcal {L}\mathrm {is}(\ell _2,\ell _2)\). For the moment, let us additionally assume that \(\mathbf{A}\) is symmetric and positive definite, as when \(\mathscr {K}=\mathscr {H}\), \((Au)(v)= (Av)(u)\) and \((Au)(u) \gtrsim \Vert u\Vert _\mathscr {H}^2\) (\(u,v \in \mathscr {H}\)). If this is not the case, then the following can be applied to the normal equations \(\mathbf{A}^\top \mathbf{A} \mathbf{u}=\mathbf{A}^\top \mathbf{f}\).
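The passage to the normal equations can be illustrated on a small finite section. The following is only a sketch under stated assumptions: the matrix `A` below is a hypothetical, non-symmetric but invertible stand-in for a finite section of \(\mathbf{A}\), not a matrix arising from an actual wavelet discretization.

```python
import numpy as np

n = 6
# Hypothetical non-symmetric, invertible matrix (unit upper triangular).
A = np.eye(n) + 0.5 * np.triu(np.ones((n, n)), k=1)
u_exact = np.arange(1, n + 1, dtype=float)
f = A @ u_exact

# Normal equations: A^T A u = A^T f.
N = A.T @ A
g = A.T @ f

# For invertible A, the matrix A^T A is symmetric and positive definite.
is_symmetric = np.allclose(N, N.T)
min_eig = np.linalg.eigvalsh(N).min()

# Solving the normal equations recovers the solution of A u = f.
u = np.linalg.solve(N, g)
```

The price of symmetrization is that the condition number is squared, which is harmless here because \(\mathbf{A}\) is boundedly invertible on \(\ell _2\).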
The standard way to approximate the residual within tolerance \(\varepsilon \) is to approximate both \(\mathbf{f}\) and \(\mathbf{A} {\tilde{\mathbf{u}}}\) separately within tolerance \(\varepsilon /2\). Under reasonable assumptions, \(\mathbf{f}\) can be approximated within tolerance \(\varepsilon /2\) by a vector of length \({\mathcal {O}}(\varepsilon ^{-1/s})\).
For the approximation of \(\mathbf{A} {\tilde{\mathbf{u}}}\), it is used that, thanks to the properties of the wavelets as having vanishing moments, each column of \(\mathbf{A}\), although generally infinitely supported, can be well approximated by finitely supported vectors. In the approximate matrix-vector multiplication routine introduced in [7], known as the APPLY-routine, the accuracy with which a column is approximated is judiciously chosen depending on the size of the corresponding entry in the input vector \({\tilde{\mathbf{u}}}\). It has been shown to realise a tolerance \(\varepsilon /2\) at cost \({\mathcal {O}}(\varepsilon ^{-1/s}|{\tilde{\mathbf{u}}}|_{\mathcal {A}^{s}}^{1/s}+\#\,\mathrm{supp}\,{\tilde{\mathbf{u}}})\), for any s in some range \((0,s^*]\). For wavelets that have sufficiently many vanishing moments, this range was shown to include the range of \(s \in (0,s_{\max }]\) for which, in view of the order of the wavelets, \(\mathbf{u} \in \mathcal {A}^s\) can possibly be expected (cf. [28]). Using that for the approximations \({\tilde{\mathbf{u}}}\) to \(\mathbf{u}\) that are generated inside the adaptive wavelet scheme, it holds that \(|{\tilde{\mathbf{u}}}|_{\mathcal {A}^{s}} \lesssim |\mathbf{u}|_{\mathcal {A}^{s}}\), in those cases the cost condition is satisfied, and so the adaptive wavelet scheme is optimal.
The APPLY-routine, however, is quite difficult to implement. Note, in particular, that its outcome depends nonlinearly on the input vector \({\tilde{\mathbf{u}}}\). Furthermore, in experiments, the routine turns out to be quantitatively expensive. Finally, although it has been generalized to certain classes of semilinear PDEs, in those cases it has not been shown that \(s^* \ge s_{\max }\), meaning that for nonlinear problems the issue of optimality is actually open.
1.2 An alternative for the APPLY routine
By selecting ‘single scale’ collections \(\Phi ^\mathscr {H}\) and \(\Phi ^\mathscr {K}\) with \({{\mathrm{span}}}\, \Phi ^\mathscr {H}\supseteq {{\mathrm{span}}}\,\Psi ^\mathscr {H}_{\Lambda ^\mathscr {H}}\) and \({{\mathrm{span}}}\,\Phi ^\mathscr {K}\supseteq {{\mathrm{span}}}\,\Psi ^\mathscr {K}_{\Lambda ^\mathscr {K}}\), and \(\# \Phi ^\mathscr {H}\lesssim \# \Lambda ^\mathscr {H}\) and \(\# \Phi ^\mathscr {K}\lesssim \# \Lambda ^\mathscr {K}\), this approximate residual \(\mathbf{r}_{\Lambda ^\mathscr {K}}\) can be computed in \({\mathcal {O}}(\# \Lambda ^\mathscr {K})\) operations as follows: First express \(\tilde{u}\) in terms of \(\Phi ^\mathscr {H}\) by applying a multi-to-single scale transformation to \({\tilde{\mathbf{u}}}\), then apply to this representation the sparse stiffness matrix \(\langle (\Phi ^\mathscr {K})',(\Phi ^\mathscr {H})' \rangle _{L_2(0,1)}\), subtract \(\langle \Phi ^\mathscr {K},f \rangle _{L_2(0,1)}\), and finally apply the transpose of the multi-to-single scale transformation involving \(\Psi ^\mathscr {K}_{\Lambda ^\mathscr {K}}\) and \(\Phi ^\mathscr {K}\). This approximate residual evaluation thus satisfies the cost condition for optimality, it is relatively easy to implement, and it is observed to be quantitatively much more efficient.
It furthermore generalizes to semilinear operators, in any case for nonlinear terms that are multivariate polynomials in u and derivatives of u. Indeed, as an example, suppose that instead of \(-u''=f\) the equation reads as \(-u'' +u^3=f\). Then the residual is given by \(\langle \Psi ^\mathscr {K},f+\tilde{u}''-\tilde{u}^3 \rangle _{L_2(0,1)}\). Since \(f_\varepsilon +\tilde{u}''-\tilde{u}^3\) is a piecewise polynomial w.r.t. \(\mathcal {T}\), the same arguments show that \(\langle \Psi ^\mathscr {K},f+\tilde{u}''-\tilde{u}^3 \rangle _{L_2(0,1)}\big |_{\Lambda ^\mathscr {K}}\) is a valid approximate residual.
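That the strong residual of the semilinear model equation \(-u''+u^3=f\) stays polynomial can be checked on a single element. The sketch below assumes a hypothetical polynomial approximation `u_tilde` and a constant right-hand side on one cell; in the method itself these would be piecewise polynomials w.r.t. a tiling, treated cell by cell.

```python
from numpy.polynomial import Polynomial


def strong_residual(u_tilde: Polynomial, f: Polynomial) -> Polynomial:
    """Return f + u_tilde'' - u_tilde**3, the strong residual of -u'' + u^3 = f."""
    return f + u_tilde.deriv(2) - u_tilde**3


x = Polynomial([0.0, 1.0])
u_tilde = x * (1 - x)    # hypothetical approximation, vanishing at x = 0 and x = 1
f = Polynomial([1.0])    # hypothetical constant right-hand side

r = strong_residual(u_tilde, f)
```

For a quadratic `u_tilde` the residual has degree 6, so all terms of the residual live in one common polynomial space before being integrated against the test wavelets.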
The essential idea behind our approximate residual evaluation is that, after the replacement of f by \(f_\varepsilon \), the different terms that constitute the residual are expressed in a common dictionary, before the residual, as a whole, is integrated against \(\Psi ^\mathscr {K}\). In our simple one-dimensional example this was possible by selecting \(\Psi ^\mathscr {H}\subset H^2(0,1)\), so that the operator could be applied to the wavelets in strong, or more precisely, mild sense, meaning that the result of the application lands in \(L_2(0,1)\). It required piecewise smooth, globally \(C^1\)-wavelets. Although the same approach applies in more dimensions, there, except on product domains, the construction of \(C^1\)-wavelet bases is cumbersome. For that reason, our approach will be to write a PDE of second order as a system of PDEs of first order. It will turn out that there are several possibilities to do so.
1.3 A common first order system least squares formulation
Remark 1.2
Recall that, as always with least squares formulations, the same results are valid when lower order, possibly nonsymmetric terms are added to the second order PDE, as long as the standard variational formulation remains well-posed. Furthermore, as we will discuss, least squares formulations allow one to handle inhomogeneous boundary conditions. Finally, as we will see, the approach of reformulating a 2nd order PDE as a first order least squares problem, and then optimally solving the normal equations applies to any well-posed PDE, not necessarily being elliptic.
In [17] we applied the adaptive wavelet scheme to a least squares formulation of the current, common type. Disadvantages of this formulation are that (i) it requires that \(f \in L_2(0,1)\), instead of \(f \in H^{-1}(0,1)\) as allowed in the standard variational formulation. Related to that, and more importantly, for a semilinear equation \(-u''+N(u)=f\), (ii) it is needed that N maps \(H^1_0(0,1)\) into \(L_2(0,1)\), instead of into \(H^{-1}(0,1)\). Finally, with the generalization of this least squares formulation to more than one space dimension, (iii) the space \(H^1(0,1)\) for \(\theta \) reads as \(H({{\mathrm{div}}};\Omega )\). In [17], for two-dimensional connected polygonal domains \(\Omega \), we managed to construct a wavelet Riesz basis for \(H({{\mathrm{div}}};\Omega )\). This construction, however, relied on the fact that, in two dimensions, any divergence-free function is the curl of an \(H^1\)-function. To the best of our knowledge, wavelet Riesz bases for \(H({{\mathrm{div}}};\Omega )\) for non-product domains in three and more dimensions have not been constructed.
In the next subsection, we describe a prototype of a least-squares formulation with which these disadvantages (i)–(iii) are avoided.
1.4 A seemingly unpractical least squares formulation
To apply the adaptive wavelet scheme to these normal equations, we equip \(H^1_0(0,1)\) and \(L_2(0,1)\) with wavelet Riesz bases \(\Psi ^{H^1_0}\) and \(\Psi ^{L_2}\), respectively. When these bases have order \(p+1\) and p, the best possible convergence rate \(s_{\max }\) will be equal to p (p / n on an n-dimensional domain). Note that the order of the basis \(\Psi ^{\hat{H}^1_0}\) is irrelevant.
As we will see in Sect. 2, the advantage of the current construction of a first order system least squares problem is that it applies to any well-posed (semilinear) second order PDE. The two instances of the spaces \(H^1_0(0,1)\) represent the trial and test spaces in the standard variational formulation, and well-posedness of the latter implies well-posedness of the least squares formulation. The additional space \(L_2(0,1)\) reads in general as an \(L_2\)-type space. So, in particular, with this formulation, \(H({{\mathrm{div}}})\)-spaces do not enter. The price to be paid is that (1.5) is somewhat more complicated than (1.2), and that therefore its approximation is somewhat more costly to compute.
Remark 1.3
The more popular ‘dual’ mixed formulation of our model problem reads as finding \((u,\theta ) \in L_2(0,1) \times H^1(0,1)\) such that \(-\langle \theta ',v\rangle _{L_2(0,1)}+\langle \theta ,\eta \rangle _{L_2(0,1)}+\langle u,\eta '\rangle _{L_2(0,1)}=\langle f,v\rangle _{L_2(0,1)}\) (\((v,\eta ) \in L_2(0,1) \times H^1(0,1)\)). The resulting least squares formulation has the combined disadvantages of both other formulations that we considered. It requires \(f \in L_2(0,1)\), possibly nonlinear terms should map into \(L_2(0,1)\), in more than one dimension the space \(H^1(0,1)\) reads as an \(H({{\mathrm{div}}})\)-space, and one of the norms involved in the least squares minimalisation is a dual norm.
Remark 1.4
With the aim to avoid both a dual norm in the least squares minimalisation, and \(H({{\mathrm{div}}})\) or other vectorial Sobolev spaces as trial spaces, in our first investigations of this least squares approach in [31], we considered the ‘extended div-grad’ first order system least squares formulation studied in [14]. A sufficient and necessary [30], but restrictive condition for its well-posedness is \(H^2\)-regularity of the homogeneous boundary value problem.
1.5 Layout of the paper
In Sect. 2, a general procedure is given to reformulate any well-posed semilinear 2nd order PDE as a well-posed first order least squares problem. As we will see, this procedure gives an effortless derivation of well-posed first order least squares formulations of elliptic 2nd order PDEs, and that of the stationary Navier–Stokes equations. The arising dual norm can be replaced by the equivalent \(\ell _2\)-norm of a functional in wavelet coordinates.
In Sect. 3, we recall properties of the adaptive wavelet Galerkin method (awgm). Operator equations of the form \(F(z)=0\), where, for some Hilbert space \(\mathscr {H}\), \(F:\mathscr {H}\rightarrow \mathscr {H}'\) and DF(z) is symmetric and positive definite, are solved by the awgm at the best possible rate from a Riesz basis for \(\mathscr {H}\). Furthermore, under a condition on the cost of the approximate residual evaluations, the method has optimal computational complexity.
In the short Sect. 4, it is shown that the awgm applies to the normal equations that result from the first order least squares problems as derived in Sect. 2.
In Sect. 5, we apply the awgm to a first order least squares formulation of a semilinear 2nd order elliptic PDE with general inhomogeneous boundary conditions. Under a mild condition on the wavelet basis for the trial space, the efficient approximate residual evaluation that was outlined in Sect. 1.4 applies, and it satisfies the cost condition, so that the awgm is optimal. Wavelet bases that satisfy the assumptions are available on general polygonal domains. Some technical results needed for this section are given in “Appendix A”.
In Sect. 6 the findings from Sect. 5 are illustrated by numerical results.
In Sect. 7, we consider the so-called velocity–pressure–velocity gradient and the velocity–pressure–vorticity first order system formulations of the stationary Navier–Stokes equations. Results analogous to those demonstrated for the elliptic problem will be shown to be valid here as well.
2 Reformulation of a semilinear second order PDE as a first order system least squares problem
In an abstract framework, we give a procedure to write semilinear second order PDEs, that have well-posed standard variational formulations, as well-posed first order system least squares problems. A particular instance of this approach has been discussed in Sect. 1.4.
Remark 2.1
In applications G is the operator associated to a variational formulation of a PDO with trial space \(\mathscr {U}\) and test space \(\mathscr {V}\).
Remark 2.2
In applications, as those discussed in Sects. 5 and 7, \(G_1 G_2\) will be a factorization of the leading second order part of the PDO (possibly modulo terms that vanish at the solution, cf. Sect. 7.2) into a product of first order PDOs.
The following lemma shows that well-posedness of the original formulation implies that of the reformulation as a system.
Lemma 2.3
Remark 2.4
Since \(\mathrm{ran}\,DG(u)=\mathscr {V}'\) iff \(\mathrm{ran}\,D\vec {H}(u,\theta )=\mathscr {V}'\times \mathscr {T}\), in particular we have that \(DG(u) \in \mathcal {L}\mathrm {is}(\mathscr {U},\mathscr {V}')\) implies that \(D\vec {H}(u,\theta ) \in \mathcal {L}\mathrm {is}(\mathscr {U}\times \mathscr {T},\mathscr {V}' \times \mathscr {T})\).
Proof
(i) there exists a solution u of \(G(u)=0\);
(ii) G is two times continuously Fréchet differentiable in a neighborhood of u;
(iii) \(DG(u) \in \mathcal {L}(\mathscr {U},\mathscr {V}')\) is a homeomorphism with its range.
(a) \((u,\theta )=(u,G_2 u)\) solves \(\vec {H}(u,\theta )=\vec {0}\);
(b) \(\vec {H}\) is two times continuously Fréchet differentiable in a neighborhood of \((u,\theta )\);
(c) \(D\vec {H}(u,\theta ) \in \mathcal {L}(\mathscr {U}\times \mathscr {T},\mathscr {V}' \times \mathscr {T})\) is a homeomorphism with its range,
Remark 2.5
Actually, one might dispute whether these equations should be called well-posed when \(\mathrm{ran}\,DG(u) \subsetneq \mathscr {V}'\) and so \(\mathrm{ran}\,D\vec {H}(u,\theta ) \subsetneq \mathscr {V}' \times \mathscr {T}\). In any case, under conditions (i)–(iii), and so (a)–(c), the corresponding least-squares problems and resulting (nonlinear) normal equations are well-posed, as we will see next.
Lemma 2.6
Proof
This is a consequence of \(\vec {H}(\tilde{u},\tilde{\theta })=D\vec {H}(u,\theta ) (\tilde{u}-u,\tilde{\theta }-\theta )+o(\Vert \tilde{u}-u\Vert _\mathscr {U}+\Vert \tilde{\theta }-\theta \Vert _\mathscr {T})\). \(\square \)
(1) DQ is a mapping from a subset of a separable Hilbert space, viz. \(\mathscr {U}\times \mathscr {T}\), to its dual;
(2) there exists a solution of \(DQ(u,\theta )=0\) (viz. any solution of \(\vec {H}(u,\theta )=0\));
(3) DQ is continuously Fréchet differentiable in a neighborhood of \((u,\theta )\);
(4) \(0<D^2 Q(u,\theta )= D^2 Q(u,\theta )' \in \mathcal {L}\mathrm {is}(\mathscr {U}\times \mathscr {T},(\mathscr {U}\times \mathscr {T})')\).
In view of the above findings, in order to solve \(G(u)=0\), for a G that satisfies (i)–(iii), we are going to solve the (nonlinear) normal equations \(DQ(u,\theta )=0\). A major advantage of DQ over G is that its derivative is symmetric and coercive.
Remark 2.7
We refer to [4] for an alternative approach to solve least squares problems that involve dual norms.
To solve the obtained (nonlinear) normal equations \(DQ(u,\theta )=0\) we are going to apply the adaptive wavelet Galerkin method (awgm). Note that the definition of \(DQ(u,\theta )(v,\eta )\) still involves an infinite sum over \(\vee _\mathscr {V}\) that later, inside the solution process, is going to be replaced by a finite one.
3 The adaptive wavelet Galerkin method (awgm)
(I) \(F:\mathscr {H}\supset \mathrm {dom}(F) \rightarrow \mathscr {H}'\), with \(\mathscr {H}\) being a separable Hilbert space;
(II) \(F(z)=0\);
(III) F be continuously differentiable in a neighborhood of z;
(IV) \(0<DF(z)=DF(z)' \in \mathcal {L}\mathrm {is}(\mathscr {H},\mathscr {H}')\).
In order to be able to construct efficient algorithms, in particular when F is non-affine, it will be needed to consider only sets \(\Lambda \) from a certain subset of all finite subsets of \(\vee \). In our applications, this collection of so-called admissible \(\Lambda \) will consist of (Cartesian products of) finite trees. For the moment, it suffices when the collection of admissible sets is such that the union of any two admissible sets is again admissible.
The adaptive wavelet Galerkin method (awgm) defined below produces a sequence of increasingly more accurate Galerkin approximations \(\mathbf{z}_\Lambda \) to \(\mathbf{z}\). The, generally, infinite residual \(\mathbf{F}(\mathbf{z}_\Lambda )\) is used as an a posteriori error estimator. A motivation for the latter is given by the following result.
Lemma 3.1
For \(\Vert \mathbf{z}-{\tilde{\mathbf{z}}}\Vert \) sufficiently small, it holds that \(\Vert \mathbf{F}({\tilde{\mathbf{z}}})\Vert \eqsim \Vert \mathbf{z}-{\tilde{\mathbf{z}}}\Vert \).
Proof
With \(\tilde{z}={\tilde{\mathbf{z}}}^\top \Psi \), it holds that \(\Vert \mathbf{F}({\tilde{\mathbf{z}}})\Vert \eqsim \Vert F(\tilde{z})\Vert _{\mathscr {H}'}\). From (II)–(III), we have \(F(\tilde{z})=DF(z)(\tilde{z}-z)+o(\Vert \tilde{z}-z\Vert _\mathscr {H})\). The proof is completed by \(\Vert DF(z)(\tilde{z}-z)\Vert _{\mathscr {H}'} \eqsim \Vert \tilde{z}-z\Vert _\mathscr {H}\) by (IV). \(\square \)
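For an affine \(\mathbf{F}\) the equivalence of residual and error can be verified directly. The following sketch assumes a hypothetical symmetric positive definite matrix `A` (a tridiagonal stencil) in the role of \(D\mathbf{F}(\mathbf{z})\); the ratio of residual norm to error norm is then bounded by the extreme eigenvalues of `A`.

```python
import numpy as np

n = 8
# Hypothetical SPD matrix tridiag(-1, 3, -1), with eigenvalues in (1, 5).
A = 3.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
z = np.linspace(1.0, 2.0, n)   # exact solution
f = A @ z


def residual(z_tilde):
    """Residual F(z_tilde) = A z_tilde - f of the affine equation."""
    return A @ z_tilde - f


lam = np.linalg.eigvalsh(A)                   # eigenvalues in ascending order
z_tilde = z + 1e-3 * np.cos(np.arange(n))     # perturbed approximation
err = np.linalg.norm(z - z_tilde)
res = np.linalg.norm(residual(z_tilde))
ratio = res / err                             # lies in [lam[0], lam[-1]]
```

In the nonlinear case the same bounds hold only up to the \(o(\cdot )\) term, which is why the lemma requires the error to be sufficiently small.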
This a posteriori error estimator guides an appropriate enlargement of the current set \(\Lambda \) using a bulk chasing strategy, so that the sequence of approximations converges with the best possible rate to \(\mathbf{z}\). To arrive at an implementable method, that is even of optimal computational complexity, both the Galerkin solution and its residual are allowed to be computed inexactly within sufficiently small relative tolerances.
Algorithm 3.2
In step (R), by means of a loop in which an absolute tolerance is decreased, the true residual \(\mathbf{F}(\mathbf{z}_{\Lambda _i})\) is approximated within a relative tolerance \(\delta \). In step (B), bulk chasing is performed on the approximate residual. The idea is to find a smallest admissible \(\Lambda _{i+1} \supset \Lambda _i\) with \(\Vert \mathbf{r}_i|_{\Lambda _{i+1}}\Vert \ge \mu _0 \Vert \mathbf{r}_i\Vert \). In order to be able to find an implementation that is of linear complexity, the condition of having a truly smallest \(\Lambda _{i+1}\) has been relaxed. Finally, in step (G), a sufficiently accurate approximation of the Galerkin solution w.r.t. the new set \(\Lambda _{i+1}\) is determined.
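The bulk chasing criterion of step (B) can be sketched for the unconstrained case, in which every finite index set is admissible. The function name `bulk_chase` and the representation of the residual as a dense array are assumptions made for illustration; the exact sort used here costs \({\mathcal {O}}(N \log N)\) and therefore does not meet the linear complexity requirement.

```python
import numpy as np


def bulk_chase(r, current, mu0):
    """Smallest index set containing `current` whose restriction of r
    has norm at least mu0 * ||r|| (unconstrained case, exact sorting)."""
    r = np.asarray(r, dtype=float)
    target = (mu0 * np.linalg.norm(r)) ** 2
    selected = set(current)
    captured = sum(r[i] ** 2 for i in selected)
    # Visit the indices by decreasing modulus of the residual entries.
    for i in np.argsort(-np.abs(r)):
        if captured >= target:
            break
        if int(i) not in selected:
            selected.add(int(i))
            captured += r[i] ** 2
    return selected


new_set = bulk_chase([0.1, 3.0, 0.2, 4.0, 0.05], {0}, mu0=0.9)
```

Adding the largest remaining entries first yields a set of minimal cardinality among all supersets of `current` that satisfy the criterion.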
Convergence of the adaptive wavelet Galerkin method, with the best possible rate, is stated in the following theorem.
Theorem 3.3
Optimal computational complexity of the awgm, meaning that the work to obtain an approximation within a given tolerance \(\varepsilon >0\) can be bounded by some constant multiple of the bound on its support length from Theorem 3.3, is guaranteed under the following two conditions concerning the cost of the “bulk chasing” process, and that of the approximate residual evaluation, respectively. Indeed, apart from some obvious computations, these are the only two tasks that have to be performed in awgm.
Condition 3.4
The determination of \(\Lambda _{i+1}\) in Algorithm 3.2 is performed in \({\mathcal {O}}(\#\,\mathrm{supp}\,\mathbf{r}_i+\# \Lambda _i)\) operations.
In case of unconstrained approximation, i.e., any finite \(\Lambda \subset \vee \) is admissible, this condition is satisfied by collecting the largest entries in modulus of \(\mathbf{r}_i\), where, to avoid a suboptimal complexity, an exact sorting should be replaced by an approximate sorting based on binning. With tree approximation, the condition is satisfied by the application of the so-called Thresholding Second Algorithm from [2]. We refer to [31, §3.4] for a discussion.
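An approximate sorting based on binning might look as follows. This is a minimal sketch, not the routine of the cited references: the function name `binned_order`, the dyadic bin widths and the fixed number of bins are assumptions. Entries are grouped by modulus into dyadic bins, so that the resulting order is decreasing up to a factor 2 at a cost linear in the length of the input.

```python
import math


def binned_order(values, num_bins=40):
    """Indices of the nonzero entries, ordered by decreasing modulus up to a
    factor 2, obtained with dyadic bins instead of an exact sort."""
    vmax = max((abs(v) for v in values), default=0.0)
    bins = [[] for _ in range(num_bins)]
    for i, v in enumerate(values):
        a = abs(v)
        if a == 0.0:
            continue
        # Bin k holds moduli in (vmax * 2^-(k+1), vmax * 2^-k];
        # all smaller entries share the last bin.
        k = min(num_bins - 1, int(math.floor(math.log2(vmax / a))))
        bins[k].append(i)
    return [i for b in bins for i in b]


values = [0.3, -8.0, 1.0, 0.0, 5.0, 0.9]
order = binned_order(values)
```

Running a greedy selection over `order` instead of an exactly sorted list yields a set that is at most a constant factor larger than the smallest one, which is the kind of relaxation that linear complexity requires.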
To understand the second condition, that in the introduction was referred to as the cost Condition (1.1), note that inside the awgm it is never needed to approximate a residual more accurately than within a sufficiently small, but fixed relative tolerance.
Condition 3.5
Under both conditions, the awgm has optimal computational complexity:
Theorem 3.6
In the setting of Theorem 3.3, and under Conditions 3.4 and 3.5, not only \(\# \mathbf{z}_{\Lambda _i}\), but also the number of arithmetic operations required by awgm for the computation of \(\mathbf{z}_{\Lambda _i}\) is \({\mathcal {O}}(\Vert \mathbf{z}-\mathbf{z}_{\Lambda _i}\Vert ^{-1/s})\).
4 Application to normal equations
As discussed in Sect. 2, we will apply the awgm to the (nonlinear) normal equations \(DQ(u,\theta )=0\), with DQ from (2.6). That is, we apply the findings collected in the previous section for the general triple \((F,\mathscr {H},z)\) now reading as \((DQ,\mathscr {U}\times \mathscr {T},(u,\theta ))\).
Condition 3.5*
To verify this condition, we will use the additional property, i.e. on top of (1)–(4), that \(\Vert u-\tilde{u}\Vert _\mathscr {U}+\Vert \theta -\tilde{\theta }\Vert _\mathscr {T}\eqsim \Vert G_0(\tilde{u})-G_1\tilde{\theta }\Vert _{\mathscr {V}'}+\Vert \tilde{\theta }-G_2\tilde{u}\Vert _{\mathscr {T}}\), which is provided by Lemma 2.6.
5 Semilinear 2nd order elliptic PDE
We apply the solution method outlined in Sects. 2, 3 and 4 to the example of a semilinear 2nd order elliptic PDE with general (inhomogeneous) boundary conditions. The main task will be to verify Condition 3.5*.
5.1 Reformulation as a first order system least squares problem
Remarks 5.1
By writing \(g=u_0|_{\Gamma _D}\) for some \(u_0 \in \mathscr {U}\), one infers that for linear N, existence of a (unique) solution u, i.e. (i), follows from \(L \in \mathcal {L}\mathrm {is}(\mathscr {V}_1,\mathscr {V}_1')\). For \(g=0\), the conditions of N being monotone and locally Lipschitz are sufficient for having a (unique) solution u. Relaxed conditions on N suffice to have a (locally unique) solution. We refer to [5].
Each of the terms \(A \nabla {\tilde{u}} -\vec {\tilde{\theta }}\), \({\tilde{u}}-g\), \(N({\tilde{u}})-f-{{\mathrm{div}}}\vec {\tilde{\theta }}\), and \(\vec {\tilde{\theta }}\cdot \mathbf{n}-h\) corresponds, in strong form, to a term of the least squares functional, and therefore their norms can be bounded by a multiple of the norm of the residual, which is the basis of our approximate residual evaluation. In order to verify Condition 3.5*, we have to collect some assumptions on the wavelets, which will be done in the next subsection.
Remark 5.2
If \(\Gamma _D=\emptyset \), then obviously (5.4) should be read without the second term involving \(\Psi ^{\mathscr {V}_2}\). If \(\Gamma _D\ne \emptyset \) and homogeneous Dirichlet boundary conditions are prescribed on \(\Gamma _D\), i.e., \(g=0\), it is simpler to select \(\mathscr {U}=\mathscr {V}_1=\{u \in H^1(\Omega ):u|_{\Gamma _D}=0\}\), and to omit the integral over \(\Gamma _D\) in the definition of G, so that again (5.4) should be read without the second term involving \(\Psi ^{\mathscr {V}_2}\).
5.2 Wavelet assumptions and definitions
We formulate conditions on \(\Psi ^{\mathscr {V}_1}\), \(\Psi ^{\mathscr {V}_2}\), \(\Psi ^{\mathscr {U}}\), and \(\Psi ^{\mathscr {T}}\), in addition to being Riesz bases for \(\mathscr {V}_1\), \(\mathscr {V}_2\), \(\mathscr {U}\), and \(\mathscr {T}\), respectively.
(\(w_{1}\)) There exists a collection \(\mathcal {O}_\Omega :=\{\omega :\omega \in \mathcal {O}_\Omega \}\) of closed polytopes, such that, with \(|\omega | \in \mathbb {N}_0\) being the level of \(\omega \), \({{\mathrm{meas}}}(\omega \cap \omega ')=0\) when \(|\omega |=|\omega '|\) and \(\omega \ne \omega '\); for any \(\ell \in \mathbb {N}_0\), \(\bar{\Omega }=\cup _{|\omega |=\ell } \omega \); \(\mathrm{diam}\,\omega \eqsim 2^{-|\omega |}\); and \(\omega \) is the union of \(\omega '\) for some \(\omega '\) with \(|\omega '|=|\omega |+1\). We call \(\omega \) the parent of its children \(\omega '\). Moreover, we assume that the \(\omega \in \mathcal {O}_\Omega \) are uniformly shape regular, in the sense that they satisfy a uniform Lipschitz condition, and \({{\mathrm{meas}}}(F_\omega )\eqsim {{\mathrm{meas}}}(\omega )^{\frac{n-1}{n}}\) for \(F_\omega \) being any facet of \(\omega \).
(\(w_{2}\)) \(\mathrm{supp}\,\psi ^*_\lambda \) is contained in a connected union of a uniformly bounded number of \(\omega \)’s with \(|\omega |=|\lambda |\), and restricted to each of these \(\omega \)’s, \(\psi ^*_\lambda \) is a polynomial of degree m.
(\(w_{3}\)) Each \(\omega \) is intersected by the supports of a uniformly bounded number of \(\psi ^*_\lambda \)’s with \(|\lambda |=|\omega |\).
(\(w_{4}\)) \(\int _\Omega \psi ^*_\lambda \,dx =0\), possibly with the exception of those \(\lambda \) with \(\mathrm{dist}(\mathrm{supp}\,\psi ^*_\lambda ,\Gamma _D) \lesssim 2^{-|\lambda |}\), or with \(|\lambda |=0\).
Definition 5.3
A collection \({\mathcal {T}} \subset \mathcal {O}_\Omega \) such that \(\overline{\Omega }=\cup _{\omega \in {\mathcal {T}}} \omega \), and for \(\omega _1 \ne \omega _2 \in {\mathcal {T}}\), \({{\mathrm{meas}}}(\omega _1 \cap \omega _2)=0\) will be called a tiling. With \(\mathcal {P}_m(\mathcal {T})\), we denote the space of piecewise polynomials of degree m w.r.t. \(\mathcal {T}\). The smallest common refinement of tilings \({\mathcal {T}}_1\) and \({\mathcal {T}}_2\) is denoted as \({\mathcal {T}}_1 \oplus {\mathcal {T}}_2\).
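The smallest common refinement can be made concrete in one dimension. The sketch below assumes, purely for illustration, that \(\Omega =(0,1)\) and that the polytopes are dyadic intervals encoded as pairs `(level, k)` for \([k2^{-\mathrm{level}},(k+1)2^{-\mathrm{level}}]\); the function names are hypothetical. A cell of one tiling belongs to the common refinement exactly when it is contained in a cell of the other tiling.

```python
def ancestors_or_self(cell):
    """Yield the dyadic cell (level, k) and all of its ancestors."""
    level, k = cell
    while level >= 0:
        yield (level, k)
        level, k = level - 1, k // 2


def common_refinement(t1, t2):
    """Smallest common refinement of two dyadic tilings of (0, 1)."""
    keep1 = {c for c in t1 if any(a in t2 for a in ancestors_or_self(c))}
    keep2 = {c for c in t2 if any(a in t1 for a in ancestors_or_self(c))}
    return keep1 | keep2


T1 = {(1, 0), (2, 2), (2, 3)}   # [0,1/2], [1/2,3/4], [3/4,1]
T2 = {(2, 0), (2, 1), (1, 1)}   # [0,1/4], [1/4,1/2], [1/2,1]
T = common_refinement(T1, T2)
```

The walk over ancestors makes the cost proportional to the number of cells times the maximal level, so this simple variant is not of linear complexity.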
To be able to find, in linear complexity, a representation of a function, given as a linear combination of wavelets, as a piecewise polynomial w.r.t. a tiling (mandatory for an efficient evaluation of nonlinear terms), we will impose a tree constraint on the underlying set of wavelet indices. A similar approach was followed earlier in [6, 10, 21, 33, 35].
Definition 5.4
To each \(\lambda \in \vee _*\) with \(|\lambda |>0\), we associate one \(\mu \in \vee _*\) with \(|\mu |=|\lambda |-1\) and \({{\mathrm{meas}}}(\mathrm{supp}\,\psi ^*_\lambda \cap \mathrm{supp}\,\psi ^*_\mu )>0\). We call \(\mu \) the parent of \(\lambda \), and so \(\lambda \) a child of \(\mu \).
To each \(\lambda \in \vee _*\), we associate some neighbourhood \(\mathcal {S}(\psi ^*_\lambda )\) of \(\mathrm{supp}\,\psi ^*_\lambda \), with diameter \(\lesssim 2^{-|\lambda |}\), such that \(\mathcal {S}(\psi ^*_\lambda ) \subset \mathcal {S}(\psi ^*_\mu )\) when \(\lambda \) is a child of \(\mu \).
We call a finite \(\Lambda \subset \vee _*\) a tree, if it contains all \(\lambda \in \vee _*\) with \(|\lambda |=0\), as well as the parent of any \(\lambda \in \Lambda \) with \(|\lambda |>0\).
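The tree condition is straightforward to check and to enforce. In the following sketch the index set, the encoding of an index as `(level, k)` and the parent map `(level - 1, k // 2)` are assumptions modelled on a dyadic one-dimensional setting; they are not the parent relation of any particular wavelet basis.

```python
def parent(lam):
    """Hypothetical parent map for dyadic indices (level, k)."""
    level, k = lam
    return (level - 1, k // 2)


def is_tree(Lambda, roots):
    """True if Lambda contains all level-0 indices and is closed under parents."""
    Lambda = set(Lambda)
    if not set(roots) <= Lambda:
        return False
    return all(parent(lam) in Lambda for lam in Lambda if lam[0] > 0)


def tree_closure(Lambda, roots):
    """Smallest tree containing Lambda, obtained by adding missing ancestors."""
    closed = set(roots)
    for lam in Lambda:
        while lam[0] > 0 and lam not in closed:
            closed.add(lam)
            lam = parent(lam)
    return closed


roots = {(0, 0)}
Lambda = {(0, 0), (1, 1), (3, 5)}
completed = tree_closure(Lambda, roots)
```

Each index is added at most once, so the completion costs a number of operations proportional to the size of the resulting tree, and the union of two trees is again a tree.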
Note that we now have tree structures on the set \(\mathcal {O}_\Omega \) of polytopes, and as well as on the wavelet index sets \(\vee _*\). We trust that no confusion will arise when we speak about parents or children.
For some collections of wavelets, such as the Haar or, more generally, Alpert wavelets [1], it suffices to take \(\mathcal {S}(\psi ^*_\lambda ):=\mathrm{supp}\,\psi ^*_\lambda \). The next result shows that, thanks to (\(w_{1}\))–(\(w_{2}\)), a suitable neighbourhood \(\mathcal {S}(\psi ^*_\lambda )\) as meant in Definition 5.4 always exists.
Lemma 5.5
With \(C:=\sup _{\lambda \in \vee _*} 2^{|\lambda |} {{\mathrm{diam}}}\, \mathrm{supp}\,\psi ^*_\lambda \), a valid choice of \(\mathcal {S}(\psi ^*_\lambda )\) is given by \(\{x \in \Omega :{{\mathrm{dist}}}(x,\mathrm{supp}\,\psi _\lambda ^*) \le C 2^{-|\lambda |}\}\).
Proof
For \(\mu ,\lambda \in \vee _*\) with \(|\mu |=|\lambda |-1\) and \({{\mathrm{meas}}}(\mathrm{supp}\,\psi ^*_\lambda \cap \mathrm{supp}\,\psi ^*_\mu )>0\), and \(x \in \Omega \) with \({{\mathrm{dist}}}(x,\mathrm{supp}\,\psi _\lambda ^*) \le C 2^{-|\lambda |}\), it holds that \({{\mathrm{dist}}}(x,\mathrm{supp}\,\psi _\mu ^*) \le {{\mathrm{dist}}}(x,\mathrm{supp}\,\psi _\lambda ^*)+{{\mathrm{diam}}}(\mathrm{supp}\,\psi ^*_\lambda ) \le 2C2^{-|\lambda |} = C2^{-|\mu |}\). \(\square \)
A proof of the following proposition, as well as an algorithm to apply the multi- to single-scale transformation that is mentioned, is given in [31, §4.3].
Proposition 5.6
Given a tree \(\Lambda \subset \vee _*\), there exists a tiling \({\mathcal {T}}({\Lambda }) \subset \mathcal {O}_\Omega \) with \(\# {\mathcal {T}}({\Lambda }) \lesssim \# \Lambda \) such that \({{\mathrm{span}}}\{ \psi ^*_\lambda :\lambda \in \Lambda \}\subset \mathcal {P}_{m}({\mathcal {T}}({\Lambda }))\). Moreover, equipping \(\mathcal {P}_{m}(\mathcal {T}(\Lambda ))\) with a basis of functions, each of which is supported in \(\omega \) for one \(\omega \in \mathcal {T}(\Lambda )\), the representation of this embedding, known as the multi- to single-scale transform, can be applied in \({\mathcal {O}}(\#\Lambda )\) operations.
The benefit of the definition of \(\mathcal {S}(\psi _\lambda ^*)\) appears from the following lemma.
Lemma 5.7
Proof
The set \(\Lambda ^*\) contains all \(\lambda \in \vee _*\) with \(|\lambda |=0\), as well as, by definition of \(\mathcal {S}(\cdot )\), the parent of any \(\lambda \in \Lambda ^*\) with \(|\lambda |>0\). \(\square \)
In Proposition 5.6 we saw that for each tree \(\Lambda \) there exists a tiling \({\mathcal {T}}(\Lambda )\), with \(\# {\mathcal {T}}(\Lambda ) \lesssim \#\Lambda \), such that \({{\mathrm{span}}}\{ \psi ^*_\lambda :\lambda \in \Lambda \}\subset \mathcal {P}_{m}({\mathcal {T}}({\Lambda }))\). Conversely, in the following, given a tiling \({\mathcal {T}}\) and a constant \(k \in \mathbb {N}_0\), we construct a tree \(\Lambda ^*({{\mathcal {T}},k})\) with \(\# \Lambda ^*({{\mathcal {T}},k}) \lesssim \# {\mathcal {T}}\) (dependent on k) such that a kind of reversed statement holds: in Appendix A, statements of type \(\lim _{k \rightarrow \infty } \sup _{0 \ne g \in \mathcal {P}_{m}({\mathcal {T}})} \frac{\Vert \langle \Psi ^*,g\rangle _{L_2(\Omega )}|_{\vee _*{\setminus } \Lambda ^*({{\mathcal {T}},k})}\Vert }{\Vert g\Vert _{*'}} = 0\) will be shown, meaning that for any tolerance there exists a k such that for any \(g \in \mathcal {P}_{m}({\mathcal {T}})\) the relative error in dual norm in the best approximation from the span of the corresponding dual wavelets with indices in \(\Lambda ^*({{\mathcal {T}},k})\) is less than this tolerance.
Definition 5.8
Proposition 5.9
The set \(\Lambda ^*({{\mathcal {T}},k})\) is a tree, and \(\# \Lambda ^*({{\mathcal {T}},k}) \lesssim \# {\mathcal {T}}\) (dependent on \(k \in \mathbb {N}_0\)).
Proof
The first statement follows from Lemma 5.7. Since the number of children of any \(\omega \in \mathcal {O}_\Omega \) is uniformly bounded, it holds that \(\# t({\mathcal {T}})\lesssim \# {\mathcal {T}}\), and so \(\# \Lambda ^*({{\mathcal {T}},k}) \lesssim \# {\mathcal {T}}\) as a consequence of the wavelets being locally supported. \(\square \)
Example 5.10
Let \(\Psi =\{\psi _\lambda :\lambda \in \vee \}\) be the collection of Haar wavelets on \(\Omega =(0,1)\), i.e., the union of the function \(\psi _{0,0} \equiv 1\), and, for \(\ell \in \mathbb {N}\) and \(k=0,\ldots ,2^{\ell -1}-1\), the functions \(\psi _{\ell ,k}:=2^{\frac{\ell -1}{2}}\psi (2^{\ell -1}\cdot -k)\), where \(\psi \equiv 1\) on \([0,\frac{1}{2}]\) and \(\psi \equiv -1\) on \((\frac{1}{2},1]\). Writing \(\lambda =(\ell ,k)\), we set \(|\lambda |=\ell \). The parent of \(\lambda \) with \(|\lambda |>0\) is \(\mu \) with \(|\mu |=|\lambda |-1\) and \(\mathrm{supp}\,\psi _\lambda \subset \mathrm{supp}\,\psi _\mu \), and \(\mathcal {S}(\psi _\lambda ):=\mathrm{supp}\,\psi _\lambda \).
Let \(\mathcal {O}_{\Omega }\) be the union, for \(\ell \in \mathbb {N}_0\) and \(k=0,\ldots ,2^\ell -1\), of the intervals \(2^{-\ell }[k,k+1]\) to which we assign the level \(\ell \).
The (minimal) tiling \({\mathcal {T}}(\Lambda )\) as defined in Proposition 5.6 is given by \(\{[0,\frac{1}{8}],[\frac{1}{8},\frac{1}{4}], [\frac{1}{4},\frac{1}{2}], [\frac{1}{2},1]\}\).
Conversely, taking \({\mathcal {T}}:={\mathcal {T}}(\Lambda )\), the set \(\Lambda (\mathcal {T},1) \subset \vee \) as defined in Definition 5.8 is given by \(\{(0,0),(1,0),(2,0),(2,1),(3,0),(3,1), (4,0), (4,1)\}\) and is illustrated in the right picture in Fig. 1.
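For the Haar basis of this example, the multi- to single-scale transform of Proposition 5.6 reduces to one top-down pass over the tree. The following Python sketch is only an illustration under stated assumptions: the coefficient vector is stored as a dictionary keyed by \((\ell ,k)\), the function name is hypothetical, and the sign convention is \(\psi \equiv 1\) on the left half and \(\psi \equiv -1\) on the right half of the support. The index set \(\{(0,0),(1,0),(2,0),(3,0)\}\) in the usage lines is an assumed tree, chosen because it reproduces the tiling stated above.

```python
from fractions import Fraction


def multi_to_single_scale(coeffs):
    """Haar expansion on a tree -> piecewise constant on the minimal tiling.

    coeffs maps (level, k) to a wavelet coefficient; its key set is assumed
    to be a tree. Returns a list of (left, right, value), left to right.
    """
    pieces = []
    # stack entries: dyadic interval 2^-j [i, i+1] and the value accumulated on it
    stack = [(0, 0, coeffs.get((0, 0), 0.0))]
    while stack:
        j, i, val = stack.pop()
        lam = (j + 1, i)  # the Haar wavelet whose support is this interval
        if lam in coeffs:
            amp = coeffs[lam] * 2 ** (j / 2)  # scaling factor 2^((level-1)/2)
            stack.append((j + 1, 2 * i + 1, val - amp))  # right half, psi = -1
            stack.append((j + 1, 2 * i, val + amp))      # left half,  psi = +1
        else:
            pieces.append((Fraction(i, 2 ** j), Fraction(i + 1, 2 ** j), val))
    return pieces


# usage with the assumed tree
pieces = multi_to_single_scale({(0, 0): 1.0, (1, 0): 1.0, (2, 0): 1.0, (3, 0): 1.0})
tiling = [(a, b) for a, b, _ in pieces]
```

Every interval is visited once, and the number of visited intervals is one plus twice the number of wavelets with positive level, so the cost is proportional to \(\#\Lambda \).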
Definition 5.11
Remark 5.12
Let \(\Psi ^{\mathscr {U}}\) be a wavelet basis for \(\mathscr {U}\) of order \(d_\mathscr {U}>1\) (i.e., all wavelets \(\psi _\lambda ^{\mathscr {U}}\) up to level \(\ell \) span all piecewise polynomials in \(\mathscr {U}\) of degree \(d_\mathscr {U}-1\) w.r.t. \(\{\omega :\omega \in \mathcal {O}_\Omega ,\,|\omega |=\ell \}\)), and similarly, for \(1 \le q \le n\), let \(\Psi ^{\mathscr {T}_q}\) be a wavelet basis for \(\mathscr {T}_q\) of order \(d_\mathscr {T}>0\). Recalling the definition of an approximation class given in (3.1), a sufficiently smooth solution \((u,\vec {\theta })\) is in \(\mathcal {A}^s\) for \(s=s_{\max }:=\min (\frac{d_\mathscr {U}-1}{n},\frac{d_\mathscr {T}}{n})\), whereas on the other hand membership of \(\mathcal {A}^s\) for \(s>s_{\max }\) cannot be expected under whatever smoothness condition.
For \(s \le s_{\max }\), a sufficient and ‘nearly’ necessary condition for \((u,\vec {\theta })\in \mathcal {A}^s\) is that \((u,\vec {\theta }) \in B^{s n+1}_{p,\tau }(\Omega ) \times B^{s n}_{p,\tau }(\Omega )^n\) for \(\frac{1}{p}<s+\frac{1}{2}\) and arbitrary \(\tau >0\), see [11]. This mild smoothness condition in the ‘Besov scale’ has to be compared to the condition \((u,\vec {\theta }) \in H^{s n+1}(\Omega ) \times H^{s n}(\Omega )^n\) that is necessary to obtain a rate s with approximation from the spaces of type \({{\mathrm{span}}}\{\psi ^\mathscr {U}_\lambda :|\lambda |\le L\} \times \prod _{q=1}^n {{\mathrm{span}}}\{\psi ^{\mathscr {T}_q}_\lambda :|\lambda |\le L\}\).
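As a small arithmetic illustration of the rate \(s_{\max }\) and of the smoothness indices \(sn+1\) and \(sn\) that appear in both the Besov and the Sobolev condition, consider the following Python sketch. The function names and the sample orders in the usage line are hypothetical and not taken from the paper.

```python
from fractions import Fraction


def s_max(d_U: int, d_T: int, n: int) -> Fraction:
    """Best possible rate min((d_U - 1)/n, d_T/n) for orders d_U, d_T in n dimensions."""
    return min(Fraction(d_U - 1, n), Fraction(d_T, n))


def smoothness_indices(s, n):
    """Smoothness indices (for u, for theta) needed for rate s: (s*n + 1, s*n)."""
    return s * n + 1, s * n


# usage: hypothetical orders d_U = 3, d_T = 2 in n = 2 dimensions
rate = s_max(3, 2, 2)
indices = smoothness_indices(rate, 2)
```

With these sample values both arguments of the minimum coincide, which reflects the natural choice of taking the order for \(\mathscr {T}\) one lower than the order for \(\mathscr {U}\).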
Remark 5.13
5.3 An appropriate approximate residual evaluation
 (s1) Find a tiling \({\mathcal {T}}(\varepsilon ) \subset \mathcal {O}_\Omega \), such that
$$\begin{aligned}&\inf _{(g_\varepsilon ,f_\varepsilon ,\vec {h}_\varepsilon ) \in \mathcal {P}_m({\mathcal {T}}(\varepsilon )\cap \Gamma _D) \cap C(\Gamma _D) \times \mathcal {P}_m({\mathcal {T}}(\varepsilon )) \times \mathcal {P}_m({\mathcal {T}}(\varepsilon ))^n} \left( \Vert g-g_\varepsilon \Vert _{H^{\frac{1}{2}}(\Gamma _D)} \right. \\&\left. \quad +\,\Vert v_1 \mapsto \int _\Omega (f-f_\varepsilon ) v_1\,dx+\int _{\Gamma _N}(h-\vec {h}_\varepsilon \cdot \mathbf{n}) v_1 \,ds\Vert _{\mathscr {V}_1'}\right) \le \varepsilon . \end{aligned}$$
Set \({\mathcal {T}}(\Lambda ,\varepsilon ):={\mathcal {T}}(\Lambda ) \oplus {\mathcal {T}}(\varepsilon )\).
 (s2)
 (a) Approximate
$$\begin{aligned} \mathbf{r}^{(\frac{1}{2})}_1:=\langle \Psi ^{\mathscr {V}_1}, N(\tilde{u})-f- {{\mathrm{div}}}\,\vec {\tilde{\theta }}\rangle _{L_2(\Omega )}+\langle \Psi ^{\mathscr {V}_1}, \vec {\tilde{\theta }}\cdot \mathbf{n}-h\rangle _{L_2(\Gamma _N)} \end{aligned}$$
by
$$\begin{aligned} {\tilde{\mathbf{r}}}^{\left( \frac{1}{2}\right) }_1:=\mathbf{r}^{\left( \frac{1}{2}\right) }_1|_{\Lambda ^{\mathscr {V}_1}({\mathcal {T}}(\Lambda ,\varepsilon ),k)}. \end{aligned}$$
 (b) With \(\tilde{r}_1^{(\frac{1}{2})}:=({\tilde{\mathbf{r}}}^{(\frac{1}{2})}_1)^\top \Psi ^{\mathscr {V}_1}\), approximate
$$\begin{aligned} \mathbf{r}_1=\left[ \begin{array}{@{}l@{}} \mathbf{r}_{11} \\ \mathbf{r}_{12}\end{array} \right] :=\left[ \begin{array}{@{}l@{}} \langle DN(\tilde{u}) \Psi ^{\mathscr {U}}, \tilde{r}_1^{(\frac{1}{2})} \rangle _{L_2(\Omega )}\\ \langle \Psi ^{\mathscr {T}},\nabla \tilde{r}_1^{(\frac{1}{2})} \rangle _{L_2(\Omega )^n} \end{array} \right] \end{aligned}$$
by \({\tilde{\mathbf{r}}}_1= \left[ \begin{array}{@{}l@{}} {\tilde{\mathbf{r}}}_{11} \\ {\tilde{\mathbf{r}}}_{12}\end{array} \right] := \mathbf{r}_1|_{\Lambda ({\mathcal {T}}(\Lambda ^{\mathscr {V}_1}({\mathcal {T}}(\Lambda ,\varepsilon ),k)),k)}\).
 (s3) Approximate
$$\begin{aligned} \mathbf{r}_2:=\left\langle \left[ \begin{array}{@{}l@{}} -A \nabla \Psi ^{\mathscr {U}}\\ \Psi ^{\mathscr {T}} \end{array} \right] ,\vec {\tilde{\theta }}-A \nabla \tilde{u} \right\rangle _{L_2(\Omega )^n} \end{aligned}$$
by \({\tilde{\mathbf{r}}}_2:=\mathbf{r}_2|_{\Lambda ({\mathcal {T}}(\Lambda ),k)}\).
 (s4)
 (a) Approximate \(\mathbf{r}^{(\frac{1}{2})}_3:=\langle \Psi ^{\mathscr {V}_2}, \tilde{u}-g\rangle _{L_2(\Gamma _D)}\) by
$$\begin{aligned} {\tilde{\mathbf{r}}}^{\left( \frac{1}{2}\right) }_3:=\mathbf{r}^{\left( \frac{1}{2}\right) }_3|_{\Lambda ^{\mathscr {V}_2}({\mathcal {T}}(\Lambda ,\varepsilon )\cap \Gamma _D,k)}. \end{aligned}$$
 (b) With \(\tilde{r}_3^{\left( \frac{1}{2}\right) }:=({\tilde{\mathbf{r}}}^{(\frac{1}{2})}_3)^\top \Psi ^{\mathscr {V}_2}\), approximate \(\mathbf{r}_3:=\left[ \begin{array}{@{}l@{}} \langle \Psi ^\mathscr {U},\tilde{r}_3^{(\frac{1}{2})}\rangle _{L_2(\Gamma _D)}\\ 0_{\vee _\mathscr {T}} \end{array}\right] \) by
$$\begin{aligned} {\tilde{\mathbf{r}}}_3:=\mathbf{r}_3|_{\Lambda ({\mathcal {T}}_{\Gamma _D}(\Lambda ^{\mathscr {V}_2}({\mathcal {T}}(\Lambda ,\varepsilon )\cap \Gamma _D,k)),k)}. \end{aligned}$$
In the next theorem it is shown that this approximate residual evaluation satisfies the condition for optimality of the adaptive wavelet Galerkin method.
Theorem 5.14
Loosely speaking, this result can be rephrased by saying that if the solution of \(D\mathbf{Q}([\mathbf{u}^\top ,\varvec{\theta }^\top ]^\top )=0\) is in \(\mathcal {A}^s\), then so is the forcing function (f, g, h). This is not automatically true, cf. [12] for a discussion in the adaptive finite element context, but in the current setting it is a consequence of the fact that, thanks to assumption (5.3), the first order partial differential operators apply to the wavelet bases \(\Psi ^*\) for \(*\in \{\mathscr {U},\mathscr {T}_1,\ldots ,\mathscr {T}_n,\mathscr {V}_1,\mathscr {V}_2\}\) in a ‘mild’ sense (the result of the application of each of these operators lands in \(L_2\)-space).
Knowing that a suitable \(\mathcal {T}(\varepsilon )\) exists is different from knowing how to construct it. For convenience, consider the case \(g=h=0\), so that \(\mathscr {U}=\mathscr {V}_1=H^1_0(\Omega )\). Assuming that \(f \in L_2(\Omega )\), one has \(\inf _{f_\varepsilon \in \mathcal {P}_m(\mathcal {T})} \Vert f-f_\varepsilon \Vert ^2_{H^{-1}(\Omega )} \lesssim \mathrm {osc}(f,\mathcal {T})^2:=\sum _{\omega \in \mathcal {T}} {{\mathrm{diam}}}(\omega )^2 \inf _{f_\omega \in \mathcal {P}_m(\omega )} \Vert f-f_\omega \Vert ^2_{L_2(\omega )}\). Ignoring quadrature issues, for any partition \(\mathcal {T}\), \(\mathrm {osc}(f,\mathcal {T})\) is computable. A quasi-minimal partition \(\mathcal {T}(\varepsilon )\) such that \(\mathrm {osc}(f,\mathcal {T}(\varepsilon )) \lesssim \varepsilon \) can be computed using the Thresholding Second Algorithm from [2]. Now the assumption to be added to Theorem 5.14 is that for such a partition, \(\# \mathcal {T}(\varepsilon ) \lesssim \varepsilon ^{-1/s}\).
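In one space dimension the oscillation can be evaluated directly. The following Python sketch computes \(\mathrm {osc}(f,\mathcal {T})\) for interval tilings of \((0,1)\) and refines by plain greedy bisection of the tile with the largest local contribution. This greedy loop is a simplified stand-in and is not the Thresholding Second Algorithm of [2]; the function names, the composite Simpson rule, and the Legendre-based \(L_2\)-projection are all illustrative assumptions.

```python
import math


def legendre(j, t):
    """Legendre polynomial P_j(t) via the three-term recurrence."""
    p_prev, p = 1.0, t
    if j == 0:
        return p_prev
    for i in range(1, j):
        p_prev, p = p, ((2 * i + 1) * t * p - i * p_prev) / (i + 1)
    return p


def simpson(g, n=64):
    """Composite Simpson rule for the integral of g over [-1, 1] (n even)."""
    h = 2.0 / n
    total = g(-1.0) + g(1.0)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(-1.0 + i * h)
    return total * h / 3.0


def local_osc2(f, a, b, m):
    """diam^2 times the squared L2 best-approximation error from P_m on [a, b]."""
    fh = lambda t: f(0.5 * (a + b) + 0.5 * (b - a) * t)  # pull back to [-1, 1]
    # Legendre coefficients of the L2-projection onto polynomials of degree <= m
    c = [(2 * j + 1) / 2.0 * simpson(lambda t: fh(t) * legendre(j, t))
         for j in range(m + 1)]
    err = simpson(lambda t: (fh(t) - sum(cj * legendre(j, t)
                                         for j, cj in enumerate(c))) ** 2)
    return (b - a) ** 2 * 0.5 * (b - a) * err


def greedy_tiling(f, eps, m=0, max_tiles=10_000):
    """Bisect the tile with the largest indicator until osc(f, T) <= eps."""
    tiles = [(0.0, 1.0)]
    ind = [local_osc2(f, 0.0, 1.0, m)]
    while math.sqrt(sum(ind)) > eps and len(tiles) < max_tiles:
        i = max(range(len(tiles)), key=ind.__getitem__)
        a, b = tiles[i]
        mid = 0.5 * (a + b)
        tiles[i:i + 1] = [(a, mid), (mid, b)]
        ind[i:i + 1] = [local_osc2(f, a, mid, m), local_osc2(f, mid, b, m)]
    return tiles, math.sqrt(sum(ind))
```

The quadrature is exact only for low-degree polynomials, which matches the caveat "ignoring quadrature issues" made above; for general data the computed oscillation is itself an approximation.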
Note that it is nowhere needed to explicitly approximate the forcing functions by approximating their wavelet expansions.
Proof of Theorem 5.14
To bound the cost of the computations, we consider the computation of \({\tilde{\mathbf{r}}}^{(\frac{1}{2})}_1\). First, find a representation of \(N(\tilde{u})-{{\mathrm{div}}}\,\vec {\tilde{\theta }}\) as an element of \(\mathcal {P}_m({\mathcal {T}}(\Lambda ,\varepsilon ))\) by applying multi- to single-scale transforms. For each tile \(\omega \in {\mathcal {T}}(\Lambda ^{\mathscr {V}_1}({\mathcal {T}}(\Lambda ,\varepsilon ),k))\), and for \(\phi \) running over some basis of \(\mathcal {P}_m(\omega )\), compute \(\langle \phi , N(\tilde{u})-f-{{\mathrm{div}}}\,\vec {\tilde{\theta }}\rangle _{L_2(\omega )}\). From this, compute \([\langle \psi _\lambda ^{\mathscr {V}_1}, N(\tilde{u})-f-{{\mathrm{div}}}\,\vec {\tilde{\theta }}\rangle _{L_2(\Omega )}]_{\lambda \in \Lambda ^{\mathscr {V}_1}({\mathcal {T}}(\Lambda ,\varepsilon ),k)}\) by applying a transpose of a multi- to single-scale transform. Similar steps yield \([\langle \psi _\lambda ^{\mathscr {V}_1}, \vec {\tilde{\theta }}\cdot \mathbf{n}-h\rangle _{L_2(\Gamma _N)}]_{\lambda \in \Lambda ^{\mathscr {V}_1}({\mathcal {T}}(\Lambda ,\varepsilon ),k)}\). The total cost involved in computing \({\tilde{\mathbf{r}}}^{(\frac{1}{2})}_1\) is bounded by a multiple of \(\# {\mathcal {T}}(\Lambda ,\varepsilon ) \lesssim \#\Lambda +\varepsilon ^{-1/s}\) operations.
Since fully analogous considerations apply to bounding the cost of the computations of \({\tilde{\mathbf{r}}}_1\), \({\tilde{\mathbf{r}}}_2\), \({\tilde{\mathbf{r}}}^{(\frac{1}{2})}_3\), and \({\tilde{\mathbf{r}}}_3\), the proof is completed. \(\square \)
6 Numerical results
The continuous piecewise quadratic wavelets are biorthogonal ones with the ‘dual multiresolution analysis’ being the sequence of continuous piecewise linears, zero at \(\partial \Omega \), w.r.t. one additional level of refinement. Details of this basis construction will be reported elsewhere.
We performed the approximate evaluation of \(D\mathbf{Q}(\cdot )\) according to (s1)–(s4) and Theorem 5.14 in Sect. 5.3 with some simplifications because of the current homogeneous Dirichlet boundary conditions and sufficiently smooth right-hand side [(s1) and (s4) are void, and in (s2) the boundary term is void]. Taking the parameter \(k=1\), it turns out that the approximate evaluation is sufficiently accurate to be used in Step (R) of awgm (so we do not perform a loop), as well as in the simple fixed point iteration (3.2) with damping \(\omega =\frac{1}{4}\) that we use for Step (G). We took the parameter \(\gamma \) in Step (G) equal to 0.15 (more precisely, for stopping the iteration we checked whether the norm of the approximate residual, restricted to \(\Lambda _{i+1}\), is less than or equal to \(0.15 \Vert \mathbf{r}_i\Vert \)).
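The stopping rule for Step (G) can be sketched in Python as follows. Since (3.2) is not reproduced in this part of the text, the update \(\mathbf{u} \leftarrow \mathbf{u} - \omega \,\mathbf{r}(\mathbf{u})\) is an assumed form of a damped fixed point iteration, and the function name, the list-based vectors, and the toy \(2\times 2\) residual in the usage lines are hypothetical.

```python
import math


def norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def damped_fixed_point(residual, u0, r_norm_ref, omega=0.25, gamma=0.15,
                       max_iter=1000):
    """Iterate u <- u - omega * residual(u) until ||residual(u)|| <= gamma * r_norm_ref.

    residual stands in for the approximate residual restricted to the
    current index set; r_norm_ref plays the role of ||r_i||.
    """
    u = list(u0)
    for it in range(max_iter):
        r = residual(u)
        if norm(r) <= gamma * r_norm_ref:
            return u, it
        u = [ui - omega * ri for ui, ri in zip(u, r)]
    return u, max_iter


# usage: toy symmetric positive definite system A u = b, A = [[2, 1], [1, 3]], b = [1, 2]
res = lambda u: [2 * u[0] + u[1] - 1.0, u[0] + 3 * u[1] - 2.0]
u, iterations = damped_fixed_point(res, [0.0, 0.0], norm(res([0.0, 0.0])))
```

For this toy matrix the damping \(\omega =\frac{1}{4}\) is below \(2/\lambda _{\max }\), so the iteration contracts; in general the admissible damping depends on the conditioning of the system.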
For the bulk chasing, i.e. Step (B), we simply collected the indices of the largest entries of the approximate residual \(\mathbf{r}_i\) until the norm of the residual restricted to those indices is not less than \(0.4 \Vert \mathbf{r}_i\Vert \) (i.e. \(\mu _1=0.4\)), and then, after adding the indices from the current \(\varvec{\Lambda }_i\) to this set, we expand it to an admissible set (cf. Definition 5.11). Although this simple procedure is neither guaranteed to satisfy Condition 3.4 nor (B) for some constant \(0<\mu _0\le \mu _1\), we observed that it works satisfactorily in practice.
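The collection of largest residual entries described above can be sketched in Python as follows. The dictionary representation of the residual and the function name are hypothetical, and the subsequent steps (adding the current index set and expanding to an admissible set) are not shown.

```python
import math


def bulk_chase(r, mu=0.4):
    """Indices of the largest entries of r, collected until the norm of r
    restricted to them is at least mu times the norm of r.

    r maps a wavelet index to its residual entry.
    """
    total = math.sqrt(sum(v * v for v in r.values()))
    chosen, acc = [], 0.0
    # visit the entries in order of decreasing modulus
    for lam in sorted(r, key=lambda l: abs(r[l]), reverse=True):
        if math.sqrt(acc) >= mu * total:
            break
        chosen.append(lam)
        acc += r[lam] ** 2
    return chosen
```

The full sort costs \(\mathcal {O}(N \log N)\) for \(N\) entries, which is a simplification made here for brevity; it is not claimed to match the complexity of the implementation used for the experiments.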
7 Stationary Navier–Stokes equations
It is known that \(G:\mathscr {U}\rightarrow \mathscr {U}'\), and that a solution \((\vec {u},p)\) exists (see e.g. [24, Ch. IV]). Furthermore, G is two times differentiable with its second derivative being constant. We will assume that \(DG(\vec {u},p) \in \mathcal {L}(\mathscr {U},\mathscr {U}')\) is a homeomorphism with its range, so that each of the conditions (i)–(iii) from Sect. 2 is satisfied. The latter is known to hold true, with its range being equal to \(\mathscr {U}'\), when \(\vec {f}\) is sufficiently small, in which case the solution \((\vec {u},p)\) is also unique (e.g. see [24, Ch. IV]). For the linear case, so without the term \(\nu ^{-3/2} (\vec {u}\cdot \nabla )\vec {u}\), thanks to our rescaling, \(DG(\vec {u},p)=G \in \mathcal {L}\mathrm {is}(\mathscr {U},\mathscr {U}')\), and is independent of \(\nu \).
Using the framework outlined in Sect. 2, we write this second order elliptic PDE as a first order system least squares problem. There are different possibilities to do so.
7.1 Velocity–pressure–velocity gradient formulation
 (s1) Find a tiling \({\mathcal {T}}(\varepsilon ) \subset \mathcal {O}_\Omega \), such that
$$\begin{aligned} \inf _{\vec {f}_\varepsilon \in \mathcal {P}_m({\mathcal {T}}(\varepsilon ))^n,\,g_\varepsilon \in \mathcal {P}_m({\mathcal {T}}(\varepsilon ))/\mathbb {R}} \Vert \vec {f}-\vec {f}_\varepsilon \Vert _{H^{-1}(\Omega )^n} +\Vert g-g_\varepsilon \Vert _{L_2(\Omega )} \le \varepsilon . \end{aligned}$$
If \( [\mathbf{{u}}^\top ,\mathbf{{p}}^\top ,\varvec{{\theta }}^\top ]^\top \in \mathcal {A}^s\), then such a tiling exists with \(\# {\mathcal {T}}(\varepsilon ) \lesssim \varepsilon ^{-1/s}\). Set \({\mathcal {T}}(\Lambda ,\varepsilon ):={\mathcal {T}}(\Lambda ) \oplus {\mathcal {T}}(\varepsilon )\).
 (s2)
 (a) Approximate \(\mathbf{r}^{(\frac{1}{2})}_1:=\langle \Psi ^{(\hat{H}^1_0)^n},\nu ^{-3/2} (\vec {\tilde{u}}\cdot \nabla ) \vec {\tilde{u}}-\vec {f}-{{\mathrm{div}}}\,\underline{\tilde{\theta }}+\nabla \tilde{p} \rangle _{L_2(\Omega )^n}\) by \({\tilde{\mathbf{r}}}^{(\frac{1}{2})}_1:=\mathbf{r}^{(\frac{1}{2})}_1|_{\Lambda ^{(\hat{H}^1_0)^n}({\mathcal {T}}(\Lambda ,\varepsilon ),k)}\).
 (b) With \(\tilde{r}_1^{(\frac{1}{2})}:=({\tilde{\mathbf{r}}}^{(\frac{1}{2})}_1)^\top \Psi ^{(H^1_0)^n}\), approximate
$$\begin{aligned} \mathbf{r}_1=\left[ \begin{array}{@{}l@{}} \mathbf{r}_{11} \\ \mathbf{r}_{12}\\ \mathbf{r}_{13}\end{array} \right] := \left[ \begin{array}{@{}l@{}} \left\langle \frac{(\vec {\tilde{u}}\cdot \nabla ) \Psi ^{(H^1_0)^n}+(\Psi ^{(H^1_0)^n}\cdot \nabla ) \vec {\tilde{u}}}{\nu ^{3/2}},\tilde{r}_1^{(\frac{1}{2})}\right\rangle _{L_2(\Omega )^n}\\ -\left\langle \Psi ^{L_2/\mathbb {R}},{{\mathrm{div}}}\,\tilde{r}_1^{(\frac{1}{2})} \right\rangle _{L_2(\Omega )} \\ -\left\langle {{\mathrm{div}}}\, \Psi ^{L_2^{n^2}}, \tilde{r}_1^{(\frac{1}{2})} \right\rangle _{L_2(\Omega )^n} \end{array} \right] \end{aligned}$$
by \({\tilde{\mathbf{r}}}_1:=\mathbf{r}_1|_{\Lambda ({\mathcal {T}}(\Lambda ^{(H^1_0)^n}({\mathcal {T}}(\Lambda ,\varepsilon ),k)),k)}\).
 (s3) Approximate
$$\begin{aligned} \mathbf{r}_2=\left[ \begin{array}{@{}l@{}} \mathbf{r}_{21} \\ \mathbf{r}_{22}\\ \mathbf{r}_{23}\end{array} \right] :=\left[ \begin{array}{@{}l@{}} \langle {{\mathrm{div}}}\, \Psi ^{(H^1_0)^n}, {{\mathrm{div}}}\,\vec {\tilde{u}}-g \rangle _{L_2(\Omega )}\\ 0_{\vee _{L_2/\mathbb {R}}}\\ 0_{\vee _{L_2^{n^2}}} \end{array} \right] \text { by }{\tilde{\mathbf{r}}}_2:=\mathbf{r}_2|_{\Lambda ({\mathcal {T}}(\Lambda ,\varepsilon ),k)}. \end{aligned}$$
 (s4) Approximate
$$\begin{aligned} \mathbf{r}_3=\left[ \begin{array}{@{}l@{}} \mathbf{r}_{31} \\ \mathbf{r}_{32}\\ \mathbf{r}_{33}\end{array} \right] := \left[ \begin{array}{@{}l@{}} \langle \nabla \Psi ^{(H^1_0)^n}, \nabla \vec {\tilde{u}}-\underline{\tilde{\theta }} \rangle _{L_2(\Omega )^{n^2}}\\ 0_{\vee _{L_2/\mathbb {R}}}\\ \langle \Psi ^{L_2^{n^2}}, \underline{\tilde{\theta }} - \nabla \vec {\tilde{u}}\rangle _{L_2(\Omega )^{n^2}} \end{array}\right] \text { by } {\tilde{\mathbf{r}}}_3:=\mathbf{r}_3|_{\Lambda ({\mathcal {T}}(\Lambda ,\varepsilon ),k)}. \end{aligned}$$
Theorem 7.1
We conclude that the awgm is an optimal solver for the stationary Navier–Stokes equations in the form \(D\mathbf{Q}([\mathbf{{u}}^\top ,\mathbf{{p}}^\top ,\varvec{{\theta }}^\top ]^\top )=0\) resulting from the velocity–pressure–velocity gradient formulation. Obviously, we cannot claim or even expect that this holds true uniformly in a vanishing viscosity parameter \(\nu \). This is because in the limit already wellposedness of \(DG(\vec {u},p)\) cannot be expected.
7.2 Velocity–pressure–vorticity formulation
Since a vector field in the current space \(\mathscr {T}\) has \(2n-3\) components, instead of \(n^2\) as in the previous subsection, the first order system formulation studied in this subsection is more attractive. As we will see, its derivation requires that \(g=0\), i.e., \({{\mathrm{div}}}\,\vec {u}=0\).
The design of an approximate residual evaluation follows analogous steps as in the previous subsection. Equipping Cartesian products with bases of canonical form, and assuming that the scalarvalued bases \(\Psi ^*\) for \(*\in \{\hat{H}^1_0, H^1_0,L_2/\mathbb {R},L_2\}\) satisfy (\(w_{1}\))–(\(w_{4}\)), and that \([{\tilde{\mathbf{u}}}^\top ,{\tilde{\mathbf{p}}}^\top ,\varvec{\tilde{\omega }}^\top ]^\top \) is supported on an admissible set, four steps fully analogous to (s1)–(s4) in the previous subsection define an approximation scheme that satisfies Condition 3.5*. We conclude that the awgm is an optimal solver for the stationary Navier–Stokes equations in the form \(D\mathbf{Q}([\mathbf{{u}}^\top ,\mathbf{{p}}^\top ,\varvec{{\theta }}^\top ]^\top )=0\) resulting from the velocity–pressure–vorticity formulation. Again, also here we cannot claim or even expect that this holds true uniformly in a vanishing viscosity parameter \(\nu \).
8 Conclusion
We have seen that a wellposed (system of) 2nd order PDE(s) can always be formulated as a wellposed 1st order least squares system. The arising dual norm(s) can be replaced by the equivalent \(\ell _2\)-norm(s) of the wavelet coefficients of the functional. The resulting Euler–Lagrange equations, also known as the (nonlinear) normal equations, can be solved at the best possible rate by the adaptive wavelet Galerkin method. We developed a new approximate residual evaluation scheme that also for semilinear problems satisfies the condition for optimal computational complexity, and that is quantitatively much more efficient than the usual apply scheme. Moreover, regardless of the order of the wavelets, it applies already to wavelet bases that have only one vanishing moment. As applications we discussed optimal solvers for first order least squares reformulations of 2nd order elliptic PDEs with inhomogeneous boundary conditions, and that of the stationary Navier–Stokes equations. In a forthcoming work, we will apply this approach to time-evolution problems.
Footnotes
 1.
Indeed, for an admissible \({\bar{\mathbf{u}}}\) with \(\Vert \mathbf{u}-{\bar{\mathbf{u}}}\Vert \le \varepsilon \) and \(\#\,\mathrm{supp}\,{\bar{\mathbf{u}}} \lesssim \varepsilon ^{-1/s}\), take \(f_\varepsilon =-\bar{u}''\) and use \(\Vert f+\bar{u}''\Vert _{H^{-1}(\Omega )} \eqsim \Vert u-\bar{u}\Vert _{H^1(0,1)}\eqsim \Vert \mathbf{u}-{\bar{\mathbf{u}}}\Vert \).
 2.
For general nonaffine \(\vec {H}\), \(\vec {H}^\mathrm{h}\) should be read as the Fréchet derivative \(D\vec {H}(u,\theta )\).
 3.
Actually, in the current setting its analysis is more straightforward, because the residuals are measured in \(L_2(0,1)\) instead of in \(H^{-1}(0,1)\).
References
 1. Alpert, B.K.: A class of bases in \({L}^2\) for the sparse representation of integral operators. SIAM J. Math. Anal. 24, 246–262 (1993)
 2. Binev, P., DeVore, R.: Fast computation in adaptive tree approximation. Numer. Math. 97(2), 193–217 (2004)
 3. Bochev, P.B., Gunzburger, M.D.: Least-Squares Finite Element Methods, Volume 166 of Applied Mathematical Sciences. Springer, New York (2009)
 4. Bramble, J.H., Lazarov, R.D., Pasciak, J.E.: A least-squares approach based on a discrete minus one inner product for first order systems. Math. Comput. 66(219), 935–955 (1997)
 5. Badiale, M., Serra, E.: Semilinear Elliptic Equations for Beginners: Existence Results via the Variational Approach, Universitext. Springer, London (2011)
 6. Bittner, K., Urban, K.: Adaptive wavelet methods using semiorthogonal spline wavelets: sparse evaluation of nonlinear functions. Appl. Comput. Harmon. Anal. 24(1), 94–119 (2008)
 7. Cohen, A., Dahmen, W., DeVore, R.: Adaptive wavelet methods for elliptic operator equations–convergence rates. Math. Comput. 70, 27–75 (2001)
 8. Cohen, A., Dahmen, W., DeVore, R.: Adaptive wavelet methods II – beyond the elliptic case. Found. Comput. Math. 2(3), 203–245 (2002)
 9. Cohen, A., Dahmen, W., DeVore, R.: Adaptive wavelet schemes for nonlinear variational problems. SIAM J. Numer. Anal. 41, 1785–1823 (2003)
 10. Cohen, A., Dahmen, W., DeVore, R.: Sparse evaluation of compositions of functions using multiscale expansions. SIAM J. Math. Anal. 35(2), 279–303 (2003). (electronic)
 11. Cohen, A., Dahmen, W., Daubechies, I., DeVore, R.: Tree approximation and optimal encoding. Appl. Comput. Harmon. Anal. 11(2), 192–226 (2001)
 12. Cohen, A., DeVore, R., Nochetto, R.H.: Convergence rates of AFEM with \(H^{-1}\) data. Found. Comput. Math. 12(5), 671–718 (2012)
 13. Cai, Z., Manteuffel, T.A., McCormick, S.F.: First-order system least squares for velocity-vorticity-pressure form of the Stokes equations, with application to linear elasticity. Electron. Trans. Numer. Anal 3(Dec.), 150–159 (1995). (electronic)
 14. Cai, Z., Manteuffel, T.A., McCormick, S.F.: First-order system least squares for second-order partial differential equations. II. SIAM J. Numer. Anal. 34(2), 425–454 (1997)
 15. Cai, Z., Manteuffel, T.A., McCormick, S.F.: First-order system least squares for the Stokes equations, with application to linear elasticity. SIAM J. Numer. Anal. 34(5), 1727–1741 (1997)
 16. Carstensen, C., Park, E.J.: Convergence and optimality of adaptive least squares finite element methods. SIAM J. Numer. Anal. 53(1), 43–62 (2015)
 17. Chegini, N.G., Stevenson, R.P.: An adaptive wavelet method for semilinear first order system least squares. Comput. Math. Appl. (2015). https://doi.org/10.1515/cmam20150023
 18. Dahmen, W., Harbrecht, H., Schneider, R.: Adaptive methods for boundary integral equations: complexity and convergence estimates. Math. Comput. 76, 1243–1274 (2007)
 19. Dahmen, W., Kunoth, A., Schneider, R.: Wavelet least squares methods for boundary value problems. SIAM J. Numer. Anal. 39(6), 1985–2013 (2002)
 20. Dahmen, W., Stevenson, R.P.: Element-by-element construction of wavelets satisfying stability and moment conditions. SIAM J. Numer. Anal. 37(1), 319–352 (1999)
 21. Dahmen, W., Schneider, R., Xu, Y.: Nonlinear functionals of wavelet expansions–adaptive reconstruction and fast evaluation. Numer. Math. 86(1), 49–101 (2000)
 22. Gantumur, T.: An optimal adaptive wavelet method for nonsymmetric and indefinite elliptic problems. J. Comput. Appl. Math. 211(1), 90–102 (2008)
 23. Gantumur, T., Harbrecht, H., Stevenson, R.P.: An optimal adaptive wavelet method without coarsening of the iterands. Math. Comput. 76, 615–629 (2007)
 24. Girault, V., Raviart, P.A.: An analysis of a mixed finite element method for the Navier–Stokes equations. Numer. Math. 33, 235–271 (1979)
 25. Nguyen, H., Stevenson, R.P.: Finite element wavelets with improved quantitative properties. J. Comput. Appl. Math. 230(2), 706–727 (2009)
 26. Pousin, J., Rappaz, J.: Consistency, stability, a priori and a posteriori errors for Petrov–Galerkin methods applied to nonlinear problems. Numer. Math. 69(2), 213–231 (1994)
 27. Stevenson, R.P.: Stable three-point wavelet bases on general meshes. Numer. Math. 80, 131–158 (1998)
 28. Stevenson, R.P.: On the compressibility of operators in wavelet coordinates. SIAM J. Math. Anal. 35(5), 1110–1132 (2004)
 29. Stevenson, R.P.: Adaptive wavelet methods for solving operator equations: an overview. In: DeVore, R.A., Kunoth, A. (eds.) Multiscale, Nonlinear and Adaptive Approximation: Dedicated to Wolfgang Dahmen on the Occasion of his 60th Birthday, pp. 543–598. Springer, Berlin (2009)
 30. Stevenson, R.P.: First order system least squares with inhomogeneous boundary conditions. IMA. J. Numer. Anal. 34(2013), 863–878 (2013)
 31. Stevenson, R.P.: Adaptive wavelet methods for linear and nonlinear least-squares problems. Found. Comput. Math. 14(2), 237–283 (2014)
 32. Urban, K.: Wavelet Methods for Elliptic Partial Differential Equations. Oxford University Press, Oxford (2009)
 33. Vorloeper, J.: Adaptive Wavelet Methoden für Operator Gleichungen, Quantitative Analyse und Softwarekonzepte. Ph.D. thesis, RWTH Aachen. VDI Verlag GmbH, Düsseldorf, ISBN 9783183427208 (2009)
 34. Xu, Y., Zou, Q.: Adaptive wavelet methods for elliptic operator equations with nonlinear terms. Adv. Comput. Math. 19(1–3), 99–146 (2003). Challenges in computational mathematics (Pohang, 2001)
 35. Xu, Y., Zou, Q.: Tree wavelet approximations with applications. Sci. China Ser. A 48(5), 680–702 (2005)
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.