UvA-DARE (Digital Academic Repository) An optimal adaptive wavelet method for first order system least squares

In this paper, it is shown that any well-posed 2nd order PDE can be reformulated as a well-posed ﬁrst order least squares system. This system will be solved by an adaptive wavelet solver in optimal computational complexity. The applications that are considered are second order elliptic PDEs with general inhomogeneous boundary conditions, and the stationary Navier–Stokes equations


General rights
It is not permitted to download or to forward/distribute the text or part of it without the consent of the author(s) and/or copyright holder(s), other than for strictly personal, individual use, unless the work is under an open content license (like Creative Commons).
Disclaimer/Complaints regulations If you believe that digital publication of certain material infringes any of your rights or (privacy) interests, please let the Library know, stating your reasons.In case of a legitimate complaint, the Library will make the material inaccessible and/or remove it from the website.Please Ask the Library: https://uba.uva.nl/en/contact, or a letter to: Library of the University of Amsterdam, Secretariat, Singel 425, 1012 WP Amsterdam, The Netherlands.You will be contacted as soon as possible.

Introduction
In this paper, a wavelet method is constructed for the optimal adaptive solution of stationary PDEs.We develop a general procedure to write any well-posed 2nd order PDE as a well-posed first order least squares system.The (natural) least squares formulations contain dual norms, that, however, impose no difficulties for a wavelet solver.The advantages of the first order least squares system formulation are twofold.
Firstly, regardless of the original problem, the least squares problem is symmetric and positive definite, which opens the possibility to develop an optimal adaptive solver.The obvious use of the least-squares functional as an a posteriori error estimator, however, is not known to yield a convergent method (see, however, [16] for an alternative for Poisson's problem).As we will see, the use of the (approximate) residual in wavelet coordinates as an a posteriori error estimator does give rise to an optimal adaptive solver.
Secondly, as we will discuss in more detail in the following subsections, the optimal application of a wavelet solver to a first order system reformulation allows for a simpler, and quantitatively more efficient approximate residual evaluation than with the standard formulation of second order.Moreover, it applies equally well to semi-linear equations, as e.g. the stationary Navier-Stokes equations, and it applies to wavelets that have only one vanishing moment.
The approach to apply the wavelet solver to a well-posed first order least squares system reformulation also applies to time-dependent PDEs in simultaneous space-time variational formulations, as parabolic problems or instationary Navier-Stokes equations.With those problems, the wavelet bases consist of tensor products of temporal and spatial wavelets.Consequently, they require a different procedure for the approximate evaluation of the residual in wavelet coordinates, which will be the topic of a forthcoming work.

Adaptive wavelet schemes, and the approximate residual evaluation
Adaptive wavelet schemes can solve well-posed linear and nonlinear operator equations at the best possible rate allowed by the basis in linear complexity [7][8][9]29,31,32,34].Schemes with those properties will be called optimal.The schemes can be applied to PDEs, which we study in this work, as well as to integral equations [18].
There are two kinds of adaptive wavelet schemes.One approach is to apply some convergent iterative method to the infinite system in wavelet coordinates, with decreasing tolerances for the inexact evaluations of residuals [8,9].These schemes rely on the application of coarsening to achieve optimal rates.
The other approach is to solve a sequence of Galerkin approximations from spans of nested sets of wavelets.The (approximate) residual in wavelet coordinates of the current approximation is used as an a posteriori error estimator to make an optimal selection of the wavelets to be added to form the next set [7].With this scheme, that is studied in the current work, the application of coarsening can be avoided [23,34], and it turns out to be quantitatively more efficient.This approach is restricted to PDOs whose Fréchet derivatives are symmetric and positive definite (compact perturbations can be added though, see [22]).
A key computational ingredient of both schemes is the approximate evaluation of the residual in wavelet coordinates.Let us discuss this for a linear operator equation Au = f , with, for some separable Hilbert spaces H and K , for convenience over R, f ∈ K and A ∈ Lis(H , K ) (i.e., A ∈ L(H , K ) and A −1 ∈ L(K , H )).
Equipping H and K with Riesz bases H , K , formally viewed as column vectors, Au = f can be equivalently written as a bi-infinite system of coupled scalar equations Au = f, where f = f ( K ) is the infinite 'load vector', A = (A H )( K ) is the infinite 'stiffness' or system matrix, and u = u H .
The space of square summable vectors of reals indexed over a countable index set ∨ will be denoted as 2 (∨) or simply as 2 .The norm on this space will be simply denoted as • .
As a consequence of A ∈ Lis(H , K ), we have that A ∈ Lis( 2 , 2 ).For the moment, let us additionally assume that A is symmetric and positive definite, as when K = H , (Au)(v) = (Av)(u) and (Au)(u) u 2 H (u, v ∈ H ). If this is not the case, then the following can be applied to the normal equations A Au = A f.
For the finitely supported approximations ũ to u that are generated inside the adaptive wavelet scheme, the residual r = f − A ũ has to be approximated within a sufficiently small relative tolerance.The resulting scheme has been shown to converge with the best possible rate: Whenever u can be approximated at rate s, i.e. u ∈ A s , meaning that for any N ∈ N there exists a vector of length N that approximates u within tolerance O(N −s ), the approximations produced by the scheme converge with this rate s.Moreover, the scheme has linear computational complexity under the cost condition that the approximate residual evaluation within an (absolute) tolerance ε r requires not more than O(ε −1/s + # supp ũ) operations. (1.1) The lower bound on ε reflects the fact that inside the adaptive scheme, it is never needed to approximate a residual more accurately than within a sufficiently small, but fixed relative tolerance.The validity of (1.1) will require additional properties of H and K in addition to being Riesz bases.For that reason we consider wavelet bases.
The standard way to approximate the residual within tolerance ε is to approximate both f and A ũ separately within tolerance ε/2.Under reasonable assumptions, f can be approximated within tolerance ε/2 by a vector of length O(ε −1/s ).
For the approximation of A ũ, it is used that, thanks to the properties of the wavelets as having vanishing moments, each column of A, although generally infinitely supported, can be well approximated by finitely supported vectors.In the approximate matrix-vector multiplication routine introduced in [7], known as the APPLY-routine, the accuracy with which a column is approximated is judiciously chosen depending on the size of the corresponding entry in the input vector ũ.It has been shown to realise a tolerance ε/2 at cost O(ε −1/s | ũ| 1/s A s + # supp ũ), for any s in some range (0, s * ]. For wavelets that have sufficiently many vanishing moments, this range was shown to include the range of s ∈ (0, s max ] for which, in view of the order of the wavelets, u ∈ A s can possibly be expected (cf.[28]).Using that for the approximations ũ to u that are generated inside the adaptive wavelet scheme, it holds that | ũ| A s |u| A s , in those cases the cost condition is satisfied, and so the adaptive wavelet scheme is optimal.
The APPLY-routine, however, is quite difficult to implement.Note, in particular, that its outcome depends nonlinearly on the input vector ũ.Furthermore, in experiments, the routine turns out to be quantitatively expensive.Finally, although it has been generalized to certain classes of semi-linear PDEs, in those cases it has not been shown that s * ≥ s max , meaning that for nonlinear problems the issue of optimality is actually open.

An alternative for the APPLY routine
A main goal of this paper is to develop a quantitatively efficient alternative for the APPLY-routine, that, moreover, gives rise to provable optimal adaptive wavelet schemes for classes of semi-linear PDEs, and applies to wavelets with only one vanishing moment.As an introduction, we consider the model problem of Poisson's equation in one space dimension − u = f on (0, 1), u = 0 at{0, 1}, that, in standard variational form, reads as finding u ∈ H := H1 0 (0, 1) such that where, by identifying L 2 (0, 1) with L 2 (0, 1) and using that H 1 0 (0, 1) → L 2 (0, 1) is dense, , L 2 (0,1) is also used to denote the duality pairing on H −1 (0, 1) × H 1 0 (0, 1).We consider piecewise polynomial, locally supported wavelet Riesz bases H and K for H 1 0 (0, 1).Let us exclusively consider admissible approximations ũ to u in the sense that their finite supports form trees, meaning that if λ ∈ supp ũ, then then there exists a μ ∈ supp ũ, whose level is one less than the level of λ, and meas(supp It is known that the approximation classes A s become only 'slightly' smaller by this restriction to tree approximation compared to unconstrained approximation (cf.[11]).What is more, the restriction to tree approximation seems mandatory anyway to construct an optimal algorithm for nonlinear operators.The benefit of tree approximation is that ũ := ũ H has an alternative, 'single-scale' representation as a piecewise polynomial w.r.t. a partition T 1 of (0, 1) with #T 1 #supp ũ.
For the moment, let us make the additional assumption that H is selected inside H 2 (0, 1).Then, for an admissible ũ, with its support denoted as H , integration-byparts shows that where ũ is piecewise polynomial w.r.t.T 1 .If u ∈ A s , then for any ε > 0 there exists a piecewise polynomial f ε w.r.t. a partition T 2 of (0,1) into 1 The term f − f ε is commonly referred to as data oscillation.
The function f ε + ũ is piecewise polynomial w.r.t. the smallest common refinement T of T 1 and T 2 .Thanks to this piecewise smoothness of f ε + ũ w.r.t.T , and the property of | is decreasing as function of the minimal difference of the level of ψ K λ and that of any subinterval in T that has non-empty intersection with supp ψ K λ .Here with the level of ψ K λ or that of an interval ω, we mean an ∈ N 0 such that 2 − diam(supp ψ K λ ) or 2 − diam ω, respectively.In particular, given a constant ς > 0, there exists a constant k, such that by dropping all λ for which the aforementioned minimal level difference exceeds k, the remaining indices form a tree K with # K #T ε −1/s + # H (dependent on k), and (see Proposition A.1) and so, using r Note that for ς being sufficiently small, and so k sufficiently large, by taking ε suitably the approximate residual will meet any accuracy that is required in the cost Condition (1.1).By selecting 'single scale' collections H and K with span H ⊇ span H | H and span K ⊇ span K | K , and # H # H and # K # K , this approximate residual r| K can be computed in O( K ) operations as follows: First express ũ in terms of H by applying a multi-to-single scale transformation to ũ, then apply to this representation the sparse stiffness matrix ( K ) , ( H ) L 2 (0,1) , subtract K , f L 2 (0,1) , and finally apply the transpose of the multi-to-single scale transformation involving K | K and K .This approximate residual evaluation thus satisfies the cost condition for optimality, it is relatively easy to implement, and it is observed to be quantitatively much more efficient.
It furthermore generalizes to semi-linear operators, in any case for nonlinear terms that are multivariate polynomials in u and derivatives of u.Indeed, as an example, suppose that instead of −u = f the equation reads as −u + u 3 = f .Then the residual is given by 1) .Since f ε + ũ − ũ3 is a piecewise polynomial w.r.t.T , the same arguments shows that K , f + ũ − ũ3 The essential idea behind our approximate residual evaluation is that, after the replacement of f by f ε , the different terms that constitute the residual are expressed in a common dictionary, before the residual, as a whole, is integrated against K .In our simple one-dimensional example this was possible by selecting H ⊂ H2 (0, 1), so that the operator could be applied to the wavelets in strong, or more precisely, mild sense, meaning that the result of the application lands in L 2 (0, 1).It required piecewise smooth, globally C 1 -wavelets.Although the same approach applies in more dimensions, there, except on product domains, the construction of C 1 -wavelet bases is cumbersome.For that reason, our approach will be to write a PDE of second order as a system of PDEs of first order.It will turn out that there are several possibilities to do so.

A common first order system least squares formulation
To introduce ideas, let us again consider the model problem of Poisson's equation in one dimension.By introducing the additional unknown θ = u , for given f ∈ L 2 (0, 1) this PDE can be written as the first order system of finding (u, θ) ∈ H 1 0 (0, 1)×H 1 (0, 1) such that

The corresponding homogeneous operator
To arrive at a symmetric and positive definite system, we consider the least squares problem of solving argmin 1) .
Its solution solves the Euler-Lagrange equations which in this setting are known as the normal equations.
To these equations we apply the adaptive wavelet scheme, so with 'H '='K '= 1) .From H h being a homeomorphism with its range, i.e., being a consequence of H h being even boundedly invertible between the full spaces, it follows that the bilinear form is bounded, symmetric, and coercive.After equipping H 1 0 (0, 1) and H 1 (0, 1) with wavelet Riesz bases H 1 0 and H 1 , for admissible ũ and θ , with ũ := ũ H 1 0 and θ := θ H 1 the residual reads as 1) . (1. 2) The construction of an approximate residual follows the same lines as described before for the standard variational formulation. 3The functions ũ , θ, θ are piecewise polynomials w.r.t. a partition T 1 of (0, 1) into O(# supp ũ + # supp θ ) subintervals.If (u, θ ) ∈ A s , then there exists a piecewise polynomial f ε w.r.t. a partition T 2 of (0, 1) into O(ε −1/s ) subintervals such that f − f ε L 2 (0,1) ≤ ε.Thanks to the piecewise smoothness of ũ − θ and θ + f ε , there exist trees H 1 0 and H 1 , with # Since the approximate residual can be evaluated in O(# H 1 0 ∪ H 1 ) operations, we conclude that it satisfies the cost Condition (1.1) for optimality of the adaptive wavelet scheme.
Remark 1.2 Recall that, as always with least squares formulations, the same results are valid when lower order, possibly non-symmetric terms are added to the second order PDE, as long as the standard variational formulation remains well-posed.Furthermore, as we will discuss, least squares formulations allows to handle inhomogeneous boundary conditions.Finally, as we will see, the approach of reformulating a 2nd order PDE as a first order least squares problem, and then optimally solving the normal equations applies to any well-posed PDE, not necessarily being elliptic.
In [17] we applied the adaptive wavelet scheme to a least squares formulation of the current, common type.Disadvantages of this formulation are that (i) it requires that f ∈ L 2 (0, 1), instead of f ∈ H −1 (0, 1) as allowed in the standard variational formulation.Related to that, and more importantly, for a semi-linear equation −u + N (u) = f , (ii) it is needed that N maps H 1 0 (0, 1) into L 2 (0, 1), instead of into H −1 (0, 1).Finally, with the generalization of this least squares formulation to more than one space dimensions, (iii) the space H 1 (0, 1) for θ reads as H (div; ).In [17], for two-dimensional connected polygonal domains , we managed to construct a wavelet Riesz basis for H (div; ).This construction, however, relied on the fact that, in two dimensions, any divergence-free function is the curl of an H 1 -function.To the best of our knowledge, wavelet Riesz bases for H (div; ) for non-product domains in three and more dimensions have not been constructed.
In the next subsection, we describe a prototype of a least-squares formulation with which these disadvantages (i)-(iii) are avoided.
(1.5)This last step is essential because it allows us, after the replacement of f by a piecewise polynomial f ε , to express θ + f ε in a common dictionary.The additional condition is satisfied by piecewise polynomial, globally continuous wavelets, which are available on general domains in multiple dimensions.
In view of the previous discussions, to describe the approximate residual evaluation, it suffices to consider the term L 2 , z ( The by now familiar approach is applied twice: the function θ is piecewise polynomial w.r.t. a partition T 1 into O(# supp θ) subintervals.If (u, θ ) ∈ A s , then there exists a piecewise polynomial f ε w.r.t. a partition T 2 of (0,1) into O(ε −1/s ) subintervals such that f − f ε H −1 (0,1) ≤ ε.Consequently, there exists a tree Combining this with the approximations for the other two terms that constitute the residual, we infer that there exist trees H 1 0 and L 2 with # where r2 is constructed from r 2 by replacing z by z| Ĥ 1 0 .This approximate residual evaluation satisfies the cost Condition (1.1) for optimality.
As we will see in Sect.2, the advantage of the current construction of a first order system least squares problem is that it applies to any well-posed (semi-linear) second order PDE.The two instances of the spaces H 1 0 (0, 1) represent the trial and test spaces in the standard variational formulation, and well-posedness of the latter implies wellposedness of the least squares formulation.The additional space L 2 (0, 1) reads in general as an L 2 -type space.So, in particular, with this formulation, H (div)-spaces do not enter.The price to be paid is that (1.5) is somewhat more complicated than (1.2), and that therefore its approximation is somewhat more costly to compute.
Remark 1.3 The more popular 'dual' mixed formulation of our model problem reads as finding (u, θ) ).The resulting least squares formulation has the combined disadvantages of both other formulations that 123 we considered.It requires f ∈ L 2 (0, 1), possibly nonlinear terms should map into L 2 (0, 1), in more than one dimension the space H 1 (0, 1) reads as an H (div)-space, and one of the norms involved in the least squares minimalisation is a dual norm.
Remark 1.4 With the aim to avoid both a dual norm in the least squares minimalisation, and H (div) or other vectorial Sobolev spaces as trial spaces, in our first investigations of this least squares approach in [31], we considered the 'extended div-grad' first order system least squares formulation studied in [14].A sufficient and necessary [30], but restrictive condition for its well-posedness is H 2 -regularity of the homogeneous boundary value problem.

Layout of the paper
In Sect.2, a general procedure is given to reformulate any well-posed semi-linear 2nd order PDE as a well-posed first order least squares problem.As we will see, this procedure gives an effortless derivation of well-posed first order least squares formulations of elliptic 2nd order PDEs, and that of the stationary Navier-Stokes equations.The arising dual norm can be replaced by the equivalent 2 -norm of a functional in wavelet coordinates.
In Sect.3, we recall properties of the adaptive wavelet Galerkin method (awgm).Operator equations of the form F(z) = 0, where, for some Hilbert space H , F : H → H and D F(z) is symmetric and positive definite, are solved by the awgm at the best possible rate from a Riesz basis for H . Furthermore, under a condition on the cost of the approximate residual evaluations, the method has optimal computational complexity.
In the short Sect.4, it is shown that the awgm applies to the normal equations that result from the first order least squares problems as derived in Sect. 2.
In Sect.5, we apply the awgm to a first order least squares formulation of a semilinear 2nd order elliptic PDE with general inhomogeneous boundary conditions.Under a mild condition on the wavelet basis for the trial space, the efficient approximate residual evaluation that was outlined in Sect.1.4 applies, and it satisfies the cost condition, so that the awgm is optimal.Wavelet bases that satisfy the assumptions are available on general polygonal domains.Some technical results needed for this section are given in "Appendix A".
In Sect.6 the findings from Sect. 5 are illustrated by numerical results.In Sect.7, we consider the so-called velocity-pressure-velocity gradient and the velocity-pressure-vorticity first order system formulations of the stationary Navier-Stokes equations.Results analogously to those demonstrated for the elliptic problem will be shown to be valid here as well.
order system least squares problems.A particular instance of this approach has been discussed in Sect.1.4.For some separable Hilbert spaces U and V , for convenience over R, consider a differentiable mapping 1 In applications G is the operator associated to a variational formulation of a PDO with trial space U and test space V .
For T being another separable Hilbert space, let i.e., G 1 and G 2 are bounded linear operators.
Remark 2.2 In applications, as those discussed in Sects.5 and 7, G 1 G 2 will be a factorization of the leading second order part of the PDO (possibly modulo terms that vanish at the solution, cf.Sect.7.2) into a product of first order PDOs.
Obviously, u solves G(u) = 0 if and only if it is the first component of the solution (u, θ) of The following lemma shows that well-posedness of the original formulation implies that of the reformulation as a system. (2.4) For ( f, g) being in the set at the right-hand side in (2.4), consider the system This system has a unique solution, so that the ⊆-symbol in (2.4) reads as an equality sign, and In the following, we will always assume that ) is a homeomorphism with its range, the latter by Lemma 2.3.In summary, when the equation and solving one equation is equivalent to solving the other.
Remark 2.5 Actually, one might dispute whether these equations should be called well-posed when ran DG(u) V and so ran D H (u, θ) V × T .In any case, under conditions (i)-(iii), and so (a)-(c), the corresponding least-squares problems and resulting (nonlinear) normal equations are well-posed, as we will see next.
A solution (u, θ) of H (u, θ) = 0 is a minimizer of the least squares functional In particular, it holds that Lemma 2.6 For H : U × T ⊃ dom( H ) → V × T , and H being Fréchet differentiable at a root (u, θ), property (c) is equivalent to the property that for ( ũ, θ) ∈ U ×V in a neighbourhood of (u, θ), that, in this setting, are usually called (nonlinear) normal equations.Using (a)-(b), one computes that We conclude the following: Under the assumptions (a)-(c), it holds that (1) D Q is a mapping from a subset of a separable Hilbert space, viz.U × T , to its dual; (2) there exists a solution of As a consequence of the last property, one infers that in a neighborhood of (u, θ), D Q(u, θ) = 0 has exactly one solution.
In view of the above findings, in order to solve G(u) = 0, for a G that satisfies (i)-(iii), we are going to solve the (nonlinear) normal equations D Q(u, θ) = 0.A major advantage of D Q over G is that its derivative is symmetric and coercive.
A concern, however, is whether, for given (u, θ ), (v, η) ∈ U × T , D Q(u, θ)(v, η) as given by (2.5) is evaluable.We will think of the inner product on T as being evaluable.In our applications, T will be of the form L 2 ( ) N .To deal with the dual norm on V , we equip V with a Riesz basis meaning that the analysis operator and so its adjoint, known as the synthesis operator, In the definition of the least squares functional Q, and consequently in that of D Q, we now replace the standard dual norm on V by the equivalent norm Then in view of the definition of H and the expression for D H , we obtain that where ( 1)-( 4) are still valid.
Remark 2.7 We refer to [4] for an alternative approach to solve least square problems that involves dual norms.
To solve the obtained (nonlinear) normal equations D Q(u, θ) = 0 we are going to apply the adaptive wavelet Galerkin method (awgm).Note that the definition of D Q(u, θ)(v, η) still involves an infinite sum over ∨ V that later, inside the solution process, is going to be replaced by a finite one.

The adaptive wavelet Galerkin method (awgm)
In this section, we summarize findings about the awgm from [17,31].
Writing z = F z, and with an equivalent formulation of F(z) = 0 is given by We are going to approximate z, and so z, by a sequence of Galerkin approximations from the spans of increasingly larger sets of wavelets, which sets are created by an adaptive process.Given ⊂ ∨, the Galerkin approximation z , or equivalently, z := z , are the solutions of , respectively.These solutions exist uniquely when inf z ∈ 2 ( ) z − z is sufficiently small [26,31].
In order to be able to construct efficient algorithms, in particular when F is nonaffine, it will be needed to consider only sets from a certain subset of all finite subsets of ∨.In our applications, this collection of so-called admissible will consist of (Cartesian products of) finite trees.For the moment, it suffices when the collection of admissible sets is such that the union of any two admissible sets is again admissible.
To provide a benchmark to evaluate our adaptive algorithm, for s > 0, we define the nonlinear approximation class A vector z is in A s if and only if there exists a sequence of admissible This means that z can be approximated in 2 (∨) at rate s by vectors supported on admissible sets, or, equivalently, z can be approximated in H at rate s from spaces of type span{ψ λ : λ ∈ , is admissible}.
The adaptive wavelet Galerkin method (awgm) defined below produces a sequence of increasingly more accurate Galerkin approximations z to z. The, generally, infinite residual F(z ) is used as an a posteriori error estimator.A motivation for the latter is given by the following result.
This a posteriori error estimator guides an appropriate enlargement of the current set using a bulk chasing strategy, so that the sequence of approximations converge with the best possible rate to z.To arrive at an implementable method, that is even of optimal computational complexity, both the Galerkin solution and its residual are allowed to be computed inexactly within sufficiently small relative tolerances.

Algorithm 3.2 (awgm)
endfor In step (R), by means of a loop in which an absolute tolerance is decreased, the true residual F(z i ) is approximated within a relative tolerance δ.In step (B), bulk chasing is performed on the approximate residual.The idea is to find a smallest admissible i+1 ⊃ i with r i | i+1 ≥ μ 0 r i .In order to be able to find an implementation that is of linear complexity, the condition of having a truly smallest i+1 has been relaxed.Finally, in step (G), a sufficiently accurate approximation of the Galerkin solution w.r.t. the new set i+1 is determined.
Convergence of the adaptive wavelet Galerkin method, with the best possible rate, is stated in the following theorem.
and the neighborhood Z of the solution z all be sufficiently small.Then, for some α = α[μ 0 ] < 1, the sequence (z i ) i produced by awgm satisfies The computation of the approximate Galerkin solution z i+1 can be implemented by performing the simple fixed point iteration Taking ω > 0 to be a sufficiently small constant and starting with z (0) i+1 = z i , a fixed number of iterations suffices to meet the condition F(z This holds also true when each of the F()| i+1 evaluations is performed within an absolute tolerance that is a sufficiently small fixed multiple of r i .
Optimal computational complexity of the awgm-meaning that the work to obtain an approximation within a given tolerance ε > 0 can be bounded on some constant multiple of the bound on its support length from Theorem 3.3-is guaranteed under the following two conditions concerning the cost of the "bulk chasing" process, and that 123 of the approximate residual evaluation, respectively.Indeed, apart from some obvious computations, these are the only two tasks that have to be performed in awgm.

Condition 3.4 The determination of
In case of unconstrained approximation, i.e., any finite ⊂ ∨ is admissible, this condition is satisfied by collecting the largest entries in modulus of r i , where, to avoid a suboptimal complexity, an exact sorting should be replaced by an approximate sorting based on binning.With tree approximation, the condition is satisfied by the application of the so-called Thresholding Second Algorithm from [2].We refer to [31, §3.4] for a discussion.
To understand the second condition, that in the introduction was referred to as the cost Condition (1.1), note that inside the awgm it is never needed to approximate a residual more accurately than within a sufficiently small, but fixed relative tolerance.Condition 3.5 For a sufficiently small, fixed ς > 0, there exists a neighborhood Z of the solution z of F(z) = 0, such that for all admissible ⊂ ∨, z ∈ 2 ( ) ∩ Z, and any ε > 0, there exists an r ∈ 2 (∨) with Under both conditions, the awgm has optimal computational complexity: Theorem 3. 6 In the setting of Theorem 3.3, and under Conditions 3.4 and 3.5, not only #z i , but also the number of arithmetic operations required by awgm for the computation of z i is O( z − z i −1/s ).

Application to normal equations
As discussed in Sect.2, we will apply the awgm to the (nonlinear) normal equations D Q(u, θ) = 0, with D Q from (2.6).That is, we apply the findings collected in the previous section for the general triple (F, H , z) now reading as (D Q, U ×T , (u, θ)).

Semi-linear 2nd order elliptic PDE
We apply the solution method outlined in Sects.2, 3 and 4 to the example of a semilinear 2nd order elliptic PDE with general (inhomogeneous) boundary conditions.The main task will be to verify Condition 3.5*.

Let
⊂ R n be a bounded domain, N ∪ D = ∂ with meas( N ∩ D ) = 0, meas( D ) > 0 when D = ∅, and A : → R n×n symm with ξ A(•)ξ ξ 2 (ξ ∈ R n , a.e.).We set For that in standard variational form reads as finding u ∈ U such that 123 We assume that this variational problem has a solution u, and that G, i.e., N , is two times continuously Fréchet differentiable in a neighborhood of u, and DG(u) ∈ L(U , V ) is a homeomorphism with its range, i.e., we assume that it follows that condition (iii) is satisfied when (actually, L being a homeomorphism with its range is already sufficient).By writing g = u 0 | D for some u 0 ∈ U , one infers that for linear N , existence of a (unique) solution u, i.e. (i), follows from L ∈ Lis(V 1 , V 1 ).For g = 0, the conditions of N being monotone and locally Lipschitz are sufficient for having a (unique) solution u.Relaxed conditions on N suffice to have a (locally unique) solution.We refer to [5].
Using the framework outlined in Sect.2, we write this second order elliptic PDE as a first order system least squares problem.Putting T = L 2 ( ) n , we define The results from Sect. 2 show that the solution u can be found as the first component of the minimizer being the solution of the normal equations D Q(u, θ) = 0, and furthermore, that these normal equations are well-posed in the sense that they satisfy (1)-(4).
To deal with the 'unpractical' norm on V , as in Sect.1.4, at the end of Sect.2, and in Sect.4, we equip V 1 and V 2 with wavelet Riesz bases and replace, in the definition of Q, the norms on their duals by the equivalent norms defined by g( 123 Next, after equiping U and T with Riesz bases and so U ×T with = ( U , 0 T )∪(0 U , T ), we apply the awgm to the resulting system

one dictionary of functions on
and one dictionary of functions on N , similarly to Sect.1.4 we impose the additional, but in applications easily realisable condition that T ⊂ H (div; ). ( Then for finitely supported approximations where we used the vanishing traces of Each of the terms A∇ ũ − θ , ũ − g, N ( ũ) − f − div θ , and θ • n − h correspond, in strong form, to a term of the least squares functional, and therefore their norms can be bounded by a multiple of the norm of the residual, which is the basis of our approximate residual evaluation.In order to verify Condition 3.5*, we have to collect some assumptions on the wavelets, which will be done in the next subsection.
Remark 5.2 If D = ∅, then obviously (5.4) should be read without the second term involving V 2 .If D = ∅ and homogeneous Dirichlet boundary conditions are prescribed on D , i.e., g = 0, it is simpler to select 0}, and to omit integral over D in the definition of G, so that again (5.4) should be read without the second term involving V 2 .

Wavelet assumptions and definitions
We formulate conditions on V 1 , V 2 , U , and T , in addition to being Riesz bases for V 1 , V 2 , U , and T , respectively.
Recalling that where T q = {ψ T q λ : λ ∈ ∨ T q } is a Riesz basis for T q (with ∨ T q ∩ ∨ T q = ∅ when q = q ).
For * ∈ {U , T 1 , . . ., T n , V 1 }, we collect a number of (standard) assumptions, (w 1 )-(w 4 ), on the scalar-valued wavelet collections * = {ψ * λ : λ ∈ ∨ * } on .Corresponding assumptions on the wavelets V 2 on D will be formulated at the end of this subsection.To each λ ∈ ∨ * , we associate a value |λ| ∈ N 0 , which is called the level of λ.We will assume that the elements of * have one vanishing moment, and are locally supported, piecewise polynomial of some degree m, w.r.t.dyadically nested partitions in the following sense: Generally, the polynomial degree m will be different for the different bases, but otherwise fixed.The collection O is shared among all bases.Note that the conditions of U being a basis for U , and to consist of piecewise polynomials, implies that U ⊂ C( ¯ ).Wavelets of in principle arbitrary order that satisfy these assumptions can be found in e.g.[20,25].Definition 5.3 A collection T ⊂ O such that = ∪ ω∈T ω, and for ω 1 = ω 2 ∈ T , meas(ω 1 ∩ ω 2 ) = 0 will be called a tiling.With P m (T ), we denote the space of piecewise polynomials of degree m w.r.t.T .The smallest common refinement of tilings T 1 and T 2 is denoted as To be able to find, in linear complexity, a representation of a function, given as linear combination of wavelets, as a piecewise polynomial w.r.t. a tiling-mandatory for an efficient evaluation of nonlinear terms, we will impose a tree constraint on the underlying set of wavelet indices.A similar approach was followed earlier in [6,10,21,33,35].Definition 5. 4 To each λ ∈ ∨ * with |λ| > 0, we associate one μ ∈ ∨ * with |μ| = |λ| − 1 and meas(supp ψ * λ ∩ supp ψ * μ ) > 0. We call μ the parent of λ, and so λ a child of μ.
Note that we now have tree structures on the set O of polytopes, and as well as on the wavelet index sets ∨ * .We trust that no confusion will arise when we speak about parents or children.
For some collections of wavelets, as the Haar or more generally, Alpert wavelets [1], it suffices to take S(ψ * λ ) := supp ψ * λ .The next result shows that, thanks to (w 1 )-(w 2 ), a suitable neighbourhood S(ψ * λ ) as meant in Definition 5.4 always exists.A proof of the following proposition, as well as an algorithm to apply the multi-tosingle-scale transformation that is mentioned, is given in [31, §4.3].

Proposition 5.6 Given a tree
⊂ ∨ * , there exists a tiling T ( )

T ( )). Moreover, equipping P m (T ( )) with a basis of functions, each of which supported in ω for one ω ∈ T ( ), the representation of this embedding, known as the multi-to single-scale transform, can be applied in O(# ) operations.
The benefit of the definition of S(ψ * λ ) appears from the following lemma.
In Proposition 5.6 we saw that for each tree there exists a tiling T ( ), with #T ( ) # , such that span{ψ * λ : λ ∈ } ⊂ P m (T ( )).Conversely, in the following, given a tiling T , and a constant k ∈ N 0 , we construct a tree * (T , k) with # * (T , k) #T (dependent on k) such that a kind of reversed statements hold: In "Appendix A", statements of type lim k→∞ sup 0 =g∈P m (T ) * ,g L 2 ( ) | ∨ * \ * (T ,k) g * = 0 will be shown, meaning that for any tolerance there exist a k such that for any g ∈ P m (T ) the relative error in dual norm in the best approximation from the span of the corresponding dual wavelets with indices in * (T , k) is less than this tolerance.Let O be the union, for ∈ N 0 and k = 0, . . ., 2 − 1, of the intervals 2 [k, k + 1] to which we assign the level .Now as an example let ⊂ ∨ be the set {(0, 0), (1, 0), (2, 0), (3, 0)}, which is a tree in the sense of Definition 5.4.It corresponds to the solid parts in the left picture in Fig. 1.

Definition 5.11
Recalling from (4.1) and the first lines of this subsection that the Riesz basis for U × T is of canonical form Remark 5.12 Let U be a wavelet basis for U of order d U > 1 (i.e., all wavelets ψ U λ up to level span all piecewise polynomials in U of degree d U − 1 w.r.t.{ω : ω ∈ O , |ω| = }), and similarly, for 1 ≤ q ≤ n, let T q be a wavelet basis for T q of order d T > 0. Recalling the definition of an approximation class given in (3.1), a sufficiently smooth solution (u, θ) is in ), whereas on the other hand membership of A s for s > s max cannot be expected under whatever smoothness condition.
For s ≤ s max , a sufficient and 'nearly' necessary condition for (u, θ) ∈ A s is that (u, θ) ∈ B sn+1 p,τ ( ) × B sn p,τ ( ) n for 1 p < s + 1 2 and arbitrary τ > 0, see [11].This mild smoothness condition in the 'Besov scale' has to be compared to the condition (u, θ) ∈ H sn+1 ( ) × H sn ( ) n that is necessary to obtain a rate s with approximation from the spaces of type span{ψ U λ : |λ| ≤ L} × n q=1 span{ψ T q λ : |λ| ≤ L}.We pause to add three more assumptions on our PDE: We assume that w.r.t. the coarsest possible tiling {ω : N (u) is a partial differential operator of at most first order, with coefficients that are piecewise polynomials in u and x, and (5.6) N , and so D , is the union of facets of ω ∈ O with |ω| = 0. (5.7) Remark 5. 13 The subsequent analysis can easily be generalized to A being piecewise smooth.With some more efforts other nonlinear terms N can be handled as well.For example, for N (u) of the form n 1 (u)u, it will be needed that for some m ∈ N, for each ω ∈ O , there exists a subspace Finally in this subsection we formulate our assumptions on the wavelet Riesz basis . We assume that it satisfies the assumptions (w 2 ) and (w 3 ) with O reading as Furthermore, we impose that ).The definition of a boundary tiling T D ⊂ O D is similar to Definition 5.3.Also similar to the corresponding preceding definitions are that of a tree ⊂ ∨ V 2 , and of the boundary tiling T D ( ) ⊂ O D for ⊂ ∨ V 2 being a tree.Conversely, for a boundary tiling

An appropriate approximate residual evaluation
Given an admissible ⊂ ∨, [ ũ , θ ] ∈ 2 ( ) with ( ũ, θ) := [ ũ , θ ] sufficiently close to (u, θ), and an ε > 0, our approximate evaluation of DQ([ ũ , θ ] ), given in (5.4), is built in the following steps, where k ∈ N 0 is a sufficiently large constant: (s1) Find a tiling 123 (s3) Approximate Although (s2)-(s4) may look involved at first glance, the same kind of approximation is used at all instances.Each term in (5.4) consists essentially of a wavelet basis that is integrated against a piecewise polynomial, or more precisely, a function that can be sufficiently accurately approximated by a piecewise polynomial thanks to the control of the data oscillation by the refinement of the partition performed in (s1).In each of these terms, all wavelets are neglected whose levels exceed locally the level of the partition plus a constant k.
In the next theorem it is shown that this approximate residual evaluation satisfies the condition for optimality of the adaptive wavelet Galerkin method.
Loosely speaking, this result can be rephrased by saying that if the solution of DQ([u , θ ] ) = 0 is in A s , then so is the forcing function ( f, g, h).This is not automatically true, cf.[12] for a discussion in the adaptive finite element context, but in the current setting it is a consequence of the fact that, thanks to assumption (5.3), the first order partial differential operators apply to the wavelet bases * for * ∈ {U , T 1 , . . ., T n , V 1 , V 2 } in 'mild' sense (the result of the application of each of these operators lands in L 2 -space).
Knowing that a suitable T (ε) exists is different from knowing how to construct it.For our convenience thinking of g = h = 0, and so (ω) .Ignoring quadrature issues, for any partition T , osc( f, T ) is computable.A quasi-minimal partition T (ε) such that osc( f, T (ε)) ε can be computed using the Thresholding Second Algorithm from [2].Now the assumption to be added to Theorem 5.14 is that for such a partition, #T (ε) ε −1/s .Note that it is nowhere needed to explicitly approximate the forcing functions by approximating their wavelet expansions.
Proof of Theorem 5.14 By construction we have From U , T , and V 1 being Riesz bases, and being continuously differentiable at u, one infers that (5.9)We bound all terms at the right-hand side.
With f ε , h ε from (s1), using that V 1 is a Riesz basis, we have that (5.10) From shows that the norm at the right-hand side of (5.10 Again by using (s1), we infer that (5.11) Our assumptions on N show that DN ( ũ)w is of the form p 1 ( ũ)w + p 2 ( ũ) • ∇w for some piecewise polynomials p 1 and p 2 in ũ and x w.r.t.{ω : , and r (5.12) Thanks to θ − A∇ ũ ∈ P m (T ( )) n by (5.5), and A being piecewise polynomial, an application of Proposition A.4 shows that With g ε from (s1), using that V 2 is a Riesz basis for H − 1 2 ( D ), we have that shows that the norm at the right-hand side of (5.14 . Again by using (s1), we infer that . ( Thanks to , and r . (5.16) By collecting the upper bounds (5.11)-(5.16)derived for all five terms at the righthand side of (5.9), and by using Lemma 2.6 in combination with the least squares functional given in (5.2), the proof of the first statement is completed.
To bound the cost of the computations, we consider the computation of r( 12 ) 1 .First, find a representation of N ( ũ)−div θ as an element of P m (T ( , ε)) by applying multito single-scale transforms.For each tile ω ∈ T ( V 1 (T ( , ε), k)), and for φ running over some basis of 1 is bounded by a multiple of #T ( , ε) # + ε −1/s operations.
Since fully analogous considerations apply to bounding the cost of the computations of r1 , r2 , r( 12 ) 3 , and r3 , the proof is completed.

6 Numerical results
For ⊂ R2 being the L-shaped domain (0, 1) 2 \[ 1 2 , 1) 2 , we consider the semi-linear boundary value problem − u + u 3 = f on , u = 0 on ∂ , ( where, for simplicity, f = 1 (to test our code we also tried some right hand sides corresponding to some fabricated polynomial solutions u). 2 , we applied the awgm (Algorithm 3.2), with F reading as DQ, for the adaptive solution of [u , θ ] from Here we equipped T i (i = 1, 2) with the continuous piecewise linear three-point wavelet basis from [27], the space V with the same basis (obviously scaled differently, and with homogeneous boundary conditions incorporated), and U with a newly developed continuous piecewise quadratic wavelet basis.These bases can be applied on any polygon, and they satisfy all assumptions (w 1 )-(w 4 ).In particular all wavelets except possibly those 'near' the Dirichlet boundary have one vanishing moment.For each basis, to each wavelet that is not on the coarsest level we associate one parent on the next coarsest level according to Definition 5.4.For any ∈ N 0 the subsets of the bases consisting of all wavelets up to some level span exactly the space of continuous piecewise linears, continuous piecewise linears zero at ∂ , or continuous piecewise quadratics zero at ∂ , respectively, w.r.t. the subdivision of as indicated in Fig. 2. 2 , and , where * N is the subset of all wavelets from * up to some level, where N denotes its cardinality On a bounded domain, the three-point basis has actually not be proven to be stable in L 2 ( ).Although alternative bases are available whose Riesz basis property has been proven, we opted for the three-point basis, because of its efficient implementation and because numerical results indicated that it is stable.In Fig. 3, numerically computed condition numbers are given of sets of all wavelets up to some level.
The continuous piecewise quadratic wavelets are biorthogonal ones with the 'dual multiresolution analysis' being the sequence of continuous piecewise linears, zero at the ∂ , w.r.t.one additional level of refinement.Details of this basis construction will be reported elsewhere.
We performed the approximate evaluation of DQ(•) according to (s1)-(s4) and Theorem 5.14 in Sect.5.3 with some simplifications because of the current homogeneous Dirichlet boundary conditions and sufficiently smooth right-hand side [(s1) and (s4) are void, and in (s2) the boundary term is void].Taking the parameter k = 1, it turns out that the approximate evaluation is sufficiently accurate to be used in Step (R) of awgm (so we do not perform a loop), as well in the simple fixed point iteration (3.2) with damping ω = 1 4 that we use for Step (G).We took the parameter γ in Step (G) equal to 0.15 (more precisely, for stopping the iteration we checked whether the norm of the approximate residual, restricted to i+1 , is less or equal to 0.15 r i ).
For the bulk chasing, i.e.
Step (B), we simply collected the indices of the largest entries of the approximate residual r i until the norm of the residual restricted to those indices is not less than 0.4 r i (i.e.μ 1 = 0.4), and then, after adding the indices from the current i to this set, we expand it to an admissible set (cf. Definition 5.11).Although this simple procedure is neither guaranteed to satisfy Condition 3.4 nor (B) for some constant 0 < μ 0 ≤ μ 1 , we observed that it works satisfactory in practice.
In view of the orders 3 and 2 of the bases for U and T , and the fact the PDE is posed in n = 2 space dimensions, the best possible convergence rate that can be expected is min( 3−1 2 , 2−0 2 ) = 1.In Fig. 4, we show the norm of the approximate residual versus the total number of wavelets underlying the approximation for (u, θ).
The norm of the approximate residual is proportional to the U × T -norm of the error in the approximation for (u, θ).We conclude that it decays with the best possible rate.Moreover, we observed that the computing times scale linearly with the number of unknowns.Throughout the iteration, the number of wavelets for the approximation  5 The approximation for u from the span of 202 wavelets for u is of the same order as the number of wavelets for the approximation for θ .The maximum level that is reached at the end of the computations is 26 for u and 28 for θ.An approximate solution is illustrated in Fig. 5. Centers of the supports of the wavelets that were selected for the approximation for u are illustrated in Fig. 6.
Finally, in order to get an impression of the condition number of the bi-infinite linearized normal equations that eventually we are solving, we consider the Poisson equation, i.e. (6.1) without the u 3 term.We are interested in the spectral condition number of the 'system matrix' given by Fig. 6 Centers of the supports of the first 10,366 wavelets for the approximation for u that were selected by the awgm To that end we numerically approximated the condition numbers of the finite square blocks of rows and columns with indices in , with running over the wavelet index sets that were created by the awgm.Even such a finite block cannot be evaluated exactly, because it still involves the infinite collection V .Given a , we restricted this collection to the wavelets with indices in V (T ( ), k) as defined by Proposition 5.6 and Definition 5.8, where, as always, we take k = 1.The resulting matrix is exactly the one that we approximately invert in Step (G) by the fixed point iteration.The computed condition numbers are given in Fig. 7.We performed the same computation also for k = 2, so with an enlarged set of wavelet indices from the basis V , and found nearly indistinguishable results.We may conclude that for → ∞, the given numbers give accurate approximations for the condition number of the matrix in (6.2). 123 It is known that G : U → U , and that a solution ( u, p) exists (see e.g.[24, Ch.IV]).Furthermore, G is two times differentiable with its second derivative being constant.We will assume that DG( u, p) ∈ L(U , U ) is a homeomorphism with its range, so that each of the conditions (i)-(iii) from Sect. 2 are satisfied.The latter is known to hold true, with its range being equal to U , when f is sufficiently small, in which case the solution ( u, p) is also unique (e.g.see [24,Ch. IV]).For the linear case, so without the term ν −3/2 ( u • ∇) u, thanks to our re-scaling, DG( u, p) = G ∈ Lis(U , U ), and is independent of ν.
Using the framework outlined in Sect.2, we write this second order elliptic PDE as a first order system least squares problem.There are different possibilities to do so.

Velocity-pressure-velocity gradient formulation
With T := L 2 ( ) n 2 , we define The results from Sect. 2 show that the solution ( u, p) can be found as the first components of the minimizer ( u, p, θ) ∈ U × T of .1) and so as the solution of the normal equations D Q( u, p, θ) = 0.Here we have used that on H 1 0 ( ) n , div Following [3], we call θ = ∇ u the velocity gradient.As follows from Sect. 2, these normal equations are well-posed in the sense that they satisfy (1)-( 4).This gives us an alternative, effortless proof for [15,Thm. 3.1].
To deal with the 'unpractical' norm on H −1 ( ) n , we equip H 1 0 ( ) n with some wavelet Riesz basis and replace, in the definition of Q, the norm on its dual by the equivalent norm defined by h( ( Ĥ 1 0 ) n ) for h ∈ H −1 ( ) n .Next, after equipping * ∈ {H 1 0 ( ) n , L 2 ( )/R, L 2 ( ) n 2 } with a Riesz basis * = {ψ * λ : λ ∈ ∨ * }, and so , we apply the awgm to the resulting system To express the three terms in . one dictionary, similarly to Sect.1.4 we impose the additional, but in applications easily realizable conditions that Then for finitely supported approximations 123 Each of the terms div ũ − g, ∇ ũ − θ , ν −3/2 ( ũ • ∇) ũ − f − div θ + ∇ p correspond, in strong form, to a term of the least squares functional, and therefore their norms can be bounded by a multiple of the norm of the residual, which is the basis of our approximate residual evaluation.This approximate residual evaluation follows the same lines as with the elliptic problem from Sect. 5. Actually, things are easier here because we assume homogeneous boundary conditions.Selecting the Riesz bases for the Cartesian products H 1 0 ( ) n and L 2 ( ) n 2 of canonical form, we assume that all scalar-valued bases * for * ∈ { Ĥ 1 0 , H 1 0 , L 2 /R, L 2 } satisfy the assumptions that were made in Sect.5.2, in particular (w 1 )-(w 4 ).Let := supp[ ũ , p , θ ] be admissible, i.e., ∩ ∨ * are trees. ) The same arguments (actually a subset) that led to Theorem 5.14 show the following theorem.
We conclude that the awgm is an optimal solver for the stationary Navier-Stokes equations in the form DQ([u , p , θ ] ) = 0 resulting from the velocity-pressurevelocity gradient formulation.Obviously, we cannot claim or even expect that this holds true uniformly in a vanishing viscosity parameter ν.This because in the limit already well-posedness of DG( u, p) cannot be expected.

Velocity-pressure-vorticity formulation
Restricting to n ∈ {2, 3}, we set T := L 2 ( ) 2n−3 , and define where for n = 2, curl should be read as the scalar-valued operator (and so ω • curl u as ω curl u).The (formal) adjoint curl equals curl for n = 3, whereas Since a vector field in the current space T has 2n − 3 components, instead of n 2 as in the previous subsection, the first order system formulation studied in this subsection is more attractive.As we will see, later in its derivation it will be needed that g = 0, i.e., div u = 0.
Using that on , the results from Sect. 2 show that the solution ( u, p) can be found as the first component of the solution in U × T of the system and replace, in the definition of Q 2 , the norm on its dual by the equivalent norm g , we apply the awgm to the resulting system To express the three terms in w.r.t.one dictionary, we impose the easily realizable conditions that The design of an approximate residual evaluation follows analogous steps as in the previous subsection.Equipping Cartesian products with bases of canonical form, and assuming that the scalar-valued bases * for * ∈ { Ĥ 1 0 , H 1 0 , L 2 /R, L 2 } satisfy (w 1 )-(w 4 ), and that [ ũ , p , ω ] is supported on an admissible set, four steps fully analogous to (s1)-(s4) in the previous subsection define an approximation scheme that satisfies Condition 3.5*.We conclude that the awgm is an optimal solver for the stationary Navier-Stokes equations in the form DQ([u , p , θ ] ) = 0 resulting from the velocity-pressure-vorticity formulation.Again, also here we cannot claim or even expect that this holds true uniformly in a vanishing viscosity parameter ν.

Conclusion
We have seen that a well-posed (system of) 2nd order PDE(s) can always be formulated as a well-posed 1st order least squares system.The arising dual norm(s) can be replaced by the equivalent 2 -norm(s) of the wavelet coefficients of the functional.The resulting Euler-Lagrange equations, also known as the (nonlinear) normal equations, can be solved at the best possible rate by the adaptive wavelet Galerkin method.We developed a new approximate residual evaluation scheme that also for semi-linear problems satisfies the condition for optimal computational complexity, and that is quantitatively much more efficient than the usual apply scheme.Moreover, regardless of the order of the wavelets, it applies already to wavelet bases that have only one vanishing moment.As applications we discussed optimal solvers for first order least squares reformulations of 2nd order elliptic PDEs with inhomogeneous boundary conditions, and that of the stationary Navier-Stokes equations.In a forthcoming work, we will apply this approach to time-evolution problems.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/),which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Appendix A. Decay estimates
We collect a number of decay estimates that have been used in the proof of Theorem 5.14.Recall the definition of the spaces U and V given at the beginning of Sect.5.1.
The following proposition and subsequent lemma have been used to bound r . The presence of the boundary integral and the fact that the upper bound that is given depends on the norm of g as a whole, and not on norms of g 1 and g 2 requires a non-standard treatment.where g 1 ∈ P m (T ), g 2 ∈ P m (T ) n .Then (uniform in T and g).
with the last inequality being valid when H 1 0 ( ) by assumption, and so g (v)   v H 1 ( ) For bounding the second term in the statement of the lemma, for a tile ω ∈ O we write ∂ω ∩ N as ∂ω N .Thanks to the uniform shape regularity condition, for any ω ∈ O with meas(∂ω N ) > 0, there exists a V ω ⊂ {v ∈ H 1 (ω) : v| ∂ω\∂ω N = 0} such that V ω ⊥ L 2 (ω) P m (ω).
For each ω ∈ T with meas(∂ω N ) > 0, select v ω ∈ V ω for which in the case that {w ∈ H 1 ( ) : Otherwise, when by assumption, and so g (v)   v H 1 ( ) An easy version of the proof of Proposition A.1 shows the following result, which has been used to bound r 11 − r11 in the proof of Theorem 5.14.
Proposition A. 3 For a tiling T ⊂ O , and g ∈ P m (T ), it holds that 2 −k g U (uniform in T and g).
The statements from the following proposition have been used to bound the terms r 11 − r11 (first statement) and r 2 − r2 (both statements) in the proof of Theorem 5.14.
Proposition A. 4 For a tiling T ⊂ O , g ∈ P m (T ), 1 ≤ q ≤ n, and g ∈ P m (T ) n , it holds that T q , g L 2 ( ) ∨ T q \ T q (T ,k) 2 −k/2 g L 2 ( ) (uniform in T , g, and g).
To prove the second inequality, for any λ ∈ ∨ U , similar to (A.4) we have When ψ U λ has a vanishing moment and supp ψ U λ ⊂ ω, we have replacing (A.6).From these two estimates, the second inequality follows similarly to the first one.

Fig. 4 1 Fig.
Fig. 4 Norm of the (approximate) residual, normalized by the norm of the initial residual, generated by the awgm versus the total number of wavelets.The dotted line indicates the best possible slope − 1

5
For a boundary tiling T D ⊂ O D , and g ∈ P m (T D ) ∩ C( D ), .8) which, for biorthogonal wavelets, is essentially is (w 4 ) (cf. [20, lines following (A.2)]).(To relax the smoothness conditions on D needed for the definition of H −1 ( D ), one can replace (5.8) by ψ V 2 λ H s ( D ) with a norm that is bounded uniformly in ũ from a neighbourhood of u.Similarly, from u → u| D ∈ L(U , H 1 2 ( D )), and U and V 2 being Riesz bases for U and H 1 2 ( D ) , respectively, we infer that