Exponential ReLU Neural Network Approximation Rates for Point and Edge Singularities

We prove exponential expressivity with stable ReLU Neural Networks (ReLU NNs) in $H^1(\Omega)$ for weighted analytic function classes in certain polytopal domains $\Omega$, in space dimension $d=2,3$. Functions in these classes are locally analytic on open subdomains $D\subset \Omega$, but may exhibit isolated point singularities in the interior of $\Omega$ or corner and edge singularities at the boundary $\partial \Omega$. The exponential expression rate bounds proved here imply uniform exponential expressivity by ReLU NNs of solution families for several elliptic boundary and eigenvalue problems with analytic data. The exponential approximation rates are shown to hold in space dimension $d = 2$ on Lipschitz polygons with straight sides, and in space dimension $d=3$ on Fichera-type polyhedral domains with plane faces. The constructive proofs indicate in particular that NN depth and size increase poly-logarithmically with respect to the target NN approximation accuracy $\varepsilon>0$ in $H^1(\Omega)$. The results cover in particular solution sets of linear, second order elliptic PDEs with analytic data and certain nonlinear elliptic eigenvalue problems with analytic nonlinearities and singular, weighted analytic potentials as arise in electron structure models. In the latter case, the functions correspond to electron densities that exhibit isolated point singularities at the positions of the nuclei. Our findings provide, in particular, a mathematical foundation for the recently reported, successful use of deep neural networks in variational electron structure algorithms.


Introduction
The application of neural networks (NNs) as an approximation architecture in numerical solution methods for partial differential equations (PDEs), possibly on high-dimensional parameter- and state-spaces, has attracted significant and increasing attention in recent years. We mention only [49,8,40,41,47] and the references therein. In these works, the solution of elliptic and parabolic boundary value problems is approximated by NNs which are found by minimizing a residual of the NN in the PDE.
A necessary condition for the performance of the mentioned NN-based numerical approximation methods is a high approximation rate which holds uniformly over the solution set associated with the PDE under consideration. For elliptic boundary and eigenvalue problems, the function classes that weak solutions of the problems belong to are well known. Moreover, in many cases, representation systems such as splines or polynomials that achieve optimal linear or nonlinear approximation rates for functions belonging to these function classes have been identified. For functions belonging to a Sobolev or Besov type smoothness space of finite differentiation order, such as continuously differentiable or Sobolev-regular functions on a bounded domain, upper bounds for the approximation rate by NNs were established, for example, in [52,13,53,27,50]. Here, we only mention results that use the ReLU activation function. Moreover, for PDEs whose solutions have a particular structure, approximation rates that go beyond classical, smoothness-based results can be achieved, as in [9,46,24,4,21]. Again, we confine the list to publications with approximation rates for NNs with the ReLU activation function (referred to as ReLU NNs below).
In the present paper, we analyze approximation rates provided by ReLU NNs for solution classes of linear and nonlinear elliptic source and eigenvalue problems on polygonal and polyhedral domains. Mathematical results on weighted analytic regularity [15,16,14,1,5,17,34,7,30,33] imply that these classes consist of functions that are analytic with possible corner, edge, and corner-edge singularities.
Our analysis provides, for the aforementioned functions, approximation errors in Sobolev norms that decay exponentially in terms of the number of parameters M of the ReLU NNs.

Contribution
The principal contribution of this work is threefold: 1. We prove, in Theorem 4.3, a general result on the approximation by ReLU NNs of weighted analytic function classes on $Q := (0,1)^d$, where $d = 2, 3$. The analytic regularity of solutions is quantified via countably normed, analytic classes, based on weighted Sobolev spaces of Kondrat'ev type in $Q$, which admit corner and, in space dimension $d = 3$, also edge singularities. Such classes were introduced, e.g., in [5,1,14,15,16,7] and in the references there. We prove exponential expression rates by ReLU NNs in the sense that for a number $M$ of free parameters of the NNs, the approximation error is bounded, in the $H^1$-norm, by $C\exp(-bM^{1/(2d+1)})$ for constants $b, C > 0$ (see also the worked complexity relation after this list).
2. Based on the ReLU NN approximation rate bound of Theorem 4.3, we establish, in Section 5, approximation results for solutions of different types of PDEs by NNs with ReLU activation. Concretely, in Section 5.1.1, we study the approximation of solutions of nonlinear Schrödinger equations with singular potentials in space dimension $d = 2, 3$. We prove that for solutions which are contained in weighted, analytic classes in $\Omega$, ReLU NNs (whose realizations are continuous, piecewise affine) with at most $M$ free parameters yield an approximation with accuracy of the order $\exp(-bM^{1/(2d+1)})$ for some $b > 0$. Importantly, this convergence is in the $H^1(\Omega)$-norm. In Section 5.1.2, we establish the same exponential approximation rates for the eigenstates of the Hartree-Fock model with singular potential in $\mathbb{R}^3$. This result constitutes the first, to our knowledge, mathematical underpinning of the recently reported, high efficiency of various NN-based approaches in variational electron structure computations, e.g., [39,20,18]. In Section 5.2, we demonstrate the same approximation rates also for elliptic boundary value problems with analytic coefficients and analytic right-hand sides, in plane, polygonal domains $\Omega$. The approximation error of the NNs is, again, bounded in the $H^1(\Omega)$-norm. We also infer an exponential NN expression rate bound for corresponding traces, in $H^{1/2}(\partial\Omega)$, and for viscous, incompressible flow. Finally, in Section 5.3, we obtain the same asymptotic exponential rates for the approximation of solutions to elliptic boundary value problems, with analytic data, on so-called Fichera-type domains $\Omega \subset \mathbb{R}^3$ (being, roughly speaking, finite unions of tensorized hexahedra). These solutions exhibit corner, edge and corner-edge singularities.
3. The exponential approximation rates of the ReLU NNs established here are based on emulating corresponding variable grid and degree ("hp") piecewise polynomial approximations. In particular, our construction comprises tensor product hp-approximations on Cartesian products of geometric partitions of intervals. In Theorem A.25, we establish novel tensor product hp-approximation results for weighted analytic functions on $Q$ of the form $\|u - u_{hp}\|_{H^1(Q)} \le C\exp(-b\sqrt[2d]{N})$ for $d = 1, 2, 3$, where $N$ is the number of degrees of freedom in the representation of $u_{hp}$ and $C, b > 0$ are independent of $N$ (but depend on $u$). The geometric partitions employed in $Q$ and the architectures of the constructed ReLU NNs are of tensor product structure, and generalize to space dimension $d > 3$. We note that hp-approximations based on non-tensor-product, geometric partitions of $Q$ into hexahedra have been studied before, e.g., in [42,43] in space dimension $d = 3$. There, the rate $\|u - u_{hp}\|_{H^1(Q)} \lesssim \exp(-b\sqrt[5]{N})$ was found. Being based on tensorization, the present construction of exponentially convergent, tensorized hp-approximations in Appendix A does not invoke the rather involved polynomial trace liftings in [42,43], and is interesting in its own right: the geometric and mathematical simplification comes at the expense of a slightly smaller (still exponential) rate of approximation. Moreover, we expect that this construction of $u_{hp}$ will allow a rather direct derivation of rank bounds for tensor structured function approximation of $u$ in $Q$, generalizing results in [22,23] and extending [32] from point to edge and corner-edge singularities.
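To make the complexity statement in item 1 concrete: for a target accuracy $0 < \varepsilon < C$, the bound $C\exp(-bM^{1/(2d+1)}) \le \varepsilon$ is equivalent to
$$
M \ \ge\ \left(\frac{1}{b}\log\frac{C}{\varepsilon}\right)^{2d+1},
$$
so that accuracy $\varepsilon$ in $H^1$ is attained with $M = O(|\log\varepsilon|^{2d+1})$ free parameters; analogously, the hp bound in item 3 requires $N = O(|\log\varepsilon|^{2d})$ degrees of freedom. This is the poly-logarithmic growth of NN depth and size referred to in the abstract.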

Neural network approximation of weighted analytic function classes
The proof strategy that yields the main result, Theorem 4.3, is as follows. We first establish exponential approximation rates in the Sobolev space H 1 for tensor-product, so-called hp-finite elements for weighted analytic functions. Then, we re-approximate the corresponding quasi-interpolants by ReLU NNs.
The emulation of hp-finite element approximation by ReLU NNs is fundamentally based on the approximate multiplication network formalized in [52]. Based on the approximate multiplication operation and an extension thereof to errors measured with respect to W 1,q -norms, for q ∈ [1, ∞], we established already in [36] a reapproximation theorem of univariate splines of order p ∈ N on arbitrary meshes with N ∈ N cells. There, we observed that there exists a NN that reapproximates a variable-order, free-knot spline u in the H 1 -norm up to an error of ε > 0 with a number of free parameters that scales logarithmically in ε and |u| H 1 , linearly in N and quadratically in p. We recall this result in Proposition 3.7 below.
From this, it is apparent by the triangle inequality that, in univariate approximation problems where hp-finite elements yield exponential approximation rates, ReLU NNs also achieve exponential approximation rates (albeit with a possibly smaller exponent, because of the quadratic dependence on $p$; see [36, Theorem 5.12]).
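As a simple illustration of the spline-emulation principle (this is not the construction of [36], which treats arbitrary polynomial order $p$ and errors in $W^{1,q}$; the helper names below are illustrative only): a continuous, piecewise linear free-knot spline on $N$ cells is represented exactly by a ReLU NN with one hidden layer and $O(N)$ nonzero weights.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def cpwl_as_relu_net(knots, values):
    """Realization of a one-hidden-layer ReLU NN reproducing exactly the
    continuous piecewise linear interpolant of (knots, values):
    f(x) = values[0] + slope_0*(x - knots[0]) + sum_k c_k * relu(x - knots[k])."""
    knots = np.asarray(knots, dtype=float)
    values = np.asarray(values, dtype=float)
    slopes = np.diff(values) / np.diff(knots)   # slope on each cell
    kinks = np.diff(slopes)                     # slope changes at interior knots
    def realization(x):
        y = values[0] + slopes[0] * (x - knots[0])
        for t, c in zip(knots[1:-1], kinks):
            y = y + c * relu(x - t)
        return y
    return realization

# usage: interpolation of x^2 on a geometrically graded mesh towards 0
knots = np.array([0.0, 0.125, 0.25, 0.5, 1.0])
f = cpwl_as_relu_net(knots, knots ** 2)
print(f(np.array([0.3, 0.7])))
```

For polynomial degree $p > 1$, the construction in [36] additionally emulates products of such piecewise linear functions by the approximate multiplication networks discussed below.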
The extension of this result to higher dimensions for high-order finite elements that are built from univariate finite elements by tensorization is based on the underlying compositionality of NNs. Because of that, it is possible to compose a NN implementing a multiplication of d inputs with d approximations of univariate finite elements. We introduce a formal framework describing these operations in Section 3.
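Schematically, and with an exact product standing in for the approximate ReLU multiplication network (Proposition 3.6 below), this composition looks as follows; the function and variable names are illustrative only.

```python
import numpy as np

def tensorize(univariate_nets):
    """Given d callables v_1, ..., v_d (e.g., ReLU emulations of univariate
    finite element functions), return x -> prod_j v_j(x_j).  In the actual NN
    construction, the product is itself realized by a multiplication network."""
    def realization(x):
        x = np.atleast_2d(x)                      # shape (n_points, d)
        factors = [v(x[:, j]) for j, v in enumerate(univariate_nets)]
        return np.prod(np.stack(factors, axis=0), axis=0)
    return realization

# usage: a bilinear "hat times hat" function on (0, 1)^2
hat = lambda t: np.maximum(0.0, 1.0 - np.abs(2.0 * t - 1.0))
phi = tensorize([hat, hat])
print(phi(np.array([[0.5, 0.5], [0.25, 0.75]])))
```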
We remark that for high-dimensional functions with a radial structure, whose univariate radial profile admits an exponentially convergent hp-approximation, exponential convergence was obtained in [36, Section 6] by composing ReLU approximations of univariate splines with an exponentially convergent approximation of the Euclidean norm, thereby obtaining exponential convergence without the curse of dimensionality.

Outline
The manuscript is structured as follows: in Section 2, in particular Section 2.2, we review the weighted function spaces which will be used to describe the weighted analytic function classes in polytopes Ω that underlie our approximation results. In Section 2.3, we present an approximation result by tensor-product hp-finite elements for functions from the weighted analytic class. A proof of this result is provided in Appendix A. In Section 3 we review definitions of NNs and a "ReLU calculus" from [9,38] whose operations will be required in the ensuing NN approximation results.
In Section 4, we state and prove the key results of the present paper. In Section 5, we illustrate our results by deducing novel NN expression rate bounds for solution classes of several concrete examples of elliptic boundary-value and eigenvalue problems where solutions belong to the weighted analytic function classes introduced in Section 2. Some of the more technical proofs of Section 5 are deferred to Appendix B. In Section 6, we briefly recapitulate the principal mathematical results of this paper and indicate possible consequences and further directions.

Setting and functional spaces
We start by recalling some general notation that will be used throughout the paper. We also introduce some tools that are required to describe two and three dimensional domains as well as the associated weighted Sobolev spaces.
When we indicate a relation on $|\alpha|$ or $|\alpha|_\infty$ in the subscript of a sum, we mean the sum over all multi-indices that fulfill that relation. For a domain $\Omega \subset \mathbb{R}^d$, $k \in \mathbb{N}_0$ and $1 \le p \le \infty$, we indicate by $W^{k,p}(\Omega)$ the classical $L^p(\Omega)$-based Sobolev space of order $k$. We write $H^k(\Omega) = W^{k,2}(\Omega)$. We also introduce the mixed-regularity norms $\|\cdot\|_{W^{1,p}_{\mathrm{mix}}(\Omega)}$. For $\Omega = I_1 \times \cdots \times I_d$, with bounded intervals $I_j \subset \mathbb{R}$, $j = 1, \ldots, d$, there holds $H^1_{\mathrm{mix}}(\Omega) = H^1(I_1) \otimes \cdots \otimes H^1(I_d)$ with Hilbertian tensor products. Throughout, $C$ denotes a generic positive constant whose value may change at each appearance, even within an equation.
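A standard definition of the mixed norm, consistent with the Hilbertian tensor-product identity above (stated here as the assumed definition), reads
$$
\|v\|_{W^{1,p}_{\mathrm{mix}}(\Omega)} := \Big( \sum_{\alpha \in \mathbb{N}_0^d,\ |\alpha|_\infty \le 1} \|\partial^\alpha v\|_{L^p(\Omega)}^p \Big)^{1/p}, \qquad 1 \le p < \infty,
$$
with the usual modification (maximum over $\alpha$) for $p = \infty$; in particular, $H^1_{\mathrm{mix}}(\Omega) = W^{1,2}_{\mathrm{mix}}(\Omega)$.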
The $\ell^p$ norm, $1 \le p \le \infty$, on $\mathbb{R}^n$ is denoted by $\|x\|_p$. The number of nonzero entries of a vector or matrix $x$ is denoted by $\|x\|_0$.

Three dimensional domain.
Let $\Omega \subset \mathbb{R}^3$ be a bounded, polyhedral domain. Let $\mathcal{C}$ denote a set of isolated points, situated either at the corners of $\Omega$ or in the interior of $\Omega$ (that we refer to as the singular corners in either case, for simplicity), and let $\mathcal{E}$ be a subset of the edges of $\Omega$ (the singular edges). Furthermore, denote by $\mathcal{E}_c \subset \mathcal{E}$ the set of singular edges abutting a corner $c$. For each $c \in \mathcal{C}$ and each $e \in \mathcal{E}$, we introduce the weights $r_c(x) := \mathrm{dist}(x, c)$, $r_e(x) := \mathrm{dist}(x, e)$, and, for $e \in \mathcal{E}_c$, $\rho_{ce}(x) := r_e(x)/r_c(x)$. For $\varepsilon > 0$, we define edge-, corner-, and corner-edge neighborhoods:
$$
\Omega^\varepsilon_e := \{x \in \Omega : r_e(x) < \varepsilon \text{ and } r_c(x) > \varepsilon\ \forall c \in \mathcal{C}\}, \quad
\Omega^\varepsilon_c := \{x \in \Omega : r_c(x) < \varepsilon \text{ and } \rho_{ce}(x) > \varepsilon\ \forall e \in \mathcal{E}\}, \quad
\Omega^\varepsilon_{ce} := \{x \in \Omega : r_c(x) < \varepsilon \text{ and } \rho_{ce}(x) < \varepsilon\}.
$$
We fix a value $\varepsilon > 0$ small enough so that $\Omega^\varepsilon_c \cap \Omega^\varepsilon_{c'} = \emptyset$ for all $c \ne c' \in \mathcal{C}$, and $\Omega^\varepsilon_{ce} \cap \Omega^\varepsilon_{ce'} = \Omega^\varepsilon_e \cap \Omega^\varepsilon_{e'} = \emptyset$ for all singular edges $e \ne e'$. In the sequel, we omit the dependence of the subdomains on $\varepsilon$. Let also $\Omega_{\mathcal{C}} := \bigcup_{c \in \mathcal{C}} \Omega_c$, $\Omega_{\mathcal{E}} := \bigcup_{e \in \mathcal{E}} \Omega_e$, $\Omega_{\mathcal{CE}} := \bigcup_{c \in \mathcal{C}} \bigcup_{e \in \mathcal{E}_c} \Omega_{ce}$, and $\Omega_0 := \Omega \setminus (\Omega_{\mathcal{C}} \cup \Omega_{\mathcal{E}} \cup \Omega_{\mathcal{CE}})$. In each subdomain $\Omega_{ce}$ and $\Omega_e$, for any multi-index $\alpha \in \mathbb{N}^3_0$, we denote by $\alpha_\parallel$ the multi-index whose component in the coordinate direction parallel to $e$ is equal to the component of $\alpha$ in the same direction, and which is zero in every other component. Moreover, we set $\alpha_\perp := \alpha - \alpha_\parallel$.

Two dimensional domain.
Let $\Omega \subset \mathbb{R}^2$ be a polygon. We adopt the convention that $\mathcal{E} := \emptyset$. For each singular corner $c \in \mathcal{C}$, we write $r_c(x) := \mathrm{dist}(x, c)$. As in the three-dimensional case, we fix a sufficiently small $\varepsilon > 0$ so that $\Omega^\varepsilon_c \cap \Omega^\varepsilon_{c'} = \emptyset$ for $c \ne c' \in \mathcal{C}$, and write $\Omega_c = \Omega^\varepsilon_c$. Furthermore, $\Omega_{\mathcal{C}}$ is defined as for $d = 3$, and $\Omega_0 := \Omega \setminus \Omega_{\mathcal{C}}$.

Weighted spaces with nonhomogeneous norms
We introduce classes of weighted, analytic functions in space dimension $d = 3$, as they arise in analytic regularity theory for linear, elliptic boundary value problems in polyhedra, in the particular form introduced in [7]. There, the structure of the weights is given in terms of Cartesian coordinates, which is particularly suited for the presently adopted, tensorized approximation architectures. The definition of the corresponding classes when $d = 2$ is analogous. For a weight exponent vector $\gamma = \{\gamma_c, \gamma_e : c \in \mathcal{C}, e \in \mathcal{E}\}$, we introduce nonhomogeneous, weighted Sobolev norms, in which the weight exponents involve $(x)_+ := \max\{0, x\}$, and we define the associated function space of finite-order weighted Sobolev regularity. For $A, C > 0$, we define the space of weighted analytic functions with nonhomogeneous norm as the set of functions whose weighted norms of every order satisfy analytic-type growth bounds with constants $C$ and $A$, for all $c \in \mathcal{C}$, all $e \in \mathcal{E}_c$, and all $k \in \mathbb{N}_0$.

Approximation of weighted analytic functions on tensor product geometric meshes
The approximation result of weighted analytic functions via NNs that we present below is based on emulating an approximation strategy of tensor product hp-finite elements. In this section, we present this hp-finite element approximation. Let $I \subset \mathbb{R}$ be an interval. A partition of $I$ into $N \in \mathbb{N}$ intervals is a set $\mathcal{G}$ such that $|\mathcal{G}| = N$, all elements of $\mathcal{G}$ are disjoint, connected, and open subsets of $I$, and $\bigcup_{J \in \mathcal{G}} \overline{J} = \overline{I}$. We denote, for all $p \in \mathbb{N}_0$, by $\mathbb{Q}_p(\mathcal{G})$ the space of piecewise polynomials of degree at most $p$ on $\mathcal{G}$.
One specific partition of $I = [0, 1]$ is given by the one-dimensional geometrically graded grid which, for a grading factor $\sigma \in (0, 1/2]$ and $\ell \in \mathbb{N}$ layers, is given by $\mathcal{G}^\ell_{1,\sigma} := \{(0, \sigma^\ell)\} \cup \{(\sigma^{\ell-k+1}, \sigma^{\ell-k}) : k = 1, \ldots, \ell\}$. In the approximation result below (Theorem 2.1), $c$ is one of the corners of $Q$, $\mathcal{E} = \mathcal{E}_c$ contains the edges adjacent to $c$ when $d = 3$, and $\mathcal{E} = \emptyset$ when $d = 2$. Further, assume given constants $C_f, A_f > 0$ and a function $f$ in the corresponding weighted analytic class. Then, there exist $C_p > 0$, $C_L > 0$ such that, for every $0 < \varepsilon < 1$, there exist $p, L \in \mathbb{N}$, bounded in terms of $|\log\varepsilon|$, and a tensor-product hp quasi-interpolant of $f$ on the geometrically graded mesh that achieves accuracy $\varepsilon$ in $H^1(Q)$. We present the proof in Subsection A.9.3, after developing an appropriate framework of hp-approximation in Appendix A.
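For concreteness, a small sketch that builds the nodes of this geometrically graded grid (illustrative only):

```python
def geometric_grid_nodes(sigma: float, ell: int):
    """Nodes 0 < sigma**ell < ... < sigma < 1 of the graded partition of (0, 1),
    refined geometrically towards the singular point at 0."""
    assert 0.0 < sigma <= 0.5 and ell >= 1
    return [0.0] + [sigma ** (ell - k) for k in range(ell + 1)]

print(geometric_grid_nodes(0.5, 4))   # [0.0, 0.0625, 0.125, 0.25, 0.5, 1.0]
```

The tensor-product meshes used below for $d = 2, 3$ are Cartesian products of such graded one-dimensional partitions.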

Basic ReLU neural network calculus
In the sequel, we distinguish between a neural network, as a collection of weights, and the associated realization of the NN. This is a function that is determined through the weights and an activation function. In this paper, we only consider the so-called ReLU activation $\rho(x) := \max\{0, x\}$. A NN $\Phi$ with input dimension $d$ and $L$ layers is a collection of weight matrices and bias vectors $\Phi = ((A_1, b_1), \ldots, (A_L, b_L))$, where $N_0 := d$ and $N_1, \ldots, N_L \in \mathbb{N}$, and where $A_\ell \in \mathbb{R}^{N_\ell \times N_{\ell-1}}$ and $b_\ell \in \mathbb{R}^{N_\ell}$ for $\ell = 1, \ldots, L$.
For a NN $\Phi$, we define the associated realization of the NN $\Phi$ as $\mathrm{R}(\Phi) : \mathbb{R}^d \to \mathbb{R}^{N_L} : x \mapsto x_L$, where the output $x_L \in \mathbb{R}^{N_L}$ results from the recursion $x_0 := x$, $x_\ell := \rho(A_\ell x_{\ell-1} + b_\ell)$ for $\ell = 1, \ldots, L-1$, and $x_L := A_L x_{L-1} + b_L$. Here $\rho$ is understood to act component-wise on vector-valued inputs, i.e., for $y = (y_1, \ldots, y_m) \in \mathbb{R}^m$, $\rho(y) := (\rho(y_1), \ldots, \rho(y_m))$. We call $N(\Phi) := d + \sum_{j=1}^{L} N_j$ the number of neurons of the NN $\Phi$, $L(\Phi) := L$ the number of layers or depth, $M_j(\Phi) := \|A_j\|_0 + \|b_j\|_0$ the number of nonzero weights in the $j$-th layer, and $M(\Phi) := \sum_{j=1}^{L} M_j(\Phi)$ the number of nonzero weights of $\Phi$, also referred to as its size. We refer to $N_L$ as the dimension of the output layer of $\Phi$.
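A minimal sketch of this weight/realization distinction, with the component-wise ReLU on hidden layers and an affine output layer as in the definition above (helper names are illustrative only):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def realize(phi):
    """phi = [(A_1, b_1), ..., (A_L, b_L)]: ReLU on hidden layers, affine output layer."""
    def R(x):
        for A, b in phi[:-1]:
            x = relu(A @ x + b)
        A, b = phi[-1]
        return A @ x + b
    return R

def size(phi):
    """Number M(phi) of nonzero weights."""
    return sum(np.count_nonzero(A) + np.count_nonzero(b) for A, b in phi)

# usage: a depth-2 NN whose realization is x -> relu(x)
phi = [(np.array([[1.0]]), np.array([0.0])), (np.array([[1.0]]), np.array([0.0]))]
print(realize(phi)(np.array([-2.0])), size(phi))
```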

Concatenation, parallelization, emulation of identity
An essential component in the ensuing proofs is the construction of NNs out of simpler building blocks. For instance, given two NNs, we would like to identify another NN whose realization equals the sum or the composition of the realizations of the first two NNs. To describe these operations precisely, we introduce a formalism of operations on NNs below. The first of these operations is the concatenation. Proposition 3.2 (NN concatenation, [38, Remark 2.6]). Let $L_1, L_2 \in \mathbb{N}$, and let $\Phi^1, \Phi^2$ be two NNs of respective depths $L_1$ and $L_2$ such that $N^1_0 = N^2_{L_2} =: d$, i.e., the input layer of $\Phi^1$ has the same dimension as the output layer of $\Phi^2$.
Then, there exists a NN $\Phi^1 \odot \Phi^2$, called the sparse concatenation of $\Phi^1$ and $\Phi^2$, such that $\mathrm{R}(\Phi^1 \odot \Phi^2) = \mathrm{R}(\Phi^1) \circ \mathrm{R}(\Phi^2)$, $L(\Phi^1 \odot \Phi^2) = L_1 + L_2$, and $M(\Phi^1 \odot \Phi^2) \le 2M(\Phi^1) + 2M(\Phi^2)$. The second fundamental operation on NNs is parallelization, achieved with the following construction. Proposition 3.3 (NN parallelization, [38, Definition 2.7]). Let $L, d \in \mathbb{N}$ and let $\Phi^1, \Phi^2$ be two NNs with $L$ layers and with $d$-dimensional input each. Then there exists a NN $\mathrm{P}(\Phi^1, \Phi^2)$ with $d$-dimensional input and $L$ layers, which we call the parallelization of $\Phi^1$ and $\Phi^2$, such that $\mathrm{R}(\mathrm{P}(\Phi^1, \Phi^2))(x) = (\mathrm{R}(\Phi^1)(x), \mathrm{R}(\Phi^2)(x))$ for all $x \in \mathbb{R}^d$, and $M(\mathrm{P}(\Phi^1, \Phi^2)) \le M(\Phi^1) + M(\Phi^2)$. Proposition 3.3 requires the two NNs to have the same depth. If two NNs have different depths, then we can artificially enlarge one of them by concatenating it with a NN that implements the identity. One possible construction of such a NN is presented next.
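The standard ReLU identity emulation used for this purpose relies on $x = \rho(x) - \rho(-x)$, applied componentwise. A minimal sketch (illustrative; not necessarily the exact weight layout of the cited construction), compatible with the realize() sketch above:

```python
import numpy as np

def identity_relu_block(dim: int):
    """One hidden ReLU layer realizing the identity on R^dim via x = relu(x) - relu(-x)."""
    A1 = np.vstack([np.eye(dim), -np.eye(dim)])   # lift to (x, -x)
    b1 = np.zeros(2 * dim)
    A2 = np.hstack([np.eye(dim), -np.eye(dim)])   # recombine relu(x) - relu(-x)
    b2 = np.zeros(dim)
    return [(A1, b1), (A2, b2)]

# exact identity, also for negative inputs:
# realize(identity_relu_block(2))(np.array([-1.0, 3.0])) -> array([-1.,  3.])
```

Concatenating a NN with (copies of) such an identity block increases its depth without changing its realization, which is how two NNs of different depths are brought to the same depth before parallelization.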
Finally, we sometimes require a parallelization of NNs that do not share inputs. Proposition 3.5 (Full parallelization of NNs with distinct inputs, [9, Setting 5.2]). Let $L \in \mathbb{N}$ and let $\Phi^1, \Phi^2$ be two NNs with $L$ layers each and with input dimensions $N^1_0 = d_1$ and $N^2_0 = d_2$, respectively. Then there exists a NN, denoted by $\mathrm{FP}(\Phi^1, \Phi^2)$, with $d$-dimensional input where $d = d_1 + d_2$ and $L$ layers, which we call the full parallelization of $\Phi^1$ and $\Phi^2$, such that $\mathrm{R}(\mathrm{FP}(\Phi^1, \Phi^2))(x, y) = (\mathrm{R}(\Phi^1)(x), \mathrm{R}(\Phi^2)(y))$ for all $(x, y) \in \mathbb{R}^{d_1} \times \mathbb{R}^{d_2}$, and $M(\mathrm{FP}(\Phi^1, \Phi^2)) = M(\Phi^1) + M(\Phi^2)$. The construction combines, for $j = 1, \ldots, L$, the weight matrices and bias vectors of the two NNs into block-diagonal form; all properties of $\mathrm{FP}(\Phi^1, \Phi^2)$ claimed in the statement of the proposition follow immediately from this construction.

Emulation of multiplication and piecewise polynomials
In addition to the basic operations above, we use two types of functions that we can approximate especially efficiently with NNs. These are high dimensional multiplication functions and univariate piecewise polynomials. We first give the result of an emulation of a multiplication in arbitrary dimension.
In addition to the high-dimensional multiplication, we can efficiently approximate univariate, continuous, piecewise polynomial functions by realizations of NNs with the ReLU activation function.
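The multiplication emulation of [52] rests on the identity $xy = \tfrac12\big((x+y)^2 - x^2 - y^2\big)$ and on the fact that $x \mapsto x^2$ on $[0,1]$ is approximated with error at most $2^{-2m-2}$ by a ReLU NN of depth $O(m)$ built from composed hat functions. A schematic sketch of that scalar construction (illustrative only; not the exact network $\Pi^d_{\varepsilon,2}$ of Proposition 3.6):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def hat(x):
    """Piecewise linear hat on [0, 1] with peak 1 at x = 1/2, built from three ReLUs."""
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5) + 2.0 * relu(x - 1.0)

def approx_square(x, m: int):
    """Yarotsky-type approximation of x**2 on [0, 1]; error at most 2**(-2*m - 2)."""
    y, g = x, x
    for s in range(1, m + 1):
        g = hat(g)               # s-fold composition of the hat function
        y = y - g / 4.0 ** s
    return y

def approx_mult(x, y, m: int):
    """Approximate x*y for x, y in [0, 1] via the polarization identity."""
    return 0.5 * (4.0 * approx_square(0.5 * (x + y), m)
                  - approx_square(x, m) - approx_square(y, m))

t = np.linspace(0.0, 1.0, 101)
print(np.max(np.abs(approx_square(t, 6) - t ** 2)))   # about 6e-5 or smaller
print(abs(approx_mult(0.3, 0.8, 6) - 0.24))
```

Multiplication of $d > 2$ factors can then be obtained by composing such binary multiplications; controlling the error in $W^{1,\infty}$ rather than $L^\infty$, as required here, needs additional care.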

Exponential approximation rates by realizations of NNs
We now establish several technical results on the exponentially consistent approximation by realizations of NNs with ReLU activation of univariate and multivariate tensorized polynomials. These results will be used to establish Theorem 4.3, which yields exponential approximation rates of NNs for functions in the weighted, analytic classes introduced in Section 2.2. They are of independent interest, as they imply that spectral and pseudospectral methods can, in principle, be emulated by realizations of NNs with ReLU activation.

NN-based approximation of univariate, piecewise polynomial functions
We start with the following corollary to Proposition 3.7. It quantifies stability and consistency of realizations of NNs with ReLU activation for the emulation of the univariate, piecewise polynomial basis functions in Theorem 2.1.
Proof. Let $i \in \{1, \ldots, N_{1d}\}$. For $v_i$ as in the assumption of the corollary, we have that either $\mathrm{supp}(v_i) = \overline{J}$ for a unique $J \in \mathcal{G}_{1d}$, or $\mathrm{supp}(v_i) = \overline{J \cup J'}$ for two neighboring intervals $J, J' \in \mathcal{G}_{1d}$. Hence, there exists a partition $\mathcal{T}_i$ of $I$ into at most four subintervals such that $v_i \in S_{\mathbf{p}}(I, \mathcal{T}_i)$, where $\mathbf{p} = (p)_{i \in \{1, \ldots, 4\}}$. Because of this, an application of Proposition 3.7 with $q = 2$, together with Remark 3.8, yields NNs such that (4.1) holds. In addition, invoking $p \lesssim 1 + |\log(\varepsilon_{hp})| \lesssim 1 + |\log(\varepsilon_1)|$, we observe that there exists $C_4 > 0$ such that (4.2) holds. Furthermore, using $p \lesssim 1 + |\log(\varepsilon_1)|$ once more, we obtain that there exists $C_5 > 0$ such that (4.3) holds.

Emulation of functions with singularities in cubic domains by NNs
Below we state a result describing the efficiency of re-approximating continuous, piecewise tensor product polynomial functions in a cubic domain, as introduced in Theorem 2.1, by realizations of NNs with the ReLU activation function.
Proof. Assume $\mathcal{I} \ne \emptyset$, as otherwise there is nothing to show. Let $C_{\mathcal{I}} \ge 1$ denote the constant appearing in the assumption on $\mathcal{I}$. Construction of the neural network. Invoking Corollary 4.1, we choose, for $i = 1, \ldots, N_{1d}$, NNs $\Phi^{v_i}_{\varepsilon_1}$ whose realizations satisfy, by Sobolev embedding, the uniform bound (4.6). Then, let $\Phi^{\mathrm{basis}}$ be the NN defined as the full parallelization of $d$ copies of $\mathrm{P}(\Phi^{v_1}_{\varepsilon_1}, \ldots, \Phi^{v_{N_{1d}}}_{\varepsilon_1})$. Note that $\Phi^{\mathrm{basis}}$ is a NN with $d$-dimensional input and $dN_{1d}$-dimensional output. Subsequently, we introduce the $N_{1d}^d$ matrices $E_{(i_1,\ldots,i_d)} \in \{0, 1\}^{d \times dN_{1d}}$ which, for all $(i_1, \ldots, i_d) \in \{1, \ldots, N_{1d}\}^d$ and all $a = (a_1, \ldots, a_{dN_{1d}}) \in \mathbb{R}^{dN_{1d}}$, select from $a$ the $d$ entries with indices $i_1, \ldots, i_d$ in the respective univariate blocks.
Note that, for all $(i_1, \ldots, i_d) \in \{1, \ldots, N_{1d}\}^d$, the entries selected by $E_{(i_1,\ldots,i_d)}$ from the output of $\Phi^{\mathrm{basis}}$ are the values of $\mathrm{R}(\Phi^{v_{i_1}}_{\varepsilon_1}), \ldots, \mathrm{R}(\Phi^{v_{i_d}}_{\varepsilon_1})$ in the respective coordinates. Then, we set $\Phi_\varepsilon$ as in (4.8), where $\Pi^d_{\varepsilon_2,2}$ is according to Proposition 3.6. Note that, by (4.6), the inputs of $\Pi^d_{\varepsilon_2,2}$ are bounded in absolute value by $2$. Finally, we define $\Phi_{\varepsilon,c}$. Approximation accuracy. Let us now analyze whether $\Phi_{\varepsilon,c}$ has the asserted approximation accuracy. We estimate by the triangle inequality as in (4.10). By another application of the triangle inequality, using Proposition 3.6, the chain rule, and (4.5), we bound the second term in (4.10); we then use (4.6) to bound the first term in (4.11). For $d = 3$ this produces three terms, and for $d = 2$ a similar estimate with only two terms. By the tensor product structure, it is clear that $(I) \le d\varepsilon_1(c_{v,\max} + 1)^d$. Combining (4.10) with the considerations above yields (4.12).
Bound on the $L^\infty$ norm of the neural network. As we have already shown, $\|\mathrm{R}(\Phi^{v_i}_{\varepsilon_1})\|_{L^\infty} \le 2$. Therefore, by Proposition 3.6, $\|\mathrm{R}(\Phi_\varepsilon)\|_{L^\infty} \le 2^d + \varepsilon_2$, and the asserted $L^\infty$ bound on $\mathrm{R}(\Phi_{\varepsilon,c})$ follows. Size of the neural network. Bounds on the size and depth of $\Phi_{\varepsilon,c}$ follow from Proposition 3.6 and Corollary 4.1. Specifically, we start by remarking that there exists a constant $C_1 > 0$, depending on $C_v$, $b_v$, $C_{\mathcal{I}}$ and $d$ only, such that $|\log(\varepsilon_1)| \le C_1(1 + |\log\varepsilon|)$. Then, by Corollary 4.1, there exist constants $C_4, C_5 > 0$, depending on $C_p$, $C_v$, $b_v$, $(b-a)$, and $d$ only, such that the corresponding depth and size bounds hold for all $i = 1, \ldots, N_{1d}$. Using also the fact that $N_{1d} \le C(1 + |\log\varepsilon|^2)$ for $C > 0$ independent of $\varepsilon$, and since $d \ge 2$, the asserted size bound follows, with a constant $C_{11} > 0$ depending on $C_p$, $C_v$, $b_v$, $(b-a)$, $d$ and $C_c$ only.
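Stripped of the NN bookkeeping (parallelization, selection matrices, approximate products), the function emulated by $\Phi_{\varepsilon,c}$ is the tensor-product quasi-interpolant sketched below; the sketch uses exact products and generic callables in place of the emulated univariate basis functions, so it only illustrates the target of the emulation, not the NN itself.

```python
import itertools
import numpy as np

def tensor_interpolant(coeffs, basis_1d):
    """coeffs: d-dimensional array of shape (N, ..., N); basis_1d: N univariate callables.
    Returns x -> sum_{i_1,...,i_d} coeffs[i_1,...,i_d] * prod_j basis_1d[i_j](x[j])."""
    d, N = coeffs.ndim, coeffs.shape[0]
    def u(x):
        x = np.asarray(x, dtype=float)
        vals = np.array([[basis_1d[i](x[j]) for i in range(N)] for j in range(d)])
        return sum(coeffs[idx] * np.prod([vals[j, idx[j]] for j in range(d)])
                   for idx in itertools.product(range(N), repeat=d))
    return u

# usage: a 2D combination of two univariate hat functions
hats = [lambda t: max(0.0, 1.0 - 2.0 * abs(t - 0.25)),
        lambda t: max(0.0, 1.0 - 2.0 * abs(t - 0.75))]
c = np.array([[1.0, 0.5], [0.5, 2.0]])
print(tensor_interpolant(c, hats)(np.array([0.25, 0.75])))   # 0.5
```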
Next, we state our main approximation result, which describes the approximation of singular functions on $(0,1)^d$ by realizations of NNs. Assume furthermore that $C_f, A_f > 0$ and that $f$ belongs to the corresponding weighted analytic class. Then, for every $0 < \varepsilon < 1$ there exist piecewise polynomial data such that the hypotheses of Theorem 4.2 are met. We have, by Theorem 2.1 and the triangle inequality, that the hp quasi-interpolant approximates $f$ with accuracy $\varepsilon/2$; then, the application of Theorem 4.2 (with $\varepsilon/2$ instead of $\varepsilon$) concludes the proof of (4.12). Finally, the bounds on the depth and the size of the networks follow from the corresponding estimates of Theorem 4.2.
Theorem 4.3 admits a straightforward generalization to functions with multivariate output, in which each coordinate is a weighted analytic function with the same regularity. Here, we denote, for a NN $\Phi$ with $N$-dimensional output, $N \in \mathbb{N}$, by $\mathrm{R}(\Phi)_n$ the $n$-th component of the output (where $n \in \{1, \ldots, N\}$).
Then, for all $f = (f_1, \ldots, f_{N_f})$, the bound (4.13) holds. Proof. Let $\Phi_\varepsilon$ be as in (4.8) and let $c^{(n)} \in \mathbb{R}^{N_{1d} \times \cdots \times N_{1d}}$, $n = 1, \ldots, N_f$, be the arrays of coefficients such that, in the notation of the proofs of Theorems 4.2 and 4.3, the corresponding representation holds for all $n \in \{1, \ldots, N_f\}$. We define, for vec as defined in (4.9), the NN $\Phi_{\varepsilon,f}$ accordingly. The estimate (4.13) and the $L^\infty$-bound then follow from Theorem 4.2. The bound on $L(\Phi_{\varepsilon,f})$ follows directly from Theorem 4.2 and Proposition 3.2. Finally, the bound on $M(\Phi_{\varepsilon,f})$ follows from Theorem 4.2 and Proposition 3.2, as well as from the observation that the size of the additional output layer is bounded with a constant $C > 0$ independent of $N_f$ and $\varepsilon$.

Exponential expression rates for solution classes of PDEs
In this section, we develop Theorem 4.3 into several exponentially decreasing upper bounds on the error of approximation, by realizations of NNs with ReLU activation, of solution classes of elliptic PDEs with singular data (such as singular coefficients or domains with nonsmooth boundary). In particular, we consider elliptic PDEs in two-dimensional, general polygonal domains, in three-dimensional domains that are unions of cubes, and elliptic eigenvalue problems with isolated point singularities in the potential, which arise in models of electron structure in quantum mechanics.
In each class of examples, the solution sets belong to the class of weighted analytic functions introduced in Subsection 2.2. However, the approximation rates established in Section 4 only hold on tensor product domains with singularities on the boundary. Therefore, we will first extend the exponential NN approximation rates to functions which exhibit singularities on a set of isolated points internal to the domain, arising from singular potentials of nonlinear Schrödinger operators. In Section 5.2, we demonstrate, using an argument based on a partition of unity, that the approximation problem on general polygonal domains can be reduced to that on tensor product domains and Fichera-type domains, and establish exponential NN expression rates for linear elliptic source and eigenvalue problems. In Section 5.3, we show exponential NN expression rates for classes of weighted analytic functions on two- and three-dimensional Fichera-type domains.

Nonlinear eigenvalue problems with isolated point singularities
Point singularities emerge in the solutions of elliptic eigenvalue problems, as arise, for example, for electrostatic interactions between charged particles that are modelled mathematically as point sources in R 3 . Other problems that exhibit point singularities appear in general relativity, and for electron structure models in quantum mechanics. We concentrate here on the expression rate of "ab initio" NN approximation of the electron density near isolated singularities of the nuclear potential. Via a ReLU-based partition of unity argument, an exponential approximation rate bound for a single, isolated point singularity in Theorem 5.1 is extended in Corollary 5.4 to electron densities corresponding to potentials with multiple point singularities at a priori known locations, modeling (static) molecules.
The numerical approximation in ab initio electron structure computations with NNs has recently been reported to be competitive with other established methodologies (e.g., [39,20] and the references there). The exponential ReLU expression rate bounds obtained here can, in part, underpin the competitive performance of NNs in (static) electron structure computations.
We recall that all NNs are realized with the ReLU activation function, see (3.1).

Nonlinear Schrödinger equations
Let $\Omega$ be a flat torus in dimension $d \in \{2, 3\}$ and let $V : \Omega \to \mathbb{R}$ be a potential such that $V(x) \ge V_0 > 0$ for all $x \in \Omega$, and such that there exist $\delta > 0$ and $A_V > 0$ for which $V$ satisfies a weighted analytic bound in terms of $r(x) = \mathrm{dist}(x, (0, \ldots, 0))$. For $k \in \{0, 1, 2\}$, we introduce the Schrödinger eigenproblem that consists in finding the smallest eigenvalue $\lambda \in \mathbb{R}$ and an associated eigenfunction $u \in H^1(\Omega)$ satisfying the corresponding nonlinear eigenvalue equation. There holds the following approximation result.
Then, for every $0 < \varepsilon \le 1$ there exists a NN $\Phi_{\varepsilon,u}$ that approximates $u$ in $H^1(\Omega)$ with accuracy $\varepsilon$; in addition, as $\varepsilon \to 0$, the depth and size of $\Phi_{\varepsilon,u}$ grow at most poly-logarithmically in $1/\varepsilon$.

Hartree-Fock model
The Hartree-Fock model is an approximation of the full many-body representation of a quantum system under the Born-Oppenheimer approximation, where the many-body wave function is replaced by a single Slater determinant. Under this hypothesis, for $M, N \in \mathbb{N}$, the Hartree-Fock energy of a system with $N$ electrons and $M$ nuclei with positive charges $Z_i$ at isolated locations $R_i \in \mathbb{R}^3$ is given by (5.4), with density matrix $\tau$ and electron density $\rho(x) = \tau(x, x)$; see, e.g., [25,26]. The Euler-Lagrange equations of (5.4) read as in (5.5); see [25].
Then, (5.6), the $L^\infty$ bound, and the depth and size bounds on the NN $\Phi_{\varepsilon,\varphi}$ follow from the hp approximation result in Theorem A.25 (centered at $R_k$ by translation) and from Theorem 4.2, as in Corollary 4.4.
The arguments in the preceding subsections applied to wave functions for a single, isolated nucleus modelled by the singular potential V as in (5.1) can then be extended to give upper bounds on the approximation rates achieved by realizations of NNs of the wave functions in a bounded, sufficiently large domain containing all singularities of the nuclear potential in (5.4).
Corollary 5.4. Assume that (5.5) has $N$ real eigenvalues $\lambda_1, \ldots, \lambda_N$ with associated eigenfunctions $\varphi_1, \ldots, \varphi_N$ belonging to the corresponding weighted analytic class. Proof. The proof is based on a partition of unity argument. We only sketch it at this point, but will develop it in detail in the proof of Theorem 5.6. Let $\mathcal{T}$ be a tetrahedral, regular triangulation of $\Omega$, and let $\{\kappa_k\}_{k=1}^{N_\kappa}$ be the hat-basis functions associated to it. We suppose that the triangulation is sufficiently refined to ensure that, for all $k \in \{1, \ldots, N_\kappa\}$, there exists a cube $\Omega_k \subset \Omega$ such that $\mathrm{supp}(\kappa_k) \subset \Omega_k$ and such that $\Omega_k$ contains at most one of the singular points $R_i$. For all $k \in \{1, \ldots, N_\kappa\}$, by [19, Theorem 5.2], which is based on [51], there exists a NN $\Phi^{\kappa_k}$ whose realization equals $\kappa_k$.

Elliptic PDEs in polygonal domains
We establish exponential expressivity for realizations of NNs with ReLU activation of solution classes to elliptic PDEs in polygonal domains $\Omega$, the boundaries $\partial\Omega$ of which are Lipschitz and consist of a finite number of straight line segments. Notably, $\Omega \subset \mathbb{R}^2$ need not be a finite union of axiparallel rectangles. In the following lemma, we construct a partition of unity in $\Omega$ subordinate to an open covering, each element of which is the affine image of one of three canonical patches. Note that we admit corners with associated angle of aperture $\pi$; this will be instrumental, in Corollaries 5.11 and 5.12, for the imposition of different boundary conditions on $\partial\Omega$. The three canonical patches that we consider are listed in Lemma 5.5, item [P2]. Affine images of $(0,1)^2$ are used away from corners of $\partial\Omega$ and when the internal angle of a corner is smaller than $\pi$. Affine images of $(-1,1) \times (0,1)$ are used near corners with internal angle $\pi$. PDE solutions may exhibit point singularities near such corners, e.g., if the two neighboring edges have different types of boundary conditions. Affine images of $(-1,1)^2 \setminus (-1,0]^2$ are used near corners with internal angle larger than $\pi$. In the proof of Theorem 5.6, we use on each patch Theorem 4.3 or a result from Subsection 5.3 below. A triangulation $\mathcal{T}$ of $\Omega$ is defined as a finite partition of $\Omega$ into open triangles $K$ such that $\bigcup_{K \in \mathcal{T}} \overline{K} = \overline{\Omega}$. A regular triangulation of $\Omega$ is, additionally, a triangulation $\mathcal{T}$ of $\Omega$ such that, for any two neighboring elements $K_1, K_2 \in \mathcal{T}$, $\overline{K_1} \cap \overline{K_2}$ is either a corner of both $K_1$ and $K_2$ or an entire edge of both $K_1$ and $K_2$. For a regular triangulation $\mathcal{T}$ of $\Omega$, we denote by $S^1(\Omega, \mathcal{T})$ the space of functions $v \in C(\Omega)$ such that, for every $K \in \mathcal{T}$, $v|_K \in \mathbb{P}_1$.
We postpone the proof of Lemma 5.5 to Appendix B.1.
The following statement, then, provides expression rates for the NN approximation of functions in weighted analytic classes in polygonal domains.
We recall that all NNs are realized with the ReLU activation function, see (3.1).

Theorem 5.6.
Let $\Omega \subset \mathbb{R}^2$ be a polygon with Lipschitz boundary consisting of straight sides and with a finite set $\mathcal{C}$ of corners. Let $\gamma = \{\gamma_c : c \in \mathcal{C}\}$ be such that $\min\gamma > 1$. Then, for all $u \in J_\gamma(\Omega; \mathcal{C}, \emptyset)$ and for every $0 < \varepsilon < 1$, there exists a NN $\Phi_{\varepsilon,u}$ approximating $u$ in $H^1(\Omega)$ with accuracy $\varepsilon$ and with depth and size growing poly-logarithmically in $1/\varepsilon$ as $\varepsilon \to 0$. Proof. We introduce, using Lemma 5.5, a regular triangulation $\mathcal{T}$ of $\mathbb{R}^2$, an open cover $\{\Omega_i\}$ of $\Omega$, and a partition of unity $\{\varphi_i\}_{i=1}^{N_p} \in [S^1(\Omega, \mathcal{T})]^{N_p}$ such that the properties [P1]-[P3] of Lemma 5.5 hold. We define $\hat u_i := u|_{\Omega_i} \circ \psi_i$ on the corresponding canonical patch. Since $u \in J_\gamma(\Omega; \mathcal{C}, \emptyset)$ with $\min\gamma > 1$ and since the maps $\psi_i$ are affine, we observe that for every $i \in \{1, \ldots, N_p\}$ there exists $\hat\gamma$ with $\min\hat\gamma > 1$ such that $\hat u_i$ belongs to the corresponding weighted analytic class on the canonical patch. By Theorem 4.3, and by Lemma 5.21 and Theorem 5.16 in the forthcoming Subsection 5.3, there exist NNs $\Phi^{\hat u_i}_{\varepsilon_1}$, $i = 1, \ldots, N_p$, and a constant $C_\infty > 0$ independent of $\varepsilon_1$ such that, for all $i \in \{1, \ldots, N_p\}$ and all $\varepsilon_1 \in (0, 1)$, the corresponding $H^1$ error and $L^\infty$ stability bounds hold. The NNs given by Theorem 4.3, Lemma 5.21 and Theorem 5.16, which we here denote by $\Phi^{\hat u_i}_{\varepsilon_1}$ for $i = 1, \ldots, N_p$, may not have equal depth. Therefore, for all $i = 1, \ldots, N_p$ and suitable $L_i \in \mathbb{N}$, we enlarge $\Phi^{\hat u_i}_{\varepsilon_1}$ by concatenation with identity NNs of depth $L_i$, so that the resulting NNs have equal depth. To estimate the size of the enlarged NNs, we use the fact that the size of a NN is not smaller than its depth unless the associated realization is constant; in the latter case, we could replace the NN by a NN with one nonzero weight without changing the realization. By this argument, we obtain for all $i = 1, \ldots, N_p$ that the size of the enlarged NN is bounded by a constant multiple of the size of the original one. Next, the partition of unity functions $\varphi_i$ can be realized exactly by NNs $\Phi^{\varphi_i}$: here we use that $\mathcal{T}$ is a partition of $\mathbb{R}^2$, so that $\varphi_i$ is defined on all of $\mathbb{R}^2$ and [19, Theorem 5.2] applies, which itself is based on [51]. Similarly to the previously handled case of $\Phi^{\hat u_i}_{\varepsilon_1}$, we can assume that the $\Phi^{\varphi_i}$, $i = 1, \ldots, N_p$, all have equal depth and that the size of $\Phi^{\varphi_i}$ is bounded independently of $i$.
Since, by [P2], the mappings $\psi_i$ are affine and invertible, it follows that $\psi_i^{-1}$ is affine for every $i \in \{1, \ldots, N_p\}$.

Size of the neural network.
To bound the size of the NN, we remark that $N_p$ and the sizes of $\Phi^{\psi_i^{-1}}$ and of $\Phi^{\varphi_i}$ depend only on the domain $\Omega$. Furthermore, there exist constants $C_{\Omega,i}$, $i = 1, 2, 3$, that depend only on $\Omega$ and $u$, such that the corresponding bounds hold. From Theorem 4.3 and Proposition 3.6, in addition, there exist constants $C^L_u, C^M_u, C_\times > 0$ such that, for all $0 < \varepsilon_1, \varepsilon_\times \le 1$, the corresponding depth and size bounds hold. Then, by (5.13), we obtain (5.20). The desired depth and size bounds follow from (5.18), (5.19), and (5.20). This concludes the proof.
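Schematically, the network constructed in this proof realizes a function of the form $x \mapsto \sum_i \varphi_i(x)\,\hat u_i(\psi_i^{-1}(x))$, with the products replaced by approximate multiplication networks. The sketch below shows this assembly with exact products and generic callables standing in for the patchwise NNs (illustrative only):

```python
import numpy as np

def assemble_from_patches(cutoffs, patch_nets, inverse_maps):
    """cutoffs: CPwL partition-of-unity functions phi_i (exactly ReLU-realizable);
    patch_nets: patchwise approximations of u o psi_i on the reference patch;
    inverse_maps: the affine maps psi_i^{-1}.  Returns the global approximation."""
    def u_approx(x):
        return sum(phi(x) * net(inv(x))
                   for phi, net, inv in zip(cutoffs, patch_nets, inverse_maps))
    return u_approx

# usage: two 1D patches covering [0, 1] with hat cutoffs summing to 1 on [0, 1]
phi1 = lambda x: np.clip(1.0 - x, 0.0, 1.0)
phi2 = lambda x: np.clip(x, 0.0, 1.0)
net1 = lambda t: t ** 2            # stand-in for a patchwise NN on patch 1
net2 = lambda t: (1.0 - t) ** 2    # stand-in on patch 2 (in mapped coordinates)
inv1 = lambda x: x                 # psi_1^{-1}
inv2 = lambda x: 1.0 - x           # psi_2^{-1}
u = assemble_from_patches([phi1, phi2], [net1, net2], [inv1, inv2])
print(u(0.25))                     # 0.0625, i.e., the blend reproduces x**2
```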
The exponential expression rate for the class of weighted, analytic functions in Ω by realizations of NNs with ReLU activation in the H 1 (Ω)-norm established in Theorem 5.6 implies an exponential expression rate bound on ∂Ω, via the trace map and the fact that ∂Ω can be exactly parametrized by the realization of a shallow NN with ReLU activation. This is relevant for NN-based solution of boundary integral equations.

Corollary 5.7. (NN expression of Dirichlet traces)
Let $\Omega \subset \mathbb{R}^2$ be a polygon with Lipschitz boundary and a finite set $\mathcal{C}$ of corners. Let $\gamma = \{\gamma_c : c \in \mathcal{C}\}$ be such that $\min\gamma > 1$. For any connected component $\Gamma$ of $\partial\Omega$, let $\ell_\Gamma > 0$ be the length of $\Gamma$, such that there exists a continuous, piecewise affine parametrization $\theta : [0, \ell_\Gamma] \to \mathbb{R}^2 : t \mapsto \theta(t)$ of $\Gamma$ with finitely many affine linear pieces and $\|\tfrac{d}{dt}\theta\|_2 = 1$ for almost all $t \in [0, \ell_\Gamma]$. Then, for all $u \in J_\gamma(\Omega; \mathcal{C}, \emptyset)$ and for all $0 < \varepsilon < 1$, there exists a NN $\Phi_{\varepsilon,u,\theta}$ approximating the trace of $u$ on $\Gamma$ (parametrized by $\theta$), with depth and size bounded as $\varepsilon \to 0$. Proof. We note that both components of $\theta$ are continuous, piecewise affine functions on $[0, \ell_\Gamma]$; thus they can be represented exactly as realizations of a NN of depth two with the ReLU activation function. Moreover, the number of weights of these NNs is of the order of the number of affine linear pieces of $\theta$.
Next, for any $\varepsilon \in (0, 1)$, let $\Phi_{\varepsilon/C_\Gamma, u}$ be as given by Theorem 5.6. Define $\Phi_{\varepsilon,u,\theta} := \Phi_{\varepsilon/C_\Gamma, u} \odot \Phi^\theta$. The asserted approximation property follows; the bounds on its depth and size follow directly from Proposition 3.2, Theorem 5.6, and the fact that the depth and size of $\Phi^\theta$ are independent of $\varepsilon$. This finishes the proof.
Remark 5.8. The exponent 5 in the bound on the NN size M(Φ ε,u,θ ) in Corollary 5.7 is likely not optimal, due to it being transferred from the NN rate in Ω.
The proof of Theorem 5.6 established exponential expressivity of realizations of NNs with ReLU activation for the analytic class $J_\gamma(\Omega; \mathcal{C}, \emptyset)$ in $\Omega$. This implies that realizations of NNs can approximate, with exponential expressivity, solution classes of elliptic PDEs in polygonal domains $\Omega$. We illustrate this by formulating concrete results for three problem classes: second order, linear, elliptic source and eigenvalue problems in $\Omega$, and viscous, incompressible flow. To formulate the results, we specify the assumptions on $\Omega$. Definition 5.9 (Linear, second order, elliptic divergence-form differential operator with analytic coefficients). Let $d \in \{2, 3\}$ and let $\Omega \subset \mathbb{R}^d$ be a bounded domain. Let the coefficient functions $a_{ij}, b_i, c : \Omega \to \mathbb{R}$ be real analytic in $\Omega$, and such that the matrix function $A = (a_{ij})_{1 \le i,j \le d} : \Omega \to \mathbb{R}^{d \times d}$ is symmetric and uniformly positive definite in $\Omega$. With these functions, we define the linear, second order, elliptic divergence-form differential operator $\mathcal{L}$ acting on $w \in C^\infty_0(\Omega)$ via (summation over repeated indices $i, j \in \{1, \ldots, d\}$) $\mathcal{L}w := -\partial_i(a_{ij}\partial_j w) + b_i\partial_i w + cw$. Proof. The proof is obtained by verifying weighted, analytic regularity of solutions. By [2, Theorem 3.1] there exists $\gamma$ such that $\min\gamma > 1$ and $u \in J_\gamma(\Omega; \mathcal{C}, \emptyset)$. Then, the application of Theorem 5.6 concludes the proof.
Next, we address NN expression rates for eigenfunctions of $(\mathcal{L}, \mathcal{B})$. The analytic regularity of solutions $u$ used in the proof of Theorem 5.6 also holds for certain nonlinear, elliptic PDEs. We illustrate this for the velocity field of viscous, incompressible flow in $\Omega$. Proof. The velocity fields of Leray solutions of the Navier-Stokes equations in $\Omega$ satisfy the weighted, analytic regularity $u \in J_\gamma(\Omega; \mathcal{C}, \emptyset)^2$, with $\min\gamma > 1$; see [33]. Then, the application of Theorem 5.6 concludes the proof.

Elliptic PDEs in Fichera-type polyhedral domains
Fichera-type polyhedral domains $\Omega \subset \mathbb{R}^3$ are, loosely speaking, closures of finite, disjoint unions of (possibly affinely mapped) axiparallel hexahedra with $\partial\Omega$ Lipschitz. In Fichera-type domains, analytic regularity of solutions of linear, elliptic boundary value problems from acoustics and linear elasticity in displacement formulation has been established in [7]. As an example of a boundary value problem covered by [7] and our theory, consider $\Omega := (-1,1)^d \setminus (-1,0]^d$ for $d = 2, 3$, displayed for $d = 3$ in Figure 2. We recall that all NNs are realized with the ReLU activation function, see (3.1). We introduce the setting for elliptic problems with analytic coefficients in $\Omega$. Note that the boundary of $\Omega$ is composed of 6 edges when $d = 2$ and of 9 faces when $d = 3$. The following extension lemma will be useful for the approximation of the solution to (5.29) by NNs. We postpone its proof to Appendix B.2. Proof. By Lemma 5.15, we extend the function $u$ to a function $\tilde u$ defined on $(-1,1)^d$. Note that, by the stability of the extension, there exists a constant $C_{\mathrm{ext}} > 0$ independent of $u$ such that the norm of $\tilde u$ is bounded by $C_{\mathrm{ext}}$ times the norm of $u$. By Theorem A.25, there exist $C_p > 0$, $C_{\tilde N_{1d}} > 0$, $C_{N_{\mathrm{int}}} > 0$, $C_{\tilde v} > 0$, $C_{\tilde c} > 0$, and $b_{\tilde v} > 0$ such that, for all $0 < \varepsilon \le 1$, there exist $p \in \mathbb{N}$, a partition $\mathcal{G}_{1d}$ of $(-1,1)$ into $N_{\mathrm{int}}$ open, disjoint, connected subintervals, a $d$-dimensional array $\tilde c \in \mathbb{R}^{\tilde N_{1d} \times \cdots \times \tilde N_{1d}}$, and piecewise polynomials $\tilde v_i \in \mathbb{Q}_p(\mathcal{G}_{1d}) \cap H^1((-1,1))$, $i = 1, \ldots, \tilde N_{1d}$, with the stated approximation properties. Due to the stability (5.31) and to Lemmas A.21 and A.22, the bound on the coefficients $\tilde c$ is independent of the extension $\tilde u$ of $u$. By Theorem 4.2, there exists a NN $\Phi_{\varepsilon,u}$ with the stated approximation properties and asymptotic size bounds. The bound on the $L^\infty(\Omega)$ norm of the realization of $\Phi_{\varepsilon,u}$ follows as in the proof of Theorem 4.3.
Remark 5.17. Arguing as in Corollary 5.7, a NN with ReLU activation and two-dimensional input can be constructed so that its realization approximates the Dirichlet trace of solutions to (5.29) in H 1/2 (∂Ω) at an exponential rate in terms of the NN size M.
The following statement now gives expression rate bounds for the approximation of solutions to the Fichera problem (5.29) by realizations of NNs with the ReLU activation function.

In [8], a slightly different elliptic boundary value problem is numerically approximated by realizations of NNs. Its solutions exhibit the same weighted, analytic regularity as considered in this paper. The approximation rates by NN realizations obtained here therefore extend also to the approximation of solutions of the problem considered in [8].
In the proof of Theorem 5.6, we require in particular the approximation of weighted analytic functions on (−1, 1)×(0, 1) with a corner singularity at the origin. For convenient reference, we detail the argument in this case.

Conclusions and extensions
We review the main findings of the present paper and outline extensions of the present results, and perspectives for further research.

Principal mathematical results
We established exponential expressivity of realizations of NNs with the ReLU activation function in the Sobolev norm $H^1$ for functions which belong to certain countably normed, weighted analytic function spaces in cubes $Q = (0,1)^d$ of dimension $d = 2, 3$. The admissible function classes comprise functions which are real analytic at points $x \in Q$, and which admit analytic extensions to the open sides $F \subset \partial Q$, but may have singularities at corners and (in space dimension $d = 3$) edges of $Q$. We have also extended this result to cover exponential expressivity of realizations of NNs with ReLU activation for solution classes of linear, second order elliptic PDEs in divergence form in plane, polygonal domains and of elliptic, nonlinear eigenvalue problems with singular potentials in three space dimensions. Being essentially an approximation result, the DNN expression rate bound in Theorem 5.6 applies to any elliptic boundary value problem in polygonal domains for which weighted, analytic regularity is available. Apart from the source and eigenvalue problems, such regularity is, in space dimension $d = 2$, also available for linearized elastostatics, Stokes flow and general elliptic systems [14,17,7]. The established approximation rates of realizations of NNs with ReLU activation are fundamentally based on a novel exponential upper bound on the approximation of weighted analytic functions via tensorized hp approximations on multi-patch configurations in finite unions of axiparallel rectangles/hexahedra. The hp approximation result is presented in Theorem A.25 and is of independent interest in the numerical analysis of spectral element methods.
The proofs of exponential expressivity of NN realizations are, in principle, constructive. They are based on explicit bounds on the coefficients of hp projections and on corresponding emulation rate bounds for the (re)approximation of modal hp bases.

Extensions and future work
The tensor structure of the hp approximation considered here limits the geometries of domains that are admissible for our results. Curvilinear, mapped domains with analytic domain maps will allow corresponding approximation rates, with the NN approximations obtained by composing the present constructions with NN emulations of the domain maps, using the fact that compositions of NNs are again NNs.
The only activation function considered in this work is the ReLU. Following the same proof strategy, exponential expression rate bounds can also be obtained for NNs with smoother, nonlinear activation functions. We refer to Remark 5.20 and to the discussion in [46, Sec. 3.3].
The principal results in Section 5.1 yield exponential expressivity of realizations of NNs with ReLU activation for singular eigenvalue problems with multiple, isolated point singularities, as arise in electron-structure computations for static molecules with known loci of the nuclei. Inspection of our proofs reveals that the expression rate bounds are robust with respect to perturbations of the nuclei sites; only interatomic distances enter the constants in the expression rate bounds of Section 5.1.2. Given the closedness of NNs under composition, obtaining similar expression rates also for solutions of the vibrational Schrödinger equation appears in principle possible.
The presently proved deep ReLU NN expression rate bounds can, in connection with recently proposed, residual-based DNN training methodologies (e.g., [48]), imply exponential convergence rates of numerical NN approximations of PDE solutions based on machine learning approaches.

A Tensor product hp approximation
In this section, we construct the hp tensor product approximation which will then be emulated to obtain the NN expression rate estimates. We denote $Q = (0,1)^d$, $d \in \{2, 3\}$, and introduce the set of corners $\mathcal{C}$, containing a single corner of $Q$, and the set $\mathcal{E}$ of edges of $Q$ adjacent to that corner (with $\mathcal{E} = \emptyset$ when $d = 2$). The results in this section extend, by rotation or reflection, to the case where $\mathcal{C}$ contains any of the corners of $Q$ and $\mathcal{E}$ is the set of the adjacent edges when $d = 3$. Most of the section addresses the construction of exponentially consistent hp quasi-interpolants in the reference cube $(0,1)^d$; in Section A.10 the analysis is extended to domains which are specific finite unions of such patches.

For an element $K = J_{k_1} \times \cdots \times J_{k_d} \in \mathcal{G}^\ell_d$ with $k_1, \ldots, k_d \in \{0, \ldots, \ell\}$, we denote by $d^K_c$ the distance of $K$ from the singular corner, and by $d^K_e$ its distance from the closest singular edge. The hp tensor product space is defined as the tensor product of the univariate hp spaces on the geometrically graded partitions; additionally, we will use the corresponding notation, for all $\sigma \in (0, 1/2]$, for the $\sigma$-graded meshes.

A.2 Local projector
We denote the reference interval by $\hat I = (-1, 1)$ and the reference cube by $\hat K = (-1,1)^d$. Let $p \ge 1$: we introduce the univariate projectors $\hat\pi_p : H^1(\hat I) \to \mathbb{P}_p(\hat I)$ via a truncated Legendre expansion of the derivative, where $L_n$ is the $n$th Legendre polynomial, $L^\infty$-normalized, and $(\cdot, \cdot)$ is the scalar product of $L^2((-1,1))$. Note that $(\hat\pi_p v)(\pm 1) = v(\pm 1)$ for all $v \in H^1(\hat I)$. (A.8) For $p \in \mathbb{N}$, we introduce the projection on the reference element $\hat K$ as $\hat\Pi_p = \bigotimes_{i=1}^d \hat\pi_p$. For all $K \in \mathcal{G}_d$, we introduce an affine transformation $\Phi_K : K \to \hat K$ from $K$ to the reference element such that $\Phi_K(K) = \hat K$. (A.9) Remark that, since the elements are axiparallel, the affine transformation can be written as a $d$-fold product of one-dimensional affine transformations $\phi_k : J_k \to \hat I$. Let $K \in \mathcal{G}_d$ and let $k_i$, $i = 1, \ldots, d$, be the indices of its factor intervals. For $v$ defined on $K$ such that $v \circ \Phi_K^{-1} \in H^1_{\mathrm{mix}}(\hat K)$ and for $(p_1, \ldots, p_d) \in \mathbb{N}^d$, we introduce the local projection operator $\Pi^K_{p_1\cdots p_d}$ by pullback to the reference element, projection there, and push-forward to $K$. For later reference, we note the following property of $\Pi^K_p v$: Lemma A.1. Let $K_1, K_2$ be two axiparallel cubes that share one regular face $F$ (i.e., $F$ is an entire face of both $K_1$ and $K_2$). Then, for $v \in H^1_{\mathrm{mix}}(\mathrm{int}(K_1 \cup K_2))$, the piecewise polynomial obtained by applying the local projectors on $K_1$ and $K_2$ is continuous across $F$.
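A formula for $\hat\pi_p$ consistent with the Legendre expansion and with the nodal exactness (A.8), and standard in the hp literature (stated here as the assumed definition), is
$$
(\hat\pi_p v)(x) := v(-1) + \int_{-1}^{x} \big(\mathrm{P}_{p-1} v'\big)(t)\,dt, \qquad
\mathrm{P}_{p-1} v' := \sum_{n=0}^{p-1} \frac{(v', L_n)}{(L_n, L_n)}\, L_n,
$$
where $\mathrm{P}_{p-1}$ denotes the $L^2(\hat I)$-orthogonal projection onto $\mathbb{P}_{p-1}(\hat I)$. Since constants belong to $\mathbb{P}_{p-1}$, one has $\int_{-1}^{1} \mathrm{P}_{p-1} v' = \int_{-1}^{1} v' = v(1) - v(-1)$, which yields the nodal exactness (A.8).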

A.4 Preliminary estimates
The projector on $K$ introduced above has the following property.

A.4.1 One dimensional estimate
The following result is a consequence of, e.g., [43, Lemma 8.1]. Proof. From [43, Lemma 8.1], there exists $C > 0$, independent of $p$, $k$, $s$, and $v$, such that the one-dimensional approximation bound holds. In addition, for all $k = 1, \ldots, \ell$ and all $x \in J_k$, there holds $x \ge \frac{\sigma}{1-\sigma}h$, where $h$ denotes the length of $J_k$. Hence, the weighted bound follows for all $\gamma < s + 1$. This concludes the proof.

A.5 Interior estimates
The following lemmas give the estimate of the approximation error on the elements not belonging to edge or corner layers. For $d = 3$, all $\ell \in \mathbb{N}$, all $k_1, k_2, k_3 \in \{0, \ldots, \ell\}$ and all $K = J_{k_1} \times J_{k_2} \times J_{k_3}$, we denote by $h_\parallel$ the length of $K$ in the direction parallel to the closest singular edge, and by $h_{\perp,1}$ and $h_{\perp,2}$ the lengths of $K$ in the other two directions. If an element has multiple closest singular edges, we choose one of them and consider it as the "closest edge" for all points in that element. When considering functions from $J^d_\gamma(Q)$, $\gamma_e$ will refer to the weight of this closest edge. Similarly, we denote by $\partial_\parallel$ (resp. $\partial_{\perp,1}$ and $\partial_{\perp,2}$) the derivatives in the direction parallel (resp. perpendicular) to the closest singular edge. Lemma A.9. Let $d = 3$, $\ell \in \mathbb{N}$ and $K = J_{k_1} \times J_{k_2} \times J_{k_3}$ for $0 < k_1, k_2, k_3 \le \ell$. Let also $v \in J_\gamma(Q; \mathcal{C}, \mathcal{E}; C_v, A_v)$ with $\gamma_c \in (3/2, 5/2)$, $\gamma_e \in (1, 2)$. Then, there exists $C > 0$, dependent only on $\sigma$, $C_{\mathrm{appx2}}$, $C_v$, and $A > 0$, dependent only on $\sigma$, $A_v$, such that, for all $1 \le s \le p$, the stated bound holds, where $\partial_\parallel$ is the derivative in the direction parallel to the closest singular edge.
Proof. We write da = d K a , a ∈ {c, e}. There holds , using the result of Lemma A.6 and rescaling, we have and do similarly for the other terms of the sum (II) and (III) and the other subscripts e, ce, 0. Remark also that r i| K ≥ di, i ∈ {c, e}, and that for a, b ∈ R holds r a c r b e = r a+b c ρ b ce . We will also write γ = γc − γe. We start by considering the term (I)ce. Let α1 = α2 = 1; then, , where we have also used de ≤ dc. Therefore, If s + 1 + α1 + α2 − γc < 0, then s = 1 and α1 = α2 = 0, thus where the last inequality follows also from de ≤ dc. If s + 1 + α1 + α2 − γc < 0, then the same bound holds with d 2γc−2 c replaced by d 2 c . Similarly, where we used that de ≤ 1. The bound on (I)0 follows directly from the definition: Using (2.1), there exists C > 0 dependent only on Cv and σ and A > 0 dependent only on Av and σ such that We then apply the same argument to the terms (II) and (III). Indeed, and the estimate for (III)ce follows by exchanging h ⊥,1 and ∂ ⊥,1 with h ⊥,2 and ∂ ⊥,2 in the inequality above. The estimates for (II)c,e,0 and (III)c,e,0 can be obtained as for (I)c,e,0: We obtain, from (A.26), (A.27), and (A.28) that there exists C > 0 (dependent only on σ, Cappx2, Cv and A > 0 (dependent only on σ, Av) such that Considering that completes the proof.
Proof. The proof follows closely that of Lemma A.9 and we use the same notation. From Lemma A.6 and rescaling, we have As before, we will write γ = γc − γe. We start by considering the term (I)ce. When α1 = 1, Therefore, The estimates for (I)c,e,0 follow from the same technique: Hence, from (2.1), there exists C > 0 dependent only on Cv and σ and A > 0 dependent only on Av and σ such that We then apply the same argument to the terms (II) and (III). Indeed, if s + 1 + α1 + α2 − γc ≥ 0 where at the last step we have used that γe > 1 and de ≤ dc. If s + 1 + α1 + α2 − γc < 0, then Thus, using de ≤ dc, The estimates for (II)c,e,0 and (III)ce,c,e,0 can be obtained as above: We obtain, from (A.30), (A.31), and (A.32) that there exists C > 0 dependent only on σ, Cappx2, Cv and A > 0 dependent only on σ, Av such that Considering that and considering that the estimate for the other term at the left-hand side of (A.29) is obtained by exchanging {h, ∂} ⊥,1 with {h, ∂} ⊥,2 completes the proof.
Lemma A.11. Let d = 3, ∈ N and K = J k 1 × J k 2 × J k 3 for 0 < k1, k2, k3 ≤ . Let also v ∈ J γ (Q; C, E; Cv, Av) with γc ∈ (3/2, 5/2), γe ∈ (1, 2). Then, there exists C > 0 dependent only on σ, Cappx1, Cv and A > 0 dependent only on σ, Av such that for all p ∈ N and all 1 ≤ s ≤ p Proof. The proof follows closely that of Lemmas A.9 and A.10; we use the same notation. From Lemma A.4 and rescaling, we have Most terms at the right-hand side above have already been considered in the proofs of Lemmas A.9 and A.10, and the terms with α1 = α2 = 0 can be estimated similarly; the observation that concludes the proof.
We summarize Lemmas A.9 to A.11 in the following result.
Lemma A.13. Let d = 3, ∈ N and K = J k 1 × J k 2 × J k 3 such that kj = 0 for one j ∈ {1, 2, 3} and 0 < ki ≤ for i = j. For all p ∈ N and all 1 ≤ s ≤ p, let pj = 1 and pi = p ∈ N for i = j. Let also v ∈ J γ (Q; C, E; Cv, Av) with γc ∈ (3/2, 5/2), γe ∈ (1, 2). Then, there exists C > 0 dependent only on σ, Cappx1, Cappx2, Cv and A > 0 dependent only on σ, Av such that Proof. We write da = d K a , a ∈ {c, e}. Suppose, for ease of notation, that j = 3, i.e. k3 = 0. The projector is then given by Π K pp1 = π k 1 p ⊗ π k 2 p ⊗ π 0 1 . Also, we denote h ⊥,2 = σ and ∂ ⊥,2 = ∂x 3 . By (A. 16), The bounds on the terms (I) and (II) can be derived as in Lemma A.9, and give We consider then term (III): with the usual notation, writing γ = γc − γe, (A.37) Note that dc ≥ de and where we have also used that dc ≤ 1. Hence, The bounds on the terms (III)c,e,0 follow by the same argument: Then, The bounds on the first two terms at the right-hand side above can be obtained as in Lemma A.10: while the last term can be bounded as in (A.39), The same holds true for the last term of the gradient of the approximation error, given by From Lemma A.10, we obtain whereas for the third term, it holds that if α1 + α2 and if α1 + α2 + 2 − γc < 0, then and for all α1 + α2 + 2 − γc ∈ R, (III)e and (III)0 satisfy the bounds that (III)ce and (III)c satisfy in case α1 + α2 + 2 − γc < 0, so that Finally, the bound on the L 2 (K) norm of the approximation error can be obtained by a combination of the estimates above.
A similar statement holds when d = 2, and the proof follows along the same lines.
Lemma A.15. Let $d = 2$ and $v \in J_\gamma(Q; \mathcal{C}, \mathcal{E})$ with $\gamma_c > 1$. There exists a constant $C_0 > 0$ such that, if $p \ge C_0\ell$, there exist constants $C, b > 0$ such that the approximation error bound holds with exponential decay in $\ell$.

A.6 Estimates on elements along an edge in three dimensions
In the following lemma, we consider the elements K along one edge, but separated from the singular corner.
Lemma A.16. Let d = 3, e ∈ E and let K ∈ G 3 be such that d K c > 0 for all c ∈ C and d K e = 0. Let Cv, Av > 0. Then, if v ∈ J γ (Q; C, E; Cv, Av) with γc ∈ (3/2, 5/2), γe ∈ (1, 2), there exist C, A > 0 such that for all p ∈ N and all 1 ≤ s ≤ p, with (p1, p2, p3) ∈ N 3 such that p = p, p ⊥, Proof. We suppose that K = J k × J 0 × J 0 for some k ∈ {1, . . . , }, the elements along other edges follow by symmetry. This implies that the singular edge is parallel to the first coordinate direction. Furthermore, we denote Π K p11 = π k p ⊗ (π 0 1 ⊗ π 0 1 ) = π ⊗ π ⊥ . For α = (α1, α2, α3) ∈ N 3 0 , we write α = (α1, 0, 0) and α ⊥ = (0, α2, α3). Also, We start by considering the first terms at the right-hand side of the above equation. We also compute the norms over Kce = K ∩ Qce; the estimate on the norms over Kc = K ∩ Qc and Ke = K ∩ Qe follow by similar or simpler arguments. By (A.21) from Lemma A.8, we have that if γc < 2 On Ke, the same bound holds as on Kce for γc ≥ 2, and on Kc the same bounds hold as on Kce for γc < 2. By the same argument, for |α | = 1, and We now turn to the second part of the right-hand side of (A.41). We use (A.20) from Lemma A.8 so that (A.44) By Lemma A.7 we have, recalling that α = s + 1 and 1 ≤ s ≤ p, for all |α ⊥ | ≤ 1, and, for all |α ⊥ | = 2, using that π and multiplication by re commute, because re does not depend on x1, Then, remarking that |x1| rc |x1|, combining (A.44) with the two inequalities above we obtain Adjusting the exponent of the weights, replacing h and h ⊥ with their definition, we find that there exists A > 0 depending only on σ and Av such that As before, there exists A > 0 depending only on σ and Av such that Lemma A.17. Let d = 3 and v ∈ J γ (Q; C, E) with γc > 3/2, γe > 1. There exists a constant C0 > 0 such that if p ≥ C0 , there exist constants C, b > 0 such that Proof. As in the proof of Lemma A.14, we may assume that γc ∈ (3/2, 5/2) and γe ∈ (1, 2). The proof of this statements follows by summing over the right-hand side of (A.40), i.e., = C((I) + (II)).

A.8 Exponential convergence
The exponential convergence of the approximation in the full domain Q follows then from Lemmas A.14, A.15, A.17, and A.18.
Then, there exist constants $c_p > 0$ and $C, b > 0$ such that, for all $\ell \in \mathbb{N}$, the approximation error bound holds, with exponential decay also when expressed with respect to the dimension of the discrete space.
The following statement is a direct consequence of the two lemmas above. For $\varepsilon > 0$, we choose $\ell$ and $p$ accordingly; then, by (A.73), the error bound follows. This concludes the proof of Items 1 and 2. Finally, Item 3 follows from Proposition A.24 and the fact that $p \le C_p(1 + |\log(\varepsilon)|)$ for a constant $C_p > 0$ independent of $\varepsilon$.

A.10 Combination of multiple patches
The approximation results in the domain Q = (0, 1) d can be generalized to include the combination of multiple patches. We give here an example, relevant for the PDEs considered in Section 5. For the sake of conciseness, we show a single construction that takes into account all singularities of the problems in Section 5. We will then use this construction to prove expression rate bounds for realizations of NNs.
For all ∈ N, define then Ki : K1, . . . , K d ∈ G 1 }, see Figure 3. The hp space in Ω = (−a, a) d is then given by Finally, recall the definition of π ,p hp from (A.12) and construct π ,p hp : W 1,1 ((−a, a)) → X ,p hp,1 such that, for all v ∈ W 1,1 ((−a, a)),  where for all j = 1, . . . , d and ij = 1, . . . , N 1d ,ṽi j ∈ X ,p hp,1 with support in at most two, neighboring elements of G 1 . Finally, there exist constants C1, C2 > 0 independent of such that Proof. The statement is a direct consequence of Propositions A.19 and A.24. We start the proof by showing that for any function v ∈ W 1,1 mix (Ω), the approximation Π ,p hp,d v is continuous; the rest of the theorem will then follow from the results in each sub-patch. Let now w ∈ W 1,1 ((−a, a)). Then, it holds that π ,p hp w |I ∈ C(I), for all I ∈ {(0, a/2), (a/2, a), (−a/2, 0), (−a, −a/2)}, by definition (A.76). Furthermore, by the nodal exactness of the local projectors, forx ∈ {−a/2, 0, a/2}, there holds implying then that π ,p hp w is continuous. Since Π ,p hp,d = d j=1 π ,p hp , this implies that Π ,p hp,d v is continuous for all v ∈ W 1,1 mix (Ω). Fix k ∈ {1, . . . , 4 d } such that v ∈ J γ k (Ω k ; C k , E k ). There exist then, by Proposition A.19, constants C, b, cp > 0 such that for all ∈ N v − Π Proof of Lemma 5.5. For any two sets X, Y ⊂ Ω, we denote by distΩ(X, Y ) the infimum of Euclidean lengths of paths in Ω connecting an element of X with one of Y . We introduce several domain-dependent quantities to be used in the construction of the triangulation T with the properties stated in the lemma.
Let E denote the set of edges of the polygon Ω. For each corner c ∈ C at which the interior angle of Ω is smaller than π (below called convex corner), we fix a parallelogram Gc ⊂ Ω and a bijective, affine transformation Fc : (0, 1) 2 → Gc such that • Fc((0, 0)) = c, • two edges of Gc coincide partially with the edges of Ω abutting at the corner c.
For any shape regular triangulation T of R 2 , such that for all K ∈ T , K ∩ ∂Ω = ∅, denote TΩ = {K ∈ T : K ⊂ Ω} and h(TΩ) = maxK∈T Ω h(K), where h(K) denotes the diameter of K. Denote by NΩ the set of nodes of T that are in Ω.  Let T be a triangulation of R 2 such that and such that for all K ∈ T it holds K ∩ ∂Ω = ∅.
The hat-function basis {φn}n∈N Ω is a basis for P1(TΩ) such that supp(φn) ⊂ patch(n) for all n ∈ NΩ, and it is a partition of unity.
We will show that, for each n ∈ NΩ, there exists a subdomain Ωn with the desired properties, such that patch(n) ∩ Ω ⊂ Ωn. We point to Figure 4 for an illustration of the patches Ωn that will be introduced in the proof, for different sets of nodes.
Thus, the connected component of Sn ∩ Ω containing n is a rectangle, which we define to be Ωn.

B.2 Proof of Lemma 5.15
Proof of Lemma 5.15. Let d = 3 and denote R = (−1, 0) 3 . Denote by O the origin, and let E = {e1, e2, e3} denote the set of edges of R abutting the origin. Let also F = {f1, f2, f3} denote the set of faces of R abutting the origin, i.e., the faces of R such that fi ⊂ R ∩ Ω, i = 1, 2, 3. Let, finally, for each f ∈ F , E f = {e ∈ E : e ⊂ f denote the subset of E containing the edges neighboring f . For each e ∈ E, define ue to be the lifting of u|e into R, i.e., the function such that ue|e = ue and ue is constant in the two coordinate directions perpendicular to e. Similarly, let, for each f ∈ F , u f be such that u f | f = u| f and u f is constant in the direction perpendicular to f . We define w : R → R as where u0 = u(O). Since u|e ∈ W 1,1 (e), u| f ∈ W 1,1 mix (f ) for all e ∈ E and f ∈ F , there holds ue ∈ W 1,1 mix (R) and u f ∈ W 1,1 mix (R) for all e ∈ E and f ∈ F (cf. Equations (A.56) and (A.60)), hence w ∈ W 1,1 mix (R). Furthermore, note that ue − u0 | e = 0, for all E e = e and that From the first equality in (B.2), then, it follows that, for all f ∈ F , Let the function v be defined as v|R = w, v|Ω = u.