On the exponent of exponential convergence of the $p$-version FEM spaces

We study the exponent of the exponential rate of convergence in terms of the number of degrees of freedom for various non-standard {$p$-version} finite element spaces employing reduced cardinality basis. More specifically, we show that serendipity finite element methods and discontinuous Galerkin finite element methods with total degree $\mathcal{P}_p$ basis have a faster exponential convergence with respect to the number of degrees of freedom than their counterparts employing the tensor product $\mathcal{Q}_p$ basis for quadrilateral/hexahedral elements, for piecewise analytic problems under $p$-refinement. The above results are proven by using a new $p$-optimal error bound for the $L^2$-orthogonal projection onto the total degree $\mathcal{P}_p$ basis, and for the $H^1$-projection onto the serendipity finite element space over tensor product elements with dimension $d\geq2$. These new $p$-optimal error bounds lead to a larger exponent of the exponential rate of convergence with respect to the number of degrees of freedom. Moreover, these results show that part of the basis functions in $\mathcal{Q}_p$ basis {plays} no roles in achieving the $hp$-optimal error bound in the Sobolev space. The sharpness of theoretical results is also verified by a series of numerical examples.


Introduction
Polynomial approximation on tensor product domains plays an important role in deriving the exponential rate of convergence with respect to the number of degrees of freedom for hp-version finite element methods (FEMs) [18,21,22,19,4,20,26,30,25] and hp-version discontinuous Galerkin finite element methods (DGFEMs) [23,24,32,27,28,29]. In general, the proof of the exponential rate of convergence usually depends on the hp-approximation results for some suitable projection operators onto a local polynomial space consisting of polynomials with degree less or equal than p in each variable (known as Q p basis) over a tensor product element (quadrilateral/hexahedral elements), for dimension d ≥ 2.
The key reason for using the Q p basis over a tensor product element is because hpoptimal approximation results for the multi-dimensional projection operators can be derived by using the stability and approximation results of the one-dimensional projections via tensor product arguments. On the other hand, the hp-approximation results for L 2 -orthogonal projections onto polynomial basis with total degree less or equal than p (P p basis) and H 1 -projections onto serendipity basis (S p basis) have not been fully explored. Typically, hp-error bounds for projections onto the P p or S p basis has been derived using the fact that there exists a q ≤ p such that the bases P p or S p contain Q q as a subset, together with the help of the hp-optimal approximation results for the projections onto the basis Q q .
For instance, we consider the L 2 -norm error bound of the two-dimensional L 2orthogonal projection Π Qp onto the Q p basis as an example, cf. [24]. Letκ = (−1, 1) 2 and u ∈ H l (κ), l is an integer with l ≥ 0. Then, the following estimate holds, where the constant C(s) is independent of p and 0 ≤ s ≤ min{p + 1, l}. It is straightforward to see that the above error bound is sharp in the sense that it is p-optimal in both Sobolev regularity index l and polynomial approximation order p.
Next, we consider the L 2 -norm error bound of L 2 -orthogonal projection Π Pp onto the P p basis. We define Π Pp = Π Q ⌊p/2⌋ , with ⌊p/2⌋ denoting the largest integer which is less than or equal to p/2. Then, the following bound holds: where the constantC(s) is independent of p and 0 ≤ s ≤ min{⌊p/2⌋ + 1, l}. We emphasize that the above error bound is p-optimal for functions with finite Sobolev regularity with p sufficiently large, but is p-suboptimal by at least p/2 orders for sufficiently smooth functions. The similar p-suboptimal error bound holds for H 1projections onto S p basis. Using the p-suboptimal error bound for L 2 -orthogonal projections onto the P p basis and H 1 -projections onto the S p basis, it is possible to derive an exponential rate of convergence for hp-FEMs employing the S p basis and hp-DGFEMs employing the P p basis, but the resulting exponent is much smaller with respect to the number of degrees of freedom than the exponent of FEMs and DGFEMs employing the Q p basis. This contradicts the numerical observation in work [12,11,10,15], where it is observed that the error with respect to the number of degrees of freedom for DGFEMs with the P p basis on tensor product elements has a steeper exponential convergence compared to DGFEMs with the Q p basis, for sufficiently smooth problems. This situation has been numerically tested on many different examples. We also observed numerically that the ratio of the slope of the exponential error decay for DGFEMs with the P p basis compared to that of the Q p basis depends only on the space dimension. The same phenomenon is also observed when comparing conforming FEMs with the S p basis and the Q p basis.
The disagreement between the numerical observations and theoretical results implies that the error bound (1.2) is not a sharp bound for P p and S p basis. To address this, in this work, we derive an hp-optimal error bound for the L 2 -orthogonal projection onto the P p basis in the L 2 -norm, and for the H 1 -projection onto the S p basis in the L 2 -norm and H 1 -seminorm.
The technique for proving the new error bounds is different from the existing techniques for hp-approximation with the Q p basis, due to the lack of a tensor product structure in the P p and S p bases, thereby hindering the use of the usual tensor product arguments. The key tools used in this work are: a multi-dimensional orthogonal polynomial expansion and the careful selection of basis functions. The resulting bounds for both projections are hp-optimal with respect to both Sobolev regularity and polynomial approximation order. Moreover, it also shows that the Q p basis contains in a sense "extra" basis functions other than those necessary for optimal convergence. These basis functions do not increase the order in p of the error bound, but instead only reduce its "constant".
By using the new hp-optimal error bound for the L 2 -orthogonal projection onto the P p basis and the H 1 -projection onto the S p basis, we can prove that methods using P p and S p bases offer exponential convergence with a larger exponent with respect to the number of degrees of freedom than comparable methods using Q p basis for piecewise analytic problem under p-refinement. Furthermore, the approximation results also show that there are a lot of basis functions in Q p basis play no roles in improving the hp-optimal error bound, which can be generalized to other FEM with the local polynomial space employing reduced cardinality basis.
The remainder of this work is structured as follows. In Section 2, we introduce the required notation and the weighted Sobolev spaces together with some properties about the orthogonal polynomials. Then, the p-optimal error bound for the L 2orthogonal projection onto the P p basis in L 2 -norm is proved in Section 3. In Section 4, we derive the p-optimal error bound for H 1 -projection onto the S p basis in both L 2 -norm and H 1 -seminorm. Section 5 is devoted to deriving the exponential rate of convergence for the L 2 -and the H 1 -projections employing different local polynomial bases. The sharpness of the approximation results is verified through a series of numerical examples in Section 6.
Next, we define the following shorthand notation for the summations of indices. For multi-indices i and α satisfying i ≥ α we define ∞ i≥α := ∞ i1=α1 · · · ∞ i d =α d and the summation for multi-indices i satisfying |i| ≥ p is defined as ∞ |i|=p . Moreover, we also define a summation for a multi-index i satisfying multiple conditions, e.g. multi-index i satisfying the condition i ≥ α and the condition |i| ≥ p is defined as ∞ |i|=p,i≥α . We introduce a function Φ d (m, n) which will be used frequently in this work, given by where Γ is the Gamma function satisfying Γ(n + 1) = n! for integer n ≥ 0.
Next, we define the weighted Sobolev spaces V l (κ) as a closure of C ∞ (κ) in the norm with the weights W α , defined by It is easy to see that |u| V l (κ) ≤ |u| H l (κ) , ∀u ∈ H l (κ), with some integer l ≥ 0. We note that the above definition for weighted Sobolev spaces can be extended to the fractional order weighted Sobolev spaces and weighted Besov spaces by using the real interpolation techniques, cf. [9]. For u ∈ L 2 (κ), we introduce the Legendre polynomial expansion over the reference elementκ, given by (2.4) where x = (x 1 , . . . , x d ), and L i k (x k ) denotes the Legendre polynomial with order i k over the variable x k . The coefficients a i are defined by The derivatives of the function u can be expressed as The derivatives of the Legendre polynomials satisfy the orthogonality property see [30,Lemma 3.10]. By employing (2.7), we have Identity (2.8) establishes a link between the derivatives of the functions in the weighted L 2 -norms and their Legendre polynomial expansions.
Remark 2.1. The weighted Sobolev space in the above definition is a special case of the general Jacobi-weighted Sobolev spaces introduced in [6]. The key reason to introduce the Jacobi-weighted Sobolev spaces is to deal with the loss of orthogonality suffered by orthogonal polynomials in standard Sobolev spaces; the L 2 -orthogonality is preserved in Jacobi-weighted Sobolev spaces. As we shall see in the forthcoming analysis, orthogonality plays a key role in deriving optimal error bounds in the polynomial order p.
3. The L 2 -orthogonal projection operator onto the P p basis In this section, we derive an hp-optimal error bound for the L 2 -orthogonal projection over the reference elementκ := (−1, 1) d .
3.1. The L 2 -orthogonal projection operator. For the reference elementκ, we define P p (κ) and Q p (κ) be the space of all polynomials with total degree less than or equal to p and with partial degree less than or equal to p, respectively.
In order to distinguish the same projections onto spaces with different polynomial basis, we use subscripts to signify the basis type: we use Π Qp := Π to denote the L 2 -projection onto Q p , which is constructed by using tensor product arguments together with the one dimensional L 2 -projection with respect to variable x k , given by Π (k) p . On the other hand, the L 2 -projection onto P p is denoted by Π Pp . First, we have the following hp-optimal approximation result for the L 2 -orthogonal projection Π Qp , c.f. [24, Lemma 3.4].
Then, for any integer s, with 0 ≤ s ≤ min{p + 1, l}, and W k = W k (x k ), we have: Proof. The result is proved by modifying the proof of Lemma 3.4 in [24]. Instead of using triangle inequality, we use the orthogonality and stability of the onedimensional L 2 -orthogonal projection, which leads to the error bound (3.1).
We remark on the asymptotic behaviour of the Gamma function. Making use of Stirling's formula, see (9.15) in [16], we have √ 2πn n+ 1 2 e −n ≤ Γ(n + 1) ≤ en n+ 1 2 e −n , n ≥ 1, (3.2) and it follows with 0 ≤ s ≤ min{p + 1, l} and C(s) depending on the constant s, but independent of p. This implies that the error bound (3.1) is optimal in p with respect to both the Sobolev regularity index l and polynomial order p.
Next, we introduce a useful lemma which is the key tool in proving the optimal error bounds in p. The proof of the lemma is postponed until Section 3.2.
Furthermore, the maximum value of F (ξ, ρ) under the above constraints on ξ and ρ is attained at ξ k = m/d, ρ k = M/d, k = 1, . . . , d. Let Π Pp u be the L 2 -projection of u onto P p (κ) with p ≥ 0. Then, for any integer s, 0 ≤ s ≤ min{p + 1, l}, we have: Proof. Using the relation (2.4) and (2.7) , for any integer s, 0 ≤ s ≤ min{p + 1, l}, we have where in step one, the index set is enlarged; indeed, some of the terms with multiindex |i| ≥ p + 1 have been used more than once; in step three, we use Lemma 3.2, taking ξ k = α k ≥ 0, ρ k = i k ≥ 0, M = p + 1, m = s, together with the restriction 0 ≤ s ≤ min{p + 1, l}; in step four, we used (2.8) and in the last step the bound holds from (3.3).
Remark 3.4. We point out that the above proof for the L 2 -orthogonal projection Π Pp on d-dimensional reference element is a natural extension of the proof for one-dimensional result, see [30] for details. By comparing the L 2 -norm bound (3.1) for the projection Π Qp and (3.5) for the projection Π Pp , it is easy to see that both bounds are p-optimal with respect to Sobolev regularity index l and also for polynomial order p. Moreover, we can see that the bound in (3.5) will have a larger constant compared to the bound in (3.1), and this constant depends on the dimension d. This result will play a key role in deriving the exponential convergence for the P p basis.
3.2. The Proof of Lemma 3.2. The proof will be split into three steps.
Step 1: The proof follows a constrained optimization procedure. We set, and we calculate the stationary points. We consider the partial derivatives with respect to ξ k and ρ k , k = 1, . . . , d, which satisfy the equations with k = 1, . . . , d, by using the fact that F (ξ, ρ) > 0. The right-hand sides of the two equations in (3.7) are independent of the index k. Moreover, the function φ(z) = Γ(z) ′ /Γ(z) is the so-called Digamma function with the property that (see [1], (6.3.16)) where γ is the Euler-Mascheroni constant. For z ≥ 0, the function φ(z + 1) is a continuous monotonically increasing function, which shows that (3.7) have only one solution. This solution isξ k = m/d andρ k = M/d, k = 1, . . . , d, and the F (ξ, ρ) will have the extreme value at this stationary point, given by Step 2: In order to find the global maximum, we need to prove the following asymptotic relationship: This is proven by considering three different cases. We first consider the special case m = 0. In this case, (3.9) holds trivially. Next, we consider the case m = δM , with 0 < δ < 1. By using the property (3.2) of Gamma functions, we have the following bound: By recalling that 0 < δ < 1 and n = 1, . . . , d − 1, we have that 0 < 1−δ 1+δ < 1 and the function ( d n ) 2δM is monotonically increasing with respect to M . This implies that, for M ≥ (d + n) log( e √ 2π ) + d−n 2 log( 1+δ 1−δ ) 2δ log( d n ) −1 , the above quotient formula is greater than 1 and therefore (3.9) holds. Finally, we consider the case m = M . Using the same techniques used to derive (3.10) together with the fact that Γ(1) = 1, we have By using the fact that exponentially increasing functions grow faster than polynomials, we know that for sufficiently large M the right hand side of (3.11) is greater than 1 and therefore (3.9) holds.
Step 3: Finally, we need to show that the extreme value (3.8) is the global maximum value of F (ξ, ρ) under the constraints |ξ| = m and |ρ| = M .
First, we can see that the function F (ξ, ρ) is symmetric and continuous with respect to ξ and ρ. The constraints |ξ| = m and |ρ| = M restrict the domain of ξ and ρ to be a (d − 1)-dimensional simplex, which is convex and compact. So the maximum value of the function F (ξ, ρ) over the domain will be obtained only at the boundary of the domain or the stationary point of F (ξ, ρ). We have calculated the function value at the stationary point in (3.8) already, so now we just need to check the function values on the boundary of the domain.
This may be proved by induction. We start with the case d = 2: the domain of ξ and ρ satisfying the constrains are two straight lines, ρ 1 + ρ 2 = M and ξ 1 + ξ 2 = m. Here, the stationary point is the mid-point of each of the two linesξ = (m/2, m/2), ρ = (M/2, M/2), and the boundary of the domain consist of the points ξ b = (0, m), ρ b = (0, M ) or ξ b = (m, 0), ρ b = (M, 0), due to the constraints ρ ≥ ξ. Using the symmetry of the function and of the domain, we know that at the two boundary points of the domain, F (ξ, ρ) will attain the same value, with F (ξ b , ρ b ) = Φ 1 (M, m). By using the asymptotic relation (3.9), we find The above relation shows that the extreme value (3.8) is the global maximum value under the constraints for d = 2.
Next, we consider the case d = 3, where the domain of each of ξ and ρ will be a triangle. In this case, the stationary point of F (ξ, ρ) is when ξ and ρ are located at the barycenter of their respective triangle. The boundary of each domain consists of 3 straight lines. We need to calculate the maximum value of F (ξ, ρ) on the boundary of the domain. By using the symmetry of F (ξ, ρ), and that fact that |ξ| = m and |ρ| = M , we only need to consider one part of domain boundary where ξ 3 = 0 and ρ 3 = 0. Then, the maximum of F (ξ, ρ) on the domain boundary can be viewed as exactly the same problem with the same constraints as in the case d = 2. Consequently, the maximum value of F (ξ, ρ) along the boundary of the domain is Again, by using the same techniques as for d = 2, we deduce that . The above relation shows that the extreme value (3.8) is the global maximum value under the constraints for d = 3. For the general d-dimensional case, the proof can be carried out in a similar way. The key observation is that the maximum value of F (ξ, ρ) on the boundary of d-dimensional domain will be at the stationary point of F (ξ, ρ) on the (d − 1)-dimensional domain. By using the relation the proof is complete.

The H 1 -projection operator onto the S p basis
In this section, we shall consider the H 1 -projection over the reference element κ := (−1, 1) d with d = 2, 3. Since the three dimensional results depend on the two dimensional results, we start with the two dimensional case.

4.1.
The H 1 -projection operator on the reference square. First, we introduce the two-dimensional serendipity finite element space, cf. [30] We can see in Figure 1 that the serendipity space S p contains two more basis functions than the P p basis for p ≥ 2. Another way to define the serendipity basis is to consider the decomposition of the C 0 finite element space with Q p basis overκ. For polynomial order p, the S p basis has the same number of nodal basis functions and edge basis functions as the Q p basis, but the S p basis only has internal moment basis functions (those with zero value along the element boundary ∂κ) whose total degree is less than or equal p, cf. [30,31]. We note that serendipity FEMs can be defined in a dimension-independent fashion, see [3].
Similarly to the case of the L 2 -projection, we use π Qp := π (1) p π (2) p to denote the H 1 -projection onto the Q p basis, which can be constructed via a tensor product of one dimensional H 1 -projection with respect to variable x k , given by π p . Similarly, the H 1 -projection onto the S p basis is denoted by π Sp , which is defined in (4.6). Now, we construct the two-dimensional H 1 -projection explicitly by using the one-dimensional H 1 -projection and tensor product arguments, see [30,23] the projection Π Qp−1 and Π (k) p−1 are the two dimensional and one dimensional L 2orthogonal projections, respectively, the coefficients a i1i2 , b i1 and c i2 are given by: and the polynomial function ψ j (z) = z −1 L j (z) dz with degree j + 1, and satisfies .
Next, we rearrange the relation (4.2) by separating the internal moment basis functions: so that the first double summation in (4.5) only contains the internal moment basis functions. From the definition of S p , π Sp can be constructed by removing the internal moment basis functions with polynomial order greater than p in π Qp . More specifically, π Sp u ∈ S p (κ), p ≥ 4, is defined by For 1 ≤ p ≤ 3, the first term in (4.6) will vanish, because there are no internal moment basis functions for the serendipity basis in that case. In this work, we focus on the high order polynomial cases, so we only consider the H 1 -projection π Sp for p ≥ 4.
Next, we recall the following approximation lemma for π Qp from [23].
Then, we have π Qp u = u at the vertices ofκ, (4.7) and the following error estimates hold: for any integer s, 1 ≤ s ≤ min{p, l}.
Then, we derive the hp-error bound for the H 1 -projection π Sp for p ≥ 4. Theorem 4.2. Letκ = (−1, 1) 2 . Suppose that u ∈ H l+1 (κ), for some l ≥ 1. Let π Sp u be the H 1 projection of u onto S p (κ) with p ≥ 4. Then, we have π Sp u = u at the vertices ofκ, (4.10) and for any integer s, 1 ≤ s ≤ min{p, l}, the following error estimates hold: Proof. The key observation is the fact that the serendipity basis S p differs from the Q p basis only at the internal moment basis functions which vanish along the boundary ofκ. Indeed, using (4.5) and (4.6), we have Using the fact that ψ j (±1) = 0 for j ≥ 1, we deduce that (π Qp u − π Sp u)| ∂κ = 0. Thus, (4.10) is proved.
Next, we derive (4.11). The first step is the use of the triangle inequality, Thus, we only need to consider the error from the second term in the above bound. By using (2.2) and the orthogonality relation (4.4) of ψ j (x) for j ≥ 1 and 1 ≤ s ≤ min{p, l}, we have where in step three, we enlarged the summation index sets by adding the high order internal moment basis functions with coefficients a i1i2 , i k ≥ 1 for k = 1, 2 and |i| ≥ p − 1. Thus, we have in step two, we enlarge the index set by adding functions with coefficients a i1i2 whose index satisfying the relation |i| ≥ p − 1, , and m = s + 1, together with the restriction 1 ≤ s ≤ min{p, l}; in step four, we use (2.8) and (4.3) to build up the link between the derivatives of u and coefficients a i1i2 and in the last step, we use (3.3).
Using the same techniques, we can derive the error estimate for the H 1 -seminorm. We have In the last step, we enlarge the summation index sets by adding the high order internal moment basis functions with coefficients a i1i2 , i k ≥ 1 for k = 1, 2 and |i| ≥ p − 1. Thus, we have where in step two, we enlarge the index set again; in step three we use Lemma 3.2, Therefore, we have the bound (4.20) Finally, using (4.17), (4.20) and Lemma 4.1, the bounds (4.11) and (4.12) follow.

4.2.
The H 1 -projection operator on the reference cube. In this section, we shall consider the H 1 -projection operator over the reference cubeκ := (−1, 1) 3 . First, we introduce the 3D serendipity finite element space. A simple way to define the serendipity basis is to consider a decomposition of the C 0 finite element space with Q p basis overκ. For polynomial order p, the S p basis has the same number of nodal basis functions and edge basis functions as the Q p basis, but the S p basis only has face basis functions (those with zero value on twelve edges and eight vertices) and internal moment basis functions (those with zero value along the element boundary ∂κ) whose total degree is less than or equal p. The number of basis functions of S p basis is calculated in the following way Similarly to the 2D case, we use π Qp := π p to denote the H 1 -projection onto the Q p basis. The H 1 -projection onto the S p basis is denoted by π Sp . Additionally, we introduce some new notation for the forthcoming analysis. The projection π (1,2) Sp shall denote the H 1 -projection onto the serendipity spaces S p with variables (x 1 , x 2 ) only, and the projections π First, we explicitly construct the three-dimensional projection π Qp = π p . For u ∈ H l (κ), l ≥ 3, the projection π Qp u ∈ Q p (κ), p ≥ 1, is defined by Then, the following Legendre polynomial expansion holds: together with e i1 , f i2 and g i3 Now, we separate the face basis functions and internal moment basis functions from (4.22).
+ edge basis + nodal basis. (4.25) Here, the first triple summation terms contains all the internal moment basis functions only. Three double summation terms contain all the face basis functions. The edge basis functions and nodal basis functions will not be written explicitly because they play no role in the analysis.
From the definition of S p , π Sp u can be constructed by removing the face basis functions and internal moment basis functions with polynomial order greater than p in π Qp u. More specifically, π Sp u ∈ S p (κ), p ≥ 6, is defined by + edge basis + nodal basis. (4.26) For 1 ≤ p ≤ 3, both face basis functions and internal moment basis functions in (4.26) will vanish. For 4 ≤ p ≤ 5, internal moment basis functions in (4.26) will vanish. Similar to the 2D case, we only consider the H 1 -projection π Sp for p ≥ 6.
Next, by using the stability and approximation results for one dimensional H 1projection in [23], we can derive the following approximation results for π Qp . Lemma 4.3. Letκ = (−1, 1) 3 . Suppose that u ∈ H l+1 (κ), for some l ≥ 2. Let π Qp u be the H 1 -projection of u onto Q p (κ) with p ≥ 1. Then, we have π Qp u = u at the vertices ofκ, (4.27) and the following error estimates hold:
Then, we derive the hp-error bound for the H 1 -projection π Sp for p ≥ 6.
Theorem 4.4. Letκ = (−1, 1) 3 . Suppose that u ∈ H l+1 (κ), for some l ≥ 2. Let π Sp u be the H 1 projection of u onto S p (κ) with p ≥ 6. Then, we have π Sp u = u at the vertices ofκ, (4. 30) and for any integer s, 2 ≤ s ≤ min{p, l}, the following error estimates hold: Here, C 1 and C 2 are positive constants independent of p, l and s.
Proof. The proof of (4.30) is similar to the two dimensional case, by using the fact that the serendipity basis S p differs from Q p only in the face basis functions and internal basis functions which have zero values at each vertex.
Next, we begin to prove relation (4.31). Using (4.25) and (4.26), we have We note that the term T 1 only contains the internal moment basis functions, and the three other terms only contain the face basis functions. By using (2.2), the orthogonality relation (4.4) of ψ j (x) for j ≥ 1 and 2 ≤ s ≤ min{p, l}, we have where in step two, we enlarged the summation index sets by adding the high order internal moment basis functions with coefficients a i1i2i3 , i k ≥ 1 for k = 1, 2, 3 and |i| ≥ p − 2; in the last step, we used the relation 1 i k (i k +1) ≤ 6 (i k +α k +1)(i k +α k +2) for internal moment basis functions with i k ≥ 1 and i k ≥ α k , k = 1, 2, 3. Thus, we have where in step two, we enlarged the summation index set by adding functions with coefficients a i1i2i3 whose index satisfying the relation |i| ≥ p − 2, 3 k=1 i k = 0; in step three, we used Lemma 3.2, with ξ k = α k + 1 ≥ 1, ρ k = i k + 1 ≥ 1, k = 1, 2, 3, together with M = p + 1, m = s + 1, and the restriction 2 ≤ s ≤ min{p, l} and in the last step, we used the relation (3.3).
Next, we will derive the error bound for the term T 2 . We first rewrite T 2 in the following way The key observation is that T 2,a = π Sp u, see Appendix. By using the 2D approximation results (4.17), we have the following bound We note that T 2,b only contains the internal moment basis functions, so using (2.2) and the orthogonality relation (4.4) of ψ j (x) for j ≥ 1, we have , (4.38) where in step two, we used the fact that the multi-index set for T 2,b is a subset of multi-index i with |i| ≥ p−2 and i k ≥ 1, k = 1, 2, 3 and in the last step, we enlarged the index set of the summations by adding the high order internal moment basis functions with coefficients a i1i2i3 , i k ≥ 1 for k = 1, 2, 3 and |i| ≥ p − 2. By using and (4.35), the following bound holds (4.39) Combining (4.37) and (4.39) together with the asymptotic relation (3.9), the following bound for T 2 holds It is easy to verify that T 3 and T 4 satisfy the same error bound as T 2 , and thus the L 2 -norm error bound (4.31) is proved.
The error estimate for the H 1 -seminorm (4.32) can be derived by using the same techniques used in deriving (4.31). As such, we have where in step one, we enlarged the summation index set by adding functions with coefficients a i1i2i3 whose index satisfying the relation |i| ≥ p − 2, 3 k=1 i k = 0; in step two we use Lemma 3.2, taking Next, we consider the error bound for term T 2 . By using (4.19), the following bound holds for T 2,a With the help of (4.38) and (4.42), the following bound holds Combing (4.43), (4.44) together with the asymptotic relation (3.9), then the following error bound for term T 2 .
The above error bound for T 2 term is also the error bound for terms T 3 and T 4 . By using (4.43) and (4.44), it is to see that the following relation holds where C * is a positive constant independent of p, s and l. We note that above L 2 -norm error bound also holds for the rest two partial derivatives. So the H 1seminorm bound (4.32) is proved. Similarly to the comparisons for the L 2 -projection onto P p and Q p , both bounds are p-optimal in both Sobolev regularity and polynomial order. We can also see that the bounds for π Sp have a larger constant than those for π Qp , and this constant depends on dimension d. Moreover, we point out that the optimal approximation results for the H 1 -projection with S p basis in Theorem 4.2 and Theorem 4.4 directly imply the hp-optimal error bound for the L 2 -norm on the trace ofκ for π Sp .
Remark 4.6. We note that in the Theorem 4.2 and 4.4, the minimum Sobolev regularity requirements for defining H 1 -projection is u ∈ H d (κ) for the reference element. In fact, this regularity requirement can be relaxed by using the the tensor product Sobolev spaces, cf. [30,17]. In this work, we do not consider the minimum regularity assumptions because we only consider the standard Sobolev spaces.

4.3.
The H 1 -projection operator onto the P p basis. Finally, we present the error bound for π Pp which we shall define now. The key observation is that the P p basis with polynomial order p contains the S p+1−d basis for p ≥ d, see [3]. Then, we can simply define π Pp = π S p+1−d for d = 2, 3.
Corollary 4.7. Letκ = (−1, 1) d , d = 2, 3. Suppose that u ∈ H l+1 (κ), for some l ≥ d−1. Let π P p u := π S p+1−d u be the H 1 projection of u onto P p (κ) with p ≥ 3d−1. Then, we have: π P p u = u at the vertices ofκ, (4.47) and the following error estimates hold: Remark 4.8. We emphasize that the above error bound for the π P p projection is p-suboptimal by one order for d = 2 and two orders for d = 3 for sufficiently smooth functions, but it is p-optimal for functions with finite Sobolev regularity in the case l ≤ p + 1 − d. However, sub-optimality by one or two orders in p is better than using the π Q ⌊p/d⌋ projection, as suggested by [30] (see Corollary 4.52 on p190), which is sub-optimal in p by at least p/2 orders for sufficiently smooth functions for d = 2. Moreover, the one or two order sub-optimality in p for analytic functions does not influence the exponent of the exponential rate of convergence, as we shall see below.

Exponential convergence for analytic solutions
We shall be concerned with the proof of exponential convergence for serendipity FEMs and DGFEMs with the P p basis over tensor product elements. For simplicity, we only consider the case when the given problem is piecewise analytic over the whole computational domain. Exponential convergence is then achieved by fixing the computational mesh, and increasing the polynomial order p. Only parallelepiped meshes are considered, which are the affine family obtained from the reference elementκ = (−1, 1) d . The analysis of FEMs and DGFEMs with a general hp-refinement strategy is beyond the scope of this analysis (see [26,27,28,29] for the analysis for both methods employing the Q p basis).
The proof of exponential convergence for FEMs and DGFEMs depends on proving exponential convergence of L 2 -and H 1 -projections for piecewise analytic functions under p-refinement. The H 1 -projection π Sp onto S p can be directly applied to p-FEMs for second order elliptic problems with the same optimal rate as the H 1 projection π Qp , see [30] for details. For deriving error bounds of DGFEMs using the L 2 -and H 1 -projections onto Q p , we refer to [23,24,17]. Following similar techniques, we can prove the corresponding hp-bounds for DGFEMs employing the P p basis, albeit with sub-optimal rate in p. The sub-optimality in p is due to the fact that the p-optimal bound for L 2 -projection onto P p basis over the trace of the tensor product elements is still open. Additionally, the H 1 -projection onto the P p basis is suboptimal in p by d − 1 orders for sufficiently smooth functions. However, we point out that the suboptimality in p by d − 1 order, with d = 2, 3, does not influence the exponent of the exponential rate of convergence.
Next, we focus on deriving the exponential convergence for the L 2 -projections in the L 2 -norm and H 1 -projections in the L 2 -norm and H 1 -seminorm on analytic problems under p-refinement on shape-regular d-parallelepiped meshes. The extension to anisotropic meshes will be consider in the future.
Let κ be a parallelepiped element. For a function u having an analytic extension into an open neighbourhood ofκ, we have for every s κ ≥ 0: where |κ| denotes the measure of element κ, cf. [14, Theorem 1.9.3].
Lemma 5.1. Let u : κ → R have an analytic extension to an open neighbourhood ofκ. Also let p κ ≥ 0 and 0 ≤ s κ ≤ p κ + 1 be two positive numbers such that s κ = ǫ(p κ + 1), 0 ≤ ǫ ≤ 1 and d = 2, 3. Then the following bounds hold: Here, C and C(u) are positive constants depending elemental shape regularity, and C(u) also depends on u.
Proof. Using standard scaling arguments for κ together with Lemma 3.1 and Theorem 3.3, we have the approximation results for the L 2 -projection over κ. For brevity, we set q κ = p κ + 1. By employing the relation (3.3) and the fact |u| V l (κ) ≤ |u| H l (κ) , we have the bounds: Recalling (5.1), we have R κ > 0, Therefore, we have the exponential convergence for the L 2 -projection Π Qp κ , via where, In order to compare with the slope of projection Π Qp κ , here we will use the same ǫ min . We have Thus, we have Next, we begin to derive the exponential convergence for H 1 -projections.
Lemma 5.2. Let u : κ → R have an analytic extension to an open neighbourhood ofκ. Also let p κ ≥ 2d and (d − 1) ≤ s κ ≤ p κ be two positive numbers such that s κ = ǫp κ , 0 < ǫ ≤ 1 and d = 2, 3. Then the following bounds hold: Here, C and C(u) are positive constants depending elemental shape regularity, and C(u) also depends on u.
The proof follows by the same techniques used in Lemma 5.1.
In the above Lemma 5.1 and Lemma 5.2, we can see that the L 2 -norm error for both L 2 -projections Π Qp κ and Π Pp κ , and the L 2 -norm and H 1 -seminorm error for the H 1 -projections π Sp κ and π Qp κ decay exponentially for analytic functions under p-refinement. If we measure the error against p, the exponent b 1,κ for the Q p basis is slightly greater than the exponent b 2,κ for the P p basis and S p basis by a small factor of (log d)/ 1 + R 2 κ . By using Lemma 5.1 and Lemma 5.2, we can also derive the following theorem.
Theorem 5.3. Let u be an analytic function as defined in (5.1), and exponent b 1,κ and b 2,κ defined in Lemma 5.1 Then, there exists C > 0 such that following bounds hold: Proof. By recalling the relationship between degrees of freedom and polynomial order p for both the Q p and P p bases, we have Dof (Q p ) = (p + 1) d , (5. 12) and Then, (5.6) and (5.7) follow from Lemma 5.1.
For d = 2, 3, if the following condition 15) holds, then we have b 2,κ ≈ b 1,κ . It is easy to see that for small R κ or small mesh size h, the condition (5.15) will be satisfied. Moreover, we point out that an analytic function having sufficiently small R κ is equivalent to the function having an analytic continuation into a sufficiently large open neighbourhood ofκ, see [14] for details. Now, if we consider the error in terms of d √ Dof for the above bounds, the exponent for the exponential convergence rate of the P p basis and the S p basis are larger than the exponent for the Q p basis by a fixed factor of d √ d!. We have observed a steeper slope in error against d √ Dof for FEMs with S p basis and DGFEMs with P p basis. For d = 2, this suggests a typical ratio between convergence slopes of DGFEMs with P p and Q p basis, FEMs with S p and Q p basis, to be The numerical examples in Section 6 show that the ratio is slightly worse than the ideal ratio, but it is not far from the ideal ratio.

Numerical examples
We present some numerical examples to confirm the theoretical analysis in the previous sections. For simplicity of presentation, we use DGFEM(P) and DGFEM(Q) to denote the DGFEMs with local polynomial basis consisting of either P p or Q p polynomials and use FEM(S) and FEM(Q) to denote the FEMs with local polynomial basis consisting of either S p or Q p polynomials.
The comparisons are mainly made between the slope of FEM(S) and FEM(Q) over square meshes for d = 2 and hexahedral meshes for d = 3 under p-refinement. The slopes of the convergence lines are calculated by taking the average of the last two slopes of the line segments of each convergence line. We will also present an example comparing DGFEM(P) and DGFEM(Q). For more numerical examples for DGFEMs, see [15]. 6.1. Example 1. In the first example, we investigate the computational efficiency of DGFEM(P) and DGFEM(Q) schemes. To this end, we consider a partial differential equation with nonnegative characteristic form of mixed type. Let Ω = (−1, 1) 2 , and consider the PDE problem: with exact solution: This problem is hyperbolic in the region y ≤ 0 and parabolic for y > 0. In order to ensure continuity of the normal flux across y = 0, where the partial differential equation changes type, the exact solution has a discontinuity across the line y = 0, cf. [11,17]. By following [11], we use the symmetric interior penalty DGFEMs employing a special class of quadrilateral meshes for which the discontinuity in the exact solution lies on element interfaces. In this setting, we modify the discontinuity-penalization parameter σ, so that σ vanishes on edges which form part of the interface y = 0; this ensures that the (physical) discontinuity present in the exact solution is not penalized within by the numerical scheme. In this case, the exact solution is piecewise analytic on the two parts of the domain. In Figure 2, we observe that the DG-norm errors decays exponentially for both DGFEM(P) and DGFEM(Q) under p-refinement on both 64 and 4096 uniform square elements. Moreover, the slope of the convergence line for the DGFEM(P) is greater than the line of DGFEM(Q) in error against √ Dof . The ratio between the two slopes is about 1.39 on coarse meshes and fine meshes. 6.2. Example 2. In the second example, we investigate the computational efficiency of FEM(S) and FEM(Q) on standard tensor-product elements (quadrilaterals in 2D and hexahedra in 3D).
In this case, the exact solution is piecewise analytic on the domain. In Figure  3, we observe that the H 1 -seminorm errors decays exponentially for both FEM(S) and FEM(Q) under p-refinement on both 64 and 4096 uniform square elements. Again, we observe that the slope of the convergence line for the FEM(S) is greater than the line of FEM(Q) in error against √ Dof . The ratio between the two slopes is about 1.39 on coarse meshes and fine meshes.
In Figure 4, we observe that the H 1 -seminorm error decays exponentially for both FEM(S) and FEM(Q) under p-refinement on both 64 and 4096 uniform hexadhedral elements. Moreover, we observe that the slope of the convergence line for the FEM(S) is greater than the line of FEM(Q) in error against 3 √ Dof . The ratio   between the two slopes is about 1.62 on coarse meshes and slightly greater on fine meshes.
6.3. Example 3. In the third example, we investigate the convergence behaviour of the FEM(S) and FEM(Q) approaches for the Poisson problem on a non-smooth domain with fixed computational meshes under p-refinement. To this end, we let Ω be the L-shaped domain (−1, 1) 2 \ [0, 1) × (−1, 0]. Uniform square meshes consisting of 12 elements are used. Then, writing (r, ϕ) to denote the system of polar coordinates, we impose an appropriate inhomogeneous boundary condition for u so that u = r 2/3 sin(2ϕ/3); cf. [32]. We note that u is analytic in Ω \ {0}, but ∇u is singular at the origin; indeed, here u ∈ H 2 (Ω). This example reflects the typical (singular) behaviour that solutions of elliptic boundary value problems exhibit in the vicinity of reentrant corners in the computational domain.
In fact, u ∈ H 5 3 −ǫ (Ω), for any ǫ > 0. We investigate the convergence rate of the FEM(S) and FEM(Q) under p-refinement for this problem. In Table 1, we list the H 1 -seminorm error and also the convergence rate of FEM(S) and FEM(Q) with polynomial order p = 1, . . . , 60. We point out that due to the singularity at the origin, geometrically graded quadrature points towards the origin are used in order to get the desired accuracy (see [13] As we can see, the convergence rate in p for both FEM(S) and FEM(Q) are approximately O(p 4/3 ). The convergence rate in p is double the theoretical rate with respect to the Sobolev regularity of u. This is the doubling order convergence in the p-version finite element, see [8,30] for details. The reason of this doubling order convergence in p is related to the fact that standard Sobolev space can not optimally characterize the singularity of r γ log ν r type, γ ∈ R + , ν ∈ N; indeed from [6, 7, 5], we know that the modified Jacobi-weighted Besov spaces provide a sharper function space setting to characterize such singular functions. By using the results in [7], FEM(Q) has the following sharp error bound under p-refinement for this problem where constants C and c are independent of p. Moreover, we observe that the FEM(S) error is greater than FEM(Q) error by a factor about 2.29 for fixed p. By noting that 2 4/3 ≈ 2.52, we suppose that the error bound for FEM(S) satisfying The proof the above intuitive optimal p-version error bound for FEM(S) is beyond the scope of this work, which depends on the approximation theory for orthogonal projections onto the P p basis and the S p basis in the modified Jacobi-weighted Besov spaces. Now, let's consider the relation of the error against √ Dof . By using the asymptotic relation (5.12), we have the following relation for FEM(Q) |u − u h | H 1 (Ω) ≤ CDof − 2 3 . Then, we derive the following relation for FEM(S) by using the relation (5.14), Hence, both FEM(S) and FEM(Q) have the same algebraic convergence rate in degrees of freedom under p-refinement. However, the error bound of FEM(S) seems to be larger than the error bound of FEM(Q) by a constant. This result is observed in the Figure 5, where we can see that the convergence line of FEM(S) and FEM(Q) have the same slope in error against p (left) and error against Dof 1/2 (right). But the error of FEM(S) is always larger than the error of FEM(Q) in both graphs.

Conclusions and remarks
In this work, we derived new hp-optimal approximation results for the L 2orthogonal projection onto the total degree P p basis, and the H 1 -projection onto the serendipity basis S p over tensor product elements. With these results, we proved that the exponent of the exponential rate of convergence with respect to the number of degrees of freedom for the DGFEM employing P p basis and the serendipity FEM are greater than the exponent of the exponential rate of convergence for their counterparts employing Q p basis for analytic functions under p-refinement. Moreover, the exponent for the P p and S p bases are larger than that of Q p basis by a constant only depending on dimension. The sharpness of the theoretical results has been verified by numerical examples.
Finally, we remark on some applications and potential extensions of results in this work. First, we note that the hp-optimal error bounds for the L 2 -orthogonal projections onto the P p basis in the L 2 -norm can be used to improve the hperror bounds for mixed-FEMs employing the BDFM-elements, see [2]. Next, as we have already observed in the numerical example 7.2 in [12], the hp-adaptive DGFEM employing the P p basis gives a greater exponent of the exponential rate of convergence in terms of number of degrees of freedom than the exponent of DGFEM employing the Q p basis for solving the L-shaped domain problem. We are very interested in applying the results in this work for hp-version FEMs employing the serendipity basis and DGFEMs employing the P p basis in a general hp-refinement setting.