More Continuous Mass-Lumped Triangular Finite Elements

When solving the wave equation with finite elements, mass lumping allows for explicit time stepping, avoiding the cost of a lower-upper decomposition of the large sparse mass matrix. Mass lumping on the reference element amounts to numerical quadrature. The weights should be positive for stable time stepping and preserve numerical accuracy. The standard triangular polynomial elements, except for the linear element, do not have these properties. Accuracy can be preserved by augmenting them with higher-degree polynomials in the interior. This leaves the search for elements with positive weights, which were found up to degree 9 by various authors. The classic accuracy condition, however, is too restrictive. A sharper, less restrictive condition recently led to new mass-lumped tetrahedral elements up to degree 4. Compared to the known ones up to degree 3, they have less nodes and are computationally more efficient. The same criterion is applied here to the construction of triangular elements. For degrees 2 to 4, these turn out to be identical to the known ones. For degree 5, the number of nodes is the same as for the known element, but now there are infinitely many solutions. Some of these have a considerably larger stability limit for time stepping. For degree 6, two elements are found with less nodes than the known ones. For degree 7, one element with less nodes was found but with a negative weight, making it useless for time stepping with the wave equation. If the number of nodes is the same as for the classic element, there are now infinitely many solutions. Numerical tests for a homogeneous wave-propagation problem with a point source confirm the expected accuracy of the new elements. Some of them require less compute time than those obtained with the more restrictive accuracy criterion.


Introduction
The numerical solution of the wave equation in time involves a large sparse mass matrix. Mass lumping avoids the cost of its lower-upper decomposition, leaving a diagonal matrix that can be readily inverted. On the reference element, its entries are proportional to numerical quadrature weights. If applied to the regular higher-order polynomial elements, there is a zero weight for degree 2, or a negative one for degree 3. For higher degrees, the resulting spatial accuracy is too low for a second-order partial differential equation. The linear degree-1 element is the exception. For higher degrees, positive weights and sufficient spatial accuracy can be obtained by choosing the degree of the polynomials in the interior higher than on the element boundary. In this way, elements of degree 2 [1][2][3], 3 [4,5], 4 [6], 5 [7], 6 [8], 7 and 8 [9,10], and 9 [9] were found. The results for degrees 7 and 8 in [9] and [10] are identical, but for degree 9, the one in [10] has more nodes and degree 10 on the edges.
The generalization of this approach to tetrahedra led to an element of degree 2 [6] and two elements of degree 3 [6]. The number of additional nodes in 3D is rather large, compared to the standard tetrahedral element.
Recently, it was found that a sharper, less restrictive condition can be imposed to preserve spatial accuracy after mass lumping [11]. The resulting tetrahedral elements for degree 2 and 3 have less nodes and are more efficient than the older ones. A number of degree-4 elements were found as well.
Here, the same accuracy condition is used in 2D for the construction of triangular masslumped elements. In addition, the moment equations for exact quadrature are based on symmetric polynomials instead of bi-variate monomials modulo symmetry to ensure that the number of independent equations is the same as the number of polynomials. With the old accuracy criterion, this could be accomplished by choosing a suitable subset [7,12]. This enables quicker scanning through potential nodes patterns and elements that result in a number equations that does not exceed the number of unknowns. Though not strictly necessary, the last condition increases the chance of finding a solution.
The approach comprises the following steps. Given the polynomial degree of the element and a higher interior degree, one or more symmetric node patterns have to be found to support the polynomials. Next, a set of moment equations is formed by requiring numerical quadrature to be exact for elements of the Cartesian product of the basis polynomials and polynomials of the element degree minus two. This condition is less restrictive than the classic one that requires exactness for the element degree plus interior degree minus two. The resulting system is linear in the quadrature weights with coefficients that are symmetric polynomials in the node parameters. To increase the chance of finding a solution, the number of unknowns, consisting in weights and node parameters, is required to not exceed the number of independent equations. Among the solutions of the system, if any, only the real ones with strictly positive weights and node parameters that do not move nodes outside their symmetry class or the triangle are acceptable. From multiple solutions that correspond to the same set of nodes and weights modulo symmetry, only one is selected. Because the number of solutions grows quickly with the degree and system size, numerical root finding is required for the higher degrees.
When multiple acceptable solutions are found, which one is the best? The answer depends on the problem. When the wave equation is solved, the performance is affected by the number of nodes, the maximum allowable size of the time step and the resulting error. For elements of a given degree, on a rectangular domain with a decent mesh and material properties piecewise constant per element, the error behaves as element size to a power that is one higher than the degree, but the error constant varies among the multiple solutions, as does the Courant-Friedrichs-Lewy (CFL) number [13] that determines the maximum size of the time step.
To assess the performance of the various element, the numerical errors for a homogeneous problem with a point source were measured as a function of element size as well as run time. A comparison in terms of efficiency to other higher-order discretisation schemes on triangles, such as discontinuous Galerkin methods [14][15][16], the spectral difference method [17] and summation-by-parts operators [18], has not been considered. The last method avoids problems with the unisolvency condition, which will be encountered here for some choices of nodes patterns.

Node Sets and Basis Functions
A point on the triangle with vertices x A , x B and x C is represented by Given a single node with natural coordinates (x 1 , x 2 ), its symmetric counterparts on the triangle and itself are elements of the set (x 1 , x 2 ), which can be obtained by taking the first two of each of the 6 permutations of (x 0 , x 1 , x 2 ) and removing duplicates. Table 1 lists the 6 equivalence classes for symmetric sets of nodes on the triangle. The second column contains the symmetry class, following definition 2.3 of [19], which will be referred to by the index j of the first column in what follows. The third column describes the node position on the triangle in natural coordinates modulo symmetry, and the fourth column their number v j based on symmetry.
A node pattern or rule set is given by K = {K 1 , K 2 , . . . , K 6 }. The total number of nodes per triangle is n p = 6 j=1 K j v j . The number of node parameters is n xpar = K 3 + K 5 + 2K 6 . Together with the quadrature weights, of which there are n weight = 6 j=1 K j , the total number of parameters becomes n par = n xpar + n weight .
The polynomial set of basis function of degree p is Since this choice leads to non-positive quadrature weights already for degree 2, it can be augmented by higher-degree polynomials in the interior that vanish on the edges: x 1 x 2 that vanishes on the edges. Here, p ≥ 3 and p ≥ p.
Instead of P p = {η}⊗ P p −3 , the product of the bubble function η and some subsetP p −3 of P p −3 can be chosen, with P p−3 ⊆P p −3 ⊆ P p −3 , similar to the tetrahedral elements of The number of nodes per class is v j degree 4 in [11]. This option will not be considered here, because it is less obvious how to automate the search for new elements, and is left for further research. Conformity between elements is obtained if the restriction to the edges has degree p. A necessary condition for unisolvence is that there are as many nodes as polynomial basis functions n pol . Together, they define the node pattern for given degrees p and p , as described by Theorem 1 in Appendix C.
Conjecture 3.1 in [10] states another requirement for unisolvence: the node pattern in the interior should be the same as that of the regular element of degree p with equidistant nodes in the natural coordinates. Lemma 1 in Appendix C proves its consistency with Theorem 1.

Quadrature
Lumping of the mass matrix amounts to numerical quadrature. The usual requirement is that polynomials up to degree q = p + p − 2 should be integrated exactly for accuracy to be preserved [5, 20, e.g.]. The elements up to degree 9 mentioned in the introduction are all based on this criterion. A less restrictive condition was introduced in [11] and enabled the construction of tetrahedral elements with less nodes. Quadrature should be exact for polynomials in Q p, p = P p−2 ⊗ U , where U = P p ⊕ P p contains the basis functions. If ψ(x 1 , x 2 ) is an element of Q p, p , the condition becomes 1 Here k runs over the nodes (x k,1 ,x k,2 ) modulo symmetry. Since U is a subset of P p , this potentially leads to elements with less nodes. The requirement of exact quadrature results in a polynomial system of equations, linear in the weights w k , with coefficients that are rational numbers for the polynomials used here. It has a subsystem that corresponds to polynomials of P p and the interior nodes.
To avoid inconsistent systems, it is natural to require the number of equations n eq not to exceed the number of parameters n par , although it may still happen that the overdetermined system has one or more solutions. Among the solutions, only the real ones with strictly positive weights w k and nodes not outside the triangle are useful. In addition, degeneracies where a node moves out of its class into another should be avoided. A class 3 node should therefore have 0 < a < 1 2 , a class 5 node 0 < a < 1 2 and a = 1 3 , and a class 6 node a = b, The number of equations n eq tends to be less than the number of bivariate monomials modulo symmetry in Q p, p = P p−2 ⊗ U , because some of them are integrated to zero. In the earlier approach, which required exactness up to degree q = p + p − 2, only the monomials x i 0 0 x i 1 1 x i 2 2 with i 2 = i 1 ≥ i 0 ≥ 0 and 2 j=0 i j = i 0 + 2i 1 ≤ q had to be considered [7,12]. In the current setting, it is more convenient to work with symmetric polynomials, c.f. [21].
A polynomial k in equation (1) can be expressed in the elementary symmetric polynomials {1, x 0 + x 1 + x 2 , ξ, η}, with ξ(x 1 , x 2 ) = x 0 x 1 + x 0 x 2 + x 1 x 2 and bubble function η(x 1 , x 2 ) = x 0 x 1 x 2 . Because x 0 + x 1 + x 2 = 1, the second polynomial drops out. The set of symmetric polynomials up to degree p is P For quadrature, the set Q p, p can be replaced by Equation (1) for an element ψ of Q p, p can be replaced by 1 where j(k) is the class of node number k modulo symmetry, i.e., for each symmetric set of nodes, only one is included. The number of independent polynomials in P sym p is n sym p = 1 + floor ( p( p + 6)/12) [21]. Their number in Q p, p is the same as the resulting number of equations: n eq = n sym 2 p−2 + n sym p+ p −5 − n sym 2 p−5 , assuming p > p ≥ 2. The use of symmetric polynomials suggests that (a, 0) be better replaced by a = 1 2 (1 − √ 1 − α) and 0 ≤ α ≤ 1, as in [8]. A similar substitution for a class 6 node (a, b) is Appendix D describes this parametrization and its inverse, which proved to be useful for the element of degree 5. It lowers the degrees in the system of polynomial equations but may be less attractive for numerical root finding with bounds.

Polynomial System
Given a polynomial degree p, the values of K 1,2,3 of the node pattern are prescribed by Theorem 1 in Appendix C. Because the vertices are always included, K 1 = 1. Then, a loop over the interior degree p is started, defining the remaining K 4,5,6 by Theorem 1, to find combinations with a number of free parameters n par not exceeding the number of independent equations n eq . The search is limited to values of the total number of nodes n p not much larger than that for the known elements, constructed with the more restrictive accuracy criterion.
For each candidate node pattern and degrees p, p , equation (3) provides a system of equations of the form where the vector w contains the n weight integration weights and the vector a the n xpar node parameters. The matrix A(a) with n eq rows and n weight + 1 columns contains polynomials in the node parameters up to degree p + p − 2 and rational coefficients. A subsystem corresponding to the product of the bubble function and polynomials only contains the interior weights and node parameters.
The system may have no solution, which typically but not necessarily happens for n eq > n xpar . It may be underdetermined and then either has no solution, if inconsistent, or infinitely many. This happens typically but not necessarily if n eq < n xpar . If the number of complex solutions is finite, the system is called zero-dimensional. In that case, the number of solutions does not exceed the Bézout bound [22, e.g.], which grows rapidly with the degree and size of the system. Among those solutions, only the real ones with positive weights and node parameters within the given bounds are useful, if they exist at all.
There are several packages for solving polynomial systems of equations but their unfavourable computational complexity limits their usefulness to small systems. For the results presented later on, the function Gröbnerbasis in Mathematica [23], an implementation of the Buchberger's algorithm [24], was nevertheless used with a time limit because it returns relatively quickly if the system is inconsistent. For small systems, the function Reduce provided solution sets, from which the real-valued ones within the parameter bounds were selected, if present, and only one for each set of symmetric solutions was retained. For larger systems with p ≥ 5, an empirical approach was adopted with a hand-coded damped Newton-type algorithm using extended numerical precision in Mathematica, starting from randomly chosen initial values. Better results were obtained with starting values obtained with lsqnonlin in Matlab [25], a nonlinear least-squares algorithm with bounds using a trust-region method [26], and refining them with the Newton-type algorithm in Mathematica. The authors of [27,28] take a similar approach with additional refinements. In some cases, this provided convergence with an extended precision of 64 or even 128 digits. In other cases, the Matlab solution proved to be false, due to its insufficient precision. Table 2 lists candidate node patterns for a small number of nodes n p and a number of parameters n par not too much larger than the number of equations n eq .

New Elements
For degrees 2 to 4, the elements obtained with the less restrictive accuracy criterion are the same as the known ones. New elements are found for degree 5 to 7.
The degree-5 element has the same node pattern and number of nodes as the known element but, with the less restrictive accuracy criterion, the number of equations is now one less than the number of parameters and infinitely many solutions exist. Some of the elements listed in Table 3 have a much better CFL number than the old element.
The value the CFL number in the table is a crude estimate computed on a single reference element with natural boundary conditions, as in [8]. Note that for the performance comparison later on, the power method is used. Figure 1 depicts the node distributions. Versions 'A' to 'E' were found by numerical root finding. As it turns out, the node parameters and weights could actually be obtained as functions of a single parameter related to the class-6 node. The other parameter for that node is given by a root of a quartic equation. Two of the four roots provide acceptable solutions, each for a certain range of the one parameter. From the two parameters, the other unknowns follow. Appendix E describes the details. With this approach, the elements 'F' and 'G' in Table 3 were found.  Table 6, cf. [9,10] The first columns contains the element's polynomial degree p on the edges, the second the degree p in the interior. Acceptable node patterns K are listed in the third column. The following three columns contain the number of nodes per element, n p , the number of equations n eq and the number of parameter n par including integration weights. The last column contains references if the element is known    Table 4 lists two new elements for p = 8 and K = {1, 1, 2, 0, 3, 2}, one with a much larger estimated CFL than the other. Figure 2 displays their node patterns. They have 39 nodes, significantly less than the 46 for the elements found in [8]. Note that the related polynomial system of equations may have more acceptable solutions than the two found so far. Table 5 contains a new degree-7 element with less nodes than the known element, but with a negative weight, making it unsuited for explicit time stepping with the wave equation. Figure 3 shows the distribution of nodes. Table 6 lists a new element for degree 7 with version name 'old,2', obtained with the old accuracy criterion. It has an unfavourable estimated CFL number compared to the 0.0288 for the degree-7 element in [9,10], not included in the table.
For the same node pattern, the new accuracy criterion provided the elements 'new,A', 'new,B' and 'new,C', among others. Note that the elements obtained with the old accuracy criterion also obey the new criterion if the same node pattern is used, but with the latter the number of equations is less than the number of parameters. There are infinitely many solutions, similar to the elements of degree 5. Figure 4 displays the node distributions for various degree-7 elements.
The node pattern for these elements results in a polynomial system that has one equation less than the number of unknowns. Once an element is found with a Newton-type method, the Jacobian of the polynomial system has a nullspace characterized by a single vector. A small  perturbation in its direction followed by a few Newton iterations will provide a new element. With this continuation approach, one can, for instance, minimize or maximize a parameter. Note that a larger value of the smallest weight, typically the first one that corresponds to the vertices, as suggested by [10], does not necessarily imply a larger CFL number, as can be seen in Table 3 when comparing versions 'A' and 'B', for instance.

Numerical Tests
Accuracy and performance tests were carried out for the wave equation with solution p(t, x), wave speed c(x), density ρ(x) and forcing function f (t, x) = w(t)δ(x− x s ), representing a point source located at x s with signature w(t). The same homogeneous  The element of [9,10], not listed, has a CFL of 0.0288. Element version '2' obeys the old accuracy criterion, 'A' to 'C' the new test problem as in [8] is used, on the unit square for constant c(x) = 1 and ρ(x) = 1 and with zero Dirichlet boundary conditions (not Neumann as erroneously mentioned in [8]). The source is located at the centre and has a signature w(t) 16 for 0 ≤ t ≤ T and zero otherwise, with a duration T = 0.2. The spatial discretisation is reviewed in Appendix A. Geompack [30] was used to construct unstructured meshes. Assembly is performed on the fly in each time step, as in [6,11,[31][32][33], although on modern hardware, a pre-assembled matrix might be more efficient in 2D, in particular for lower orders.
Higher-order time stepping [34][35][36][37] is applied instead of Stork's dispersion correction [38][39][40][41], which would be more efficient in practice. The time-stepping order is taken to be M t = 2 ceil(( p + 1)/2). Details, such as the increase of the CFL number for higher orders, can be found in [8, e.g.]. If the signature is not smooth enough at the endpoints, at time 0 or T , the numerical error for high-order time stepping may become too large because a higher derivative of the source signature ceases to exist. A higher power will avoid that problem and is chosen here as 16 instead of the power 8 used [8]. Three new and two old elements of degree 7 with 57 nodes. Version '7,1 old' (c) was found earlier [9,10]. The newly found '7,2 old' has an unfavourable CFL number For second-order times stepping, the time step is limited by t max = √ 2/σ max , where σ max is the largest eigenvalue of the spatial operator, the product of the inverse lumped mass matrix and the stiffness operator. The power method [42] was used to find an estimate for each run. The results turned out be slightly larger than the crude estimates listed in the tables.
The numerical solution at time t = 1.25 is compared to the exact solution on all the nodes. Their number equals the number of degrees of freedom N , which includes the zero Dirichlet boundary values. The relative L 2 error is defined as u − u exact 2 / u exact 2 , with a norm given by where j runs over all triangle and k over all nodes in each triangle with area a j , quadrature weights w k and solution values u j,k . With a number of degrees of freedom N , the expected convergence behaviour is N −( p+1)/2 for the current problem. Note that the use of these weights in equation (6) results in an additional error, which should be neglegible for p + p − 2 > p or p > 2.
The computational efficiency of an element is not only determined by the number of nodes, but also by its CFL number and the error constant. A crude indication of the efficiency is obtained by performing tests on the given problem and recording its run time. The singlethreaded code was written in C and dates back to 1995. The wall clock time for the time stepping part of the code is by no means representative for what can be obtained on modern architectures, but is nevertheless used as a measure for the performance comparison. Figure 5a shows the measured relative L 2 error on a sequence of unstructured grids for three versions of the scheme, including the old one, as a function of the square root of number of degrees of freedom. The trend follows the expected behaviour, N −( p+1)/2 , indicated by the grey line. Of the three version in this comparison, 'B' has the smallest error. Figure 5b plots the error against the elapsed time and provides a rough indication of the relative performance. Versions 'B' and 'G' appear to be slightly more efficient than the other, and both do better than the old scheme. Although version 'G' has a smaller CFL number than 'F', its better accuracy make its more efficient. Figure 5 The performance for the new elements is similar, whereas the gain over the old element is small. Figure 6a confirms the expected convergence behaviour for degree 6. Both new elements 'A' and 'B' have a smaller error than the known scheme with 46 nodes, version 'D', in [8]. In terms of efficiency, Fig. 6b shows that version 'A' outperforms the others. Version 'B' is less efficient than the old one. Although it has less nodes per element, its smaller CFL number make it less attractive. The results in the relative maximum norm, displayed in Fig. 6(c) and (d), show a similar behaviour. Figure 7 display errors for two degree-7 elements, version '1' of [9,10] and '2' obtained with the old, more restrictive accuracy criterion. Elements 'A','B' and 'C' were found with the new criterion. Their errors are similar, but 'C' is more efficient. Versions '1' from [9,10] and '2' were obtained with the old, more restrictive accuracy criterion, whereas 'A' to 'C' were found with the new criterion

Conclusions
The search for continuous mass-lumped triangular finite elements with a less restrictive accuracy criterion than the classic one has led to a number of new elements of degree 5, 6 and 7. These elements are enriched with polynomials of a higher degree in their interior to preserve accuracy after mass lumping. Given the polynomial degrees on the boundary of the triangle and in its interior, it is proven that the requirements of conformity and unisolvence define a unique node pattern. The elements for lower degrees turned out to be same as the known ones. For degree 5 and 7, there are infinitely many elements with the same number of nodes as the known element. For degree 6, two new elements with less nodes were found.
Numerical tests on a homogeneous wave-equation problem with a point source at the centre show that some of the degree-5 elements have a somewhat better accuracy at a given cost than the classic element. The same is true for one of the new elements of degree 7. The gain is most pronounced for one of the two degree-6 elements.
If r = 0 and t > T max , let τ = exp(s). Then, The integrals can be evaluated by numerical quadrature. Alternatively, the solution in the Fourier domain with a Hankel function can be convolved with the source signature.

Appendix C Node Pattern for Given Degrees
For a degree p ≥ 1 on the edges of the triangle and a degree p ≥ p in its interior, a node pattern K = {K 1 , K 2 , . . . , K 6 } should be chosen that uniquely determines the associated polynomials.
Theorem 1 Given a degree p ≥ 1 on the edges of a triangle and a degree p ≥ p in its interior. Assume that the vertices are always involved: K 1 = 1. A necessary condition for conformity and unisolvence is a node pattern with K 2 = 1 and K 3 = ( p − 2)/2 if p is even, or K 2 = 0 and K 3 = ( p − 1)/2 if p is odd. The centroid is involved, i.e., K 4 = 1, if mod 3 p = 0; otherwise, K 4 = 0. The remaining two classes have K 5 = floor(( p − 1)/2) − K 4 and Proof Conformity is obtained if the restriction to the edges has degree p. This implies that The number of points on the boundary thereby becomes n edge = 3 p. The number of remaining interior points is 1 , constituting a subset of the n int points for the degree p ≥ p.
A necessary condition for unisolvence is that the number of polynomial basis functions n pol equals the number of nodes n p = n edge + n int . The first three entries of the node pattern have already been discussed. For the interior, the remaining three should obey where K 4 is 0 or 1. The Lagrange basis functions are 1 on one node and 0 on all the other. For a reference node in an equivalence class j with barycentric coordinates (x 0 , its symmetric counterparts can be obtained by taking the unique permutations of the three coordinates, providing a total of v j nodes. The corresponding Lagrange basis functions φ(x 0 , x 1 , x 2 ) for these nodes follow the same permutation pattern, that is, if the one for the reference nodes is know, the other can be obtained by the permuting its arguments, discarding duplicates that occur if the same permutation on the reference node produces a duplicate.
In this way, the degrees p and p define a unique node pattern K .
According to Conjecture 3.1 in [10], unisolvence requires the node pattern in the interior to be the same as that of the regular element of degree p , having equidistant nodes in the natural coordinates. The authors employ orbit patterns of the form (K 4 ) : (K 1 + K 2 + K 5 ) : (K 3 + K 6 ). Lemma 1 Theorem 1 is consistent with Conjecture 3.1 in [10].
The resulting node pattern is the same as in Theorem 1.
The other node parameters and weights follow from explicit but lengthy expressions, except for .
From the resulting mass and stiffness matrices, the CFL number can be estimated by considering a single reference element with natural boundary conditions. For root number 3, the maximum of 0.0779 is found for α = 0.789331970591268894545679 and other parameters listed in Table 3 as version F. Version G corresponds to the largest CFL estimate of 0.0661 for root number 2, based on α = 0.789331970591268894545679. A supplemental file degree5.m for Mathematica is provided with a function that generates the node parameters and weights for a given value of α and a root number. The solutions listed in Table 3 are included. In the script, the 3 node parameters for class 5 were replaced by a symmetric polynomials to further simplify the system of polynomial equations.