1 Introduction

The application of high-order numerical methods continues to expand, now spanning aeronautics (e.g., [13]) to biomedical engineering (e.g., [3]), in large part due to two factors: the numerical accuracy, combined with low dissipation and dispersion errors, that they can obtain for certain problem classes [7], and the computational efficiency they can achieve when balancing approximation power against computational density [14]. Most high-order finite element, also called spectral/hp element, simulation codes mimic their traditional finite element counterparts in terms of software organization, such as the development of local operators that are then “assembled” (either in the strict continuous Galerkin sense or in the weak discontinuous Galerkin sense) into a global system that is advanced in some way [5, 6, 10]. Tremendous effort has been expended to create optimized elemental operations that, for instance, evaluate the solution expansions at the quadrature points, allowing rapid evaluation of the solution at the points of integration needed for computing the stiffness matrix entries, the forcing terms, and other desirable quantities.

However, in the case of history points (positions in the field at which one wants to track a particular quantity of interest over time) [16], pathlines/streamlines [17], isosurface evaluation [9], or refinement and mortaring [12], evaluation of solution expansions at arbitrary point locations is required. From the perspective of point evaluation over the entire domain, these operations require two phases: given a point in the domain, first finding the element in which that point resides, and then rapidly evaluating the solution expansion on that individual element. Optimization strategies such as octrees and R-trees have been implemented to accelerate the first of these two tasks (e.g., [8]); however, a concerted effort has not been placed on generalized elemental operations such as arbitrary point evaluation.

The purpose of this work is to address the need for efficient arbitrary point evaluation in high-order (spectral/hp) element methods. We build upon the barycentric polynomial interpolation work of Berrut and Trefethen [1], which has been shown to have a myriad of applications within the polynomial approximation world [11]. In this work, we mathematically generalize barycentric polynomial interpolation to arbitrary simplicial elements, and then demonstrate the effectiveness of this approach on the canonical high-order finite element shapes in 2D (triangles and quadrilaterals) and 3D (tetrahedra, prisms, pyramids, and hexahedra). Our work targets the range of polynomial degrees often run within the spectral/hp community, from polynomial degree 2 to 10. The algorithms presented herein are also implemented in the publicly available high-order finite element library Nektar++ [2, 15].

The paper proceeds as follows: in Sects. 2–4, we lay out the mathematical details of our work. In Sect. 2, we highlight the one-dimensional building blocks of barycentric interpolation; in Sect. 3, we provide details on expanding these ideas to tensor-product constructed 2D and 3D elements as used in Nektar++; and in Sect. 4, we provide the mathematical framework for interpolation over finite element shapes generated through successive application of Duffy transformations of an orthotope element. In Sect. 5, we present both algorithmic and implementation details. We provide details that facilitate reproducibility of our results as well as complexity and storage analysis of the proposed strategy. In Sect. 6, we present various test cases that demonstrate the efficacy and efficiency of our proposed work, as well as a capstone example that highlights a real-world application of our strategy. We summarize our contributions and conclude with possible future work in Sect. 7.

2 Univariate Barycentric Interpolation

Berrut and Trefethen [1] propose barycentric Lagrange interpolation for stable, efficient evaluation of polynomial interpolants. With \(\eta \in [-1,1]\) the independent variable, let \(\{z_j\}_{j=0}^k \subset [-1,1]\) denote \(k+1\) distinct nodes. With \(Q_k\) the space of polynomials of degree k or less in one variable, any \(p \in Q_k\) can be represented in its barycentric form,

$$\begin{aligned} p(\eta )&= \frac{\sum \limits _{j=0}^k\frac{ w_j p_j }{(\eta - z_j)}}{\sum \limits _{j=0}^k\frac{w_j}{(\eta -z_j)}} =:\frac{N(\eta )}{D(\eta )},&p_j&:=p(z_j), \end{aligned}$$
(1)

where the weights \(w_j\) are given by

$$\begin{aligned} w_j = \frac{1}{\prod \limits _{{i = 0,i\ne j}}^{k} (z_i - z_j)}, \qquad j = 0,1,\cdots ,k. \end{aligned}$$
(2)

The weights are independent of the polynomial p, and depend only on the nodal configuration. The barycentric form (1) reveals that, given the values \(\{p_j\}_{j=0}^k\) of a polynomial, evaluation of p at an arbitrary location \(\eta \) can be accomplished without solving linear systems or evaluating cardinal Lagrange interpolants.

In this paper, we focus on the algorithmic advantages of using barycentric form, with proper extensions to some standard multivariate non-tensorial domains that are popular in high-order (spectral/hp) finite element methods. These algorithmic advantages stem from the univariate algorithmic advantages: The map \(\eta \mapsto p(\eta )\) using the formula (1) requires \({\mathcal {O}}(k)\) arithmetic operations, whereas the same map using standard linear expansions frequently requires \({\mathcal {O}}(k^2)\) operations.

Throughout our discussion, we consider the nodes \(z_j\), and subsequently, through (2), the barycentric weights \(w_j\), as given and fixed. To emphasize the \({\mathcal {O}}(k)\) complexity of the operations, which we consider in the following section, we define

$$\begin{aligned} S_r(\varvec{v}, \eta )&:=\sum _{j=0}^k \frac{v_j w_j}{(\eta - z_j)^r},&\varvec{v}&= \left( v_0, \ldots , v_k \right) ^T, \end{aligned}$$
(3)

which, given \(r, \varvec{v}\), ostensibly requires only \({\mathcal {O}}(k)\) complexity to evaluate \(\eta \mapsto S_r(\varvec{v},\eta )\). Note in particular that \(S_r(\varvec{v},\eta )\) is linear in \(\varvec{v}\) and that

$$\begin{aligned} \frac{\mathrm {d}}{\mathrm {d}\eta } S_r(\varvec{v},\eta ) = -r S_{r+1}(\varvec{v},\eta ). \end{aligned}$$
(4)

We can write (1) as

$$\begin{aligned} p(\eta )&= \frac{S_1(\varvec{p}, \eta )}{S_1(\varvec{1}, \eta )},&\varvec{p}&= \left( p_0, \ldots , p_k\right) ^T,&\varvec{1}&:=\left( 1, \ldots , 1\right) ^T \in \mathbb {R}^{k+1}, \end{aligned}$$
(5)

which clearly demonstrates the \({\mathcal {O}}(k)\) complexity if \(\varvec{p}\) is furnished.
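
To make these operation counts concrete, the following minimal C++ sketch (our illustration with hypothetical names, not the Nektar++ routines of Sect. 5) implements the one-time weight computation (2), the kernel \(S_r\) of (3), and the evaluation (5). It assumes \(\eta \) does not coincide with a node \(z_j\), a case that production code must detect and handle separately.

```cpp
#include <cstddef>
#include <vector>

// One-time O(k^2) setup of the barycentric weights, exactly as in (2).
// (The common convention differs by a global sign that cancels in (5).)
std::vector<double> baryWeights(const std::vector<double>& z)
{
    std::vector<double> w(z.size(), 1.0);
    for (std::size_t j = 0; j < z.size(); ++j)
        for (std::size_t i = 0; i < z.size(); ++i)
            if (i != j) w[j] /= (z[i] - z[j]);
    return w;
}

// S_r(v, eta) = sum_j v_j w_j / (eta - z_j)^r, cf. (3); O(k) per call.
double S(int r, const std::vector<double>& v, const std::vector<double>& w,
         const std::vector<double>& z, double eta)
{
    double sum = 0.0;
    for (std::size_t j = 0; j < z.size(); ++j)
    {
        double t = w[j] / (eta - z[j]);              // assumes eta != z[j]
        for (int s = 1; s < r; ++s) t /= (eta - z[j]);
        sum += v[j] * t;
    }
    return sum;
}

// p(eta) = S_1(p, eta) / S_1(1, eta), cf. (5).
double baryEval(const std::vector<double>& p, const std::vector<double>& w,
                const std::vector<double>& z, double eta)
{
    std::vector<double> ones(z.size(), 1.0);
    return S(1, p, w, z, eta) / S(1, ones, w, z, eta);
}
```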

2.1 Alternatives to Barycentric Form

Consider a linear expansion representation of \(p \in Q_k\), written in terms of either an (arbitrary) basis \(\{\phi _j\}_{j=0}^k\) of \(Q_k\), or the cardinal Lagrange functions of the interpolation points \(\{z_j\}_{j=0}^k\):

$$\begin{aligned} p(\eta )&= \sum _{j=0}^k c_j \phi _j(\eta ),&\mathrm {span}\{\phi _0, \ldots \phi _k \}&= Q_k, \end{aligned}$$
(6)

where the \(\phi _j\) basis functions can be, e.g., cardinal Lagrange interpolants, monomials, or orthogonal polynomials:

$$\begin{aligned} \phi _j(\eta )&= \prod _{\begin{array}{c} i = 0, \ldots , k\\ i \ne j \end{array}} \frac{\eta - z_i}{z_j - z_i}, \end{aligned}$$
(7a)
$$\begin{aligned} \phi _j(\eta )&= \eta ^{j}, \end{aligned}$$
(7b)
$$\begin{aligned} \phi _j(\eta )&= \psi _j(\eta ), \end{aligned}$$
(7c)

with \(\psi _j\) any orthogonal polynomial family, such as Legendre or Chebyshev polynomials. Our main focus is the computational complexity of evaluating a polynomial interpolant given the data values \(\{p_j\}_{j=0}^k\). The following diagram summarizes the complexity of utilizing each of these procedures to accomplish the evaluation \(\eta \mapsto p(\eta )\):

$$\begin{aligned} \begin{array}{rccccl} \text {Barycentric (1)}: &{} \{z_j\}_{j=0}^k, \eta &{} \xrightarrow {{\mathcal {O}}(k^2)}&{} \{(w_j,z_j,p_j)\}_{j=0}^k, \eta &{} \xrightarrow {{\mathcal {O}}(k)} &{} p(\eta ) \\ \text {Monomial (6), (7b)}: &{} \{(z_j,p_j)\}_{j=0}^k, \eta &{} \xrightarrow {{\mathcal {O}}(k^3)} &{} \{(c_j)\}_{j=0}^k, \eta &{} \xrightarrow {\begin{array}{c} \text {Naive: }{\mathcal {O}}(k^2)\\ \text {Horner: }{\mathcal {O}}(k) \end{array}} &{} p(\eta ) \\ \text {Orth. Poly. (6), (7c)}: &{} \{(z_j,p_j)\}_{j=0}^k, \eta &{} \xrightarrow {{\mathcal {O}}(k^3)} &{} \{(c_j)\}_{j=0}^k, \eta &{} \xrightarrow {\begin{array}{c} \text {Naive: }{\mathcal {O}}(k^2)\\ \text {Clenshaw: }{\mathcal {O}}(k) \end{array}} &{} p(\eta ) \\ \text {Lagrange (6), (7a)}: &{} &{} &{} \{(z_j, p_j)\}_{j=0}^k, \eta &{} \xrightarrow {{\mathcal {O}}(k^2)} &{} p(\eta ) \\ \end{array} \end{aligned}$$
(8)

Above, we have alluded to the fact that direct evaluation of monomial or orthogonal polynomial \((k+1)\)-term expansions appears to require \({\mathcal {O}}(k^2)\) complexity, but clever rearrangement of the terms in the summation can lower this to \({\mathcal {O}}(k)\) in both cases, using either Horner’s algorithm (monomials) or Clenshaw’s algorithm [4] (orthogonal polynomials). Viewed in this way, there are two advantages to utilizing the barycentric form. First, the initial one-time computation of the barycentric weights is an \({\mathcal {O}}(k^2)\) operation, cheaper than the \({\mathcal {O}}(k^3)\) computation of expansion coefficients required by the other forms. Furthermore, with fixed nodal locations \(z_j\), the weights \(w_j\) need not be recomputed if p is changed; in other words, the weights \(w_j\) do not depend on p. Hence if k and the nodes \(z_j\) are given and fixed, then the weights \(w_j\) can be precomputed and (re-)used for any \(p \in Q_k\).
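
For concreteness, minimal sketches (ours, standard textbook forms rather than code from [4]) of the two \({\mathcal {O}}(k)\) rearrangements referenced above:

```cpp
#include <cstddef>
#include <vector>

// Horner's algorithm: evaluates c_0 + c_1*eta + ... + c_k*eta^k in O(k).
double horner(const std::vector<double>& c, double eta)
{
    double acc = 0.0;
    for (std::size_t j = c.size(); j-- > 0; )
        acc = acc * eta + c[j];
    return acc;
}

// Clenshaw's algorithm for a Chebyshev expansion sum_j c_j T_j(eta), also
// O(k), using the backward recurrence b_j = 2*eta*b_{j+1} - b_{j+2} + c_j.
double clenshawChebyshev(const std::vector<double>& c, double eta)
{
    double b1 = 0.0, b2 = 0.0;
    for (std::size_t j = c.size(); j-- > 1; )
    {
        double b0 = 2.0 * eta * b1 - b2 + c[j];
        b2 = b1;
        b1 = b0;
    }
    return eta * b1 - b2 + c[0];
}
```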

Our first task in this paper is to generalize the barycentric procedure to first- and second-derivative evaluations.

2.2 Barycentric Evaluation of Derivatives

A formula for the derivative of a polynomial can be derived from the barycentric form (1) that inherits its advantages. We first note two auxiliary expressions for the derivative of the numerator and denominator rational functions,

$$\begin{aligned} N'(\eta )&= -S_2(\varvec{p}, \eta ),&D'(\eta )&= -S_2(\varvec{1}, \eta ), \end{aligned}$$
(9)

each of which ostensibly also requires only \({\mathcal {O}}(k)\) complexity to evaluate if the weights \(w_j\) are precomputed. Then, by directly differentiating (1), we have

$$\begin{aligned} p'(\eta ) = \frac{N'(\eta ) - p(\eta ) D'(\eta )}{D(\eta )} = \frac{S_2\left( p(\eta ) - \varvec{p}, \eta \right) }{S_1(\varvec{1},\eta )}, \end{aligned}$$
(10)

where the vector \(p(\eta ) - \varvec{p}\) has entries \(p(\eta ) - p_j\). Therefore, this “barycentric” form for \(p'(\eta )\) requires (i) an evaluation of \(p(\eta )\) that can be accomplished in \({\mathcal {O}}(k)\) complexity using (1), and (ii) an additional \({\mathcal {O}}(k)\) evaluation of the summation above. Thus, \(\eta \mapsto p'(\eta )\) can be evaluated with only \({\mathcal {O}}(k)\) complexity.

A similar computation shows that

$$\begin{aligned} p''(\eta ) = \frac{2}{S_1(\varvec{1},\eta )} \left( p'(\eta ) S_2(\varvec{1},\eta ) - S_3(p(\eta ) - \varvec{p}, \eta ) \right) . \end{aligned}$$
(11)

Again, since \(\eta \mapsto p(\eta )\) and \(\eta \mapsto p'(\eta )\) can be evaluated with \({\mathcal {O}}(k)\) effort through (1) and (10), some extra \({\mathcal {O}}(k)\) effort to evaluate \(S_2\) and \(S_3\) above yields an evaluation \(\eta \mapsto p''(\eta )\) that can also be accomplished in \({\mathcal {O}}(k)\) time.
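
Under the same assumptions as the earlier sketch, and reusing its S and baryEval helpers, formulas (10) and (11) translate directly into code:

```cpp
#include <cstddef>
#include <vector>

// p'(eta) via (10): S_2(p(eta) - p, eta) / S_1(1, eta).
double baryDeriv(const std::vector<double>& p, const std::vector<double>& w,
                 const std::vector<double>& z, double eta)
{
    double peta = baryEval(p, w, z, eta);
    std::vector<double> shifted(z.size()), ones(z.size(), 1.0);
    for (std::size_t j = 0; j < z.size(); ++j) shifted[j] = peta - p[j];
    return S(2, shifted, w, z, eta) / S(1, ones, w, z, eta);
}

// p''(eta) via (11): 2/S_1(1) * ( p'(eta) S_2(1) - S_3(p(eta) - p) ).
double barySecondDeriv(const std::vector<double>& p, const std::vector<double>& w,
                       const std::vector<double>& z, double eta)
{
    double peta  = baryEval(p, w, z, eta);
    double dpeta = baryDeriv(p, w, z, eta);
    std::vector<double> shifted(z.size()), ones(z.size(), 1.0);
    for (std::size_t j = 0; j < z.size(); ++j) shifted[j] = peta - p[j];
    return 2.0 * (dpeta * S(2, ones, w, z, eta)
                  - S(3, shifted, w, z, eta)) / S(1, ones, w, z, eta);
}
```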

3 Tensorization of the Barycentric Form

All the evaluations considered above can be generalized to tensorial formulations, which is the main topic of this section. To that end, we introduce some multidimensional notations. Let \(\varvec{\eta } :=(\eta _1, \ldots , \eta _d) \in [-1,1]^d\) be a d-dimensional vector. Given some multi-index \(\varvec{k} \in \mathbb {N}_0^d\), we introduce the tensorial space of polynomials \(Q_{\varvec{k}}\) defined by \(\varvec{k}\):

$$\begin{aligned} Q_{\varvec{k}} :=\mathrm {span} \left\{ \varvec{\eta }^{\varvec{j}} \;\big |\; \varvec{j} \in \mathbb {N}_0^d \text { and } \varvec{j} \le \varvec{k} \right\} , \end{aligned}$$
(12)

where we have adopted the standard multi-index notation,

$$\begin{aligned} \varvec{\eta }^{\varvec{j}}&:=\prod _{q=1}^d \eta _q^{j_q},&\varvec{j}&= (j_1, \ldots , j_d) \in \mathbb {N}_0^d, \end{aligned}$$
(13)

and \(\varvec{j} \le \varvec{k}\) is true if all the component-wise inequalities are true.

Fixing \(\varvec{k}\), we consider the case of representing an element p of \(Q_{\varvec{k}}\) through its values on a discrete tensorial grid of size \(\prod _{q=1}^d (k_q+1)\). As in the univariate case, linear expansions in cardinal Lagrange, monomial, and/or orthogonal polynomials are common, but we will exercise the barycentric form. Let a tensorial grid on a \([-1,1]^d\) orthotope be given, with \(k_q+1\) points in direction q:

$$\begin{aligned} \left\{ z_{j,q}\right\} _{j=0}^{k_q} \subset [-1,1], \end{aligned}$$
(14)

for \(q = 1, \ldots , d\). The tensorization of these grids results in the multidimensional grid \(\left\{ \varvec{z}_{\varvec{j}} \right\} _{\varvec{j} \le \varvec{k}}\) defined by

$$\begin{aligned} \varvec{z}_{\varvec{j}} :=\left( z_{j_1, 1}, \, z_{j_2, 2}, \, \ldots , \, z_{j_d,d} \right) \in [-1,1]^d. \end{aligned}$$
(15)

Given this tensorial configuration of nodes, we define univariate barycentric weights associated with each dimension in a fashion similar to (2),

$$\begin{aligned} w_{q,j}&:=\frac{1}{\prod \limits _{{i = 0,i\ne j}}^{k_q} (z_{i,q} - z_{j,q})},&0&\le j \le k_q,&1&\le q \le d. \end{aligned}$$
(16)

Then, given the data

$$\begin{aligned} \left\{ p_{\varvec{j}} \right\} _{\varvec{j} \le \varvec{k}}, p_{\varvec{j}} :=p\left( \varvec{z}_{\varvec{j}}\right) , \end{aligned}$$
(17)

for \(p \in Q_{\varvec{k}}\), then the multidimensional barycentric form of p is

$$\begin{aligned} p(\varvec{\eta })&= \frac{\sum _{\varvec{j} \le \varvec{k}} \frac{p_{\varvec{j}} w_{\varvec{j}}}{\odot \left( \varvec{\eta } - \varvec{z}_{\varvec{j}}\right) }}{\sum _{\varvec{j} \le \varvec{k}} \frac{w_{\varvec{j}}}{\odot \left( \varvec{\eta } - \varvec{z}_{\varvec{j}}\right) }},&\odot \left( \varvec{\eta } - \varvec{z}_{\varvec{j}} \right)&:=\prod _{q=1}^d \left( \eta _q - z_{j_q, q} \right) ,&w_{\varvec{j}}&:=\prod _{q=1}^d w_{q,j_q}. \end{aligned}$$
(18)

Given \(\varvec{y}\in [-1,1]^d\), an evaluation of \(p(\varvec{y})\) above requires \({\mathcal {O}}\left( \prod _{q=1}^d k_q \right) \) operations, corresponding to the complexity of the summations.

3.1 Dimension-by-Dimension Approach

Instead of the direct approach (18) for evaluating p, we utilize a dimension-by-dimension computation that is slightly more computationally expensive but allows us to directly leverage univariate procedures, greatly simplifying the software implementation. First, consider the functions \({\widetilde{p}}_q\), for \(q = d-1, \ldots , 1\), each of which is a function of \(\eta _1, \ldots , \eta _q\) formed by freezing \(\eta _{s} = y_{s}\) for \(s > q\):

$$\begin{aligned} {\widetilde{p}}_d&:=p,&{\widetilde{p}}_q(\eta _1, \ldots , \eta _q)&:=p\left( \eta _1, \ldots , \eta _q, y_{q+1}, \ldots , y_d \right) = {\widetilde{p}}_{q+1}(\eta _1, \ldots , \eta _q, y_{q+1}). \end{aligned}$$
(19)

In order to evaluate \(p(\varvec{y})\), we will proceed by iteratively constructing \({\widetilde{p}}_q\) from \({\widetilde{p}}_{q+1}\) for \(q = d-1, \ldots , 1\). Via barycentric form, “constructing” \({\widetilde{p}}_q\) amounts to evaluating this function on the q-dimensional tensorial grid

$$\begin{aligned} \widetilde{\varvec{z}}_{q,\varvec{j}} = \left( z_{j_1, 1}, \, z_{j_2, 2}, \, \ldots , \, z_{j_q,q} \right) \in [-1,1]^q. \end{aligned}$$
(20)

Then, for example, to first generate \({\widetilde{p}}_{d-1}\), we must evaluate

$$\begin{aligned} {\widetilde{p}}_{d-1}\left( \widetilde{\varvec{z}}_{d-1,\varvec{j}}\right) = p\left( z_{j_1,1}, \ldots , z_{j_{d-1},d-1}, y_{d}\right) , \end{aligned}$$
(21)

for every \(\varvec{j} \le \left( k_1, \ldots , k_{d-1}\right) \). This can be accomplished via univariate procedures in \({\mathcal {O}}(k_d)\) complexity since, for each fixed \(\varvec{j}\),

$$\begin{aligned} y_d \mapsto p\left( z_{j_1,1}, \ldots , z_{j_{d-1},d-1}, y_{d}\right) , \end{aligned}$$
(22)

is a polynomial of degree \(k_d\), and hence obeys the barycentric formula,

$$\begin{aligned} p\left( z_{j_1,1}, \ldots , z_{j_{d-1},d-1}, y_{d}\right)&= \frac{S_1\left( \widetilde{\varvec{p}}_{\varvec{j}}, y_d\right) }{S_1\left( \varvec{1}, y_d\right) },&\widetilde{\varvec{p}}_{\varvec{j}} :=\left( p_{(j_1, \ldots , j_{d-1}, \ell )} \right) _{\ell =0}^{k_d}. \end{aligned}$$
(23)

In this way, we proceed to iteratively construct \({\widetilde{p}}_q\), amounting to \(\prod _{j=1}^{q} k_j\) evaluations, each of computational complexity \({\mathcal {O}}(k_{q+1})\),

$$\begin{aligned} \left. \begin{array}{ccl} p &{} \xrightarrow {\prod _{j=1}^{d-1} k_j \cdot {\mathcal {O}}(k_d) \text { evaluations}} &{} {\widetilde{p}}_{d-1} \\ {\widetilde{p}}_q &{} \xrightarrow {\prod _{j=1}^{q-1} k_j \cdot {\mathcal {O}}(k_q) \text { evaluations}} &{} {\widetilde{p}}_{q-1} \quad (1< q < d) \\ {\widetilde{p}}_1 &{} \xrightarrow {1 \cdot {\mathcal {O}}(k_1) \text { evaluation}} &{} p(\varvec{y}) \end{array} \right\} \end{aligned}$$
(24)

In summary, computing \(\varvec{y} \mapsto p(\varvec{y})\) can be accomplished with the procedure above, which entails repeated use of the univariate barycentric form (5).

As mentioned earlier, the cost of the procedure (24) is slightly more expensive than direct evaluation of the multidimensional barycentric form (18). In particular, the (\(\varvec{k}\)-asymptotic) cost of the direct evaluation (18) is \(\prod _{q=1}^d k_q\). On the other hand, the dimension-by-dimension approach (24) incurs additional lower-order costs, and has complexity scaling as,

$$\begin{aligned} \prod _{q=1}^d k_q + \prod _{q=1}^{d-1} k_q + \ldots = \sum _{j=1}^d \prod _{q=1}^j k_q \le d \prod _{q=1}^d k_q, \end{aligned}$$
(25)

where the inequality is a very crude bound. Thus, while the algorithm described by (24) is formally more expensive than direct evaluation (18), the actual additional cost is relatively small. In particular, for the physically relevant cases of \(d = 2, 3\), this minor increase in cost is acceptable for the achieved gain in implementation ease.
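
As an illustration, a minimal 2D instance of (24) built on the univariate baryEval sketch from Sect. 2; the row-major storage layout of the data values is our assumption:

```cpp
#include <cstddef>
#include <vector>

// Dimension-by-dimension evaluation (24) in 2D: values[j2 * n1 + j1] holds
// p(z_{j1,1}, z_{j2,2}) in row-major order (our convention).
double baryEval2D(const std::vector<double>& values,
                  const std::vector<double>& w1, const std::vector<double>& z1,
                  const std::vector<double>& w2, const std::vector<double>& z2,
                  double y1, double y2)
{
    const std::size_t n1 = z1.size(), n2 = z2.size();
    std::vector<double> line(n1), column(n2);
    // Freeze eta_2 = y2: one univariate O(k_2) evaluation per z1-node,
    // producing \tilde{p}_1 on the one-dimensional grid.
    for (std::size_t j1 = 0; j1 < n1; ++j1)
    {
        for (std::size_t j2 = 0; j2 < n2; ++j2)
            column[j2] = values[j2 * n1 + j1];
        line[j1] = baryEval(column, w2, z2, y2);
    }
    // One final univariate evaluation along dimension 1.
    return baryEval(line, w1, z1, y1);
}
```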

3.2 Tensorial Functions

We end this section with a brief remark on a direct simplification that can be employed in the special case that one has prior knowledge that the polynomial p is tensorial, i.e., of the form,

$$\begin{aligned} p(\varvec{\eta }) = \prod _{j=1}^d p_j(\eta _j), \end{aligned}$$
(26)

for some univariate polynomials \(\{p_j\}_{j=1}^d\) satisfying \(\deg p_j \le k_j\). In many high-order FEM simulations, basis functions in \(\varvec{\eta }\) space are often tensorial polynomials, so this situation does occur in practice. In this tensorial case, computing the barycentric weights is simpler since we need only compute the \(k_j+1\) univariate weights for dimension j associated with the grid \(\{z_{\ell ,j}\}_{\ell =0}^{k_j}\). Thus, we need only compute \(\sum _{j =1}^d (k_j+1)\) weights, as opposed to the full set of \(\prod _{j=1}^d (k_j+1)\) multivariate weights associated with (18).

Evaluating \(\varvec{\eta } \mapsto p(\varvec{\eta })\) is likewise faster in this case: once the univariate weights are computed, then each univariate barycentric evaluation \(\eta _j \mapsto p_j(\eta _j)\) requires \({\mathcal {O}}(k_j)\) complexity. Therefore, \(\varvec{\eta } \mapsto p(\varvec{\eta })\) requires only \({\mathcal {O}}(\sum _{j=1}^d k_j)\) complexity, as opposed to the full multivariate \({\mathcal {O}}(\prod _{j=1}^d k_j)\) complexity.
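
A sketch of this shortcut in the 2D case, again reusing the hypothetical baryEval helper; here p1 and p2 hold the values of the univariate factors at their respective grids:

```cpp
#include <vector>

// Tensorial shortcut (26) in 2D: if p factors as p_1(eta_1) * p_2(eta_2),
// two univariate evaluations of total cost O(k_1 + k_2) suffice.
double baryEvalTensorial(const std::vector<double>& p1, const std::vector<double>& p2,
                         const std::vector<double>& w1, const std::vector<double>& z1,
                         const std::vector<double>& w2, const std::vector<double>& z2,
                         double eta1, double eta2)
{
    return baryEval(p1, w1, z1, eta1) * baryEval(p2, w2, z2, eta2);
}
```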

3.3 Derivatives

Section 3.1 discusses how we accomplish the evaluation \(\varvec{\eta } \mapsto p(\varvec{\eta })\) algorithmically by iteratively evaluating along each dimension. This procedure is directly extensible to evaluation of (Cartesian) partial derivatives. For example, if \(p \in Q_{\varvec{k}}\), suppose for some multi-index \(\varvec{\lambda } \in \mathbb {N}_0^d\) we wish to evaluate the order-\(\varvec{\lambda }\) derivative,

$$\begin{aligned} p^{(\varvec{\lambda })}&= \frac{\partial ^{|\varvec{\lambda }|} p}{\partial \varvec{\eta }^{\varvec{\lambda }}},&\partial \varvec{\eta }^{\varvec{\lambda }}&= \partial \eta _1^{\lambda _1} \partial \eta _2^{\lambda _2} \cdots \partial \eta _d^{\lambda _d}. \end{aligned}$$
(27)

We are mostly concerned with \(|\varvec{\lambda }| \le 2\), i.e., at most “second” derivatives, but the procedure we describe applies to derivatives of arbitrary order. The dimension-by-dimension approach can be accomplished with essentially the same procedure as articulated in Sect. 3.1: Define

$$\begin{aligned} \begin{aligned} {\widetilde{p}}_d&:=p, \\ {\widetilde{p}}_q(\eta _1, \ldots , \eta _q)&:=\frac{\partial ^{\lambda _{q+1}} {\widetilde{p}}_{q+1}}{\partial \eta _{q+1}^{\lambda _{q+1}}}(\eta _1, \ldots , \eta _q, y_{q+1}), \end{aligned} \end{aligned}$$
(28)

so that constructing \({\widetilde{p}}_q\) from \({\widetilde{p}}_{q+1}\) at a single grid point requires evaluation of the order-\(\lambda _{q+1}\) derivative along dimension \(q+1\). Through the procedures outlined in Sect. 2.2, we can accomplish this in \({\mathcal {O}}(k_{q+1})\) complexity. We must evaluate the derivative at each of the \(\prod _{j=1}^{q} k_j\) grid points associated with dimensions \(1, \ldots , q\). Therefore, the outline and complexity of this procedure are precisely as given in (24), except that \(p(\varvec{y})\) should be replaced by \(p^{(\varvec{\lambda })}(\varvec{y})\).

4 Nontensorial Multidimensional Formulations Via Duffy Transformations

The goal of this section is to describe how barycentric interpolation in the tensorial case over d-dimensional orthotopes of Sect. 3 can be utilized for efficient evaluation of polynomial approximations in certain nontensorial cases. We focus on (potentially) non-tensorial polynomial approximations in a d-dimensional variable \(\varvec{\xi }\). The variable \(\varvec{\xi }\) will be related to the tensorial variable \(\varvec{\eta }\) through “collapsed coordinates” effected by a Duffy transformation, as described in Sect. 4.1. The transformation can be applied to a variety of common FEM element types; see Table 1. A description of how barycentric evaluation procedures can be used to evaluate polynomials on these potentially nontensorial geometries is given in Sect. 4.2; specifically, the procedure is given by (32). That section also gives precise conditions on the polynomial space to which p must belong so that the evaluation is exact (Theorem 4.1). Section 4.3 specializes the evaluation exactness conditions to common element types used in the spectral/hp community. Section 4.4 closes the section by discussing extension of the evaluation routines to derivative evaluations through use of the chain rule.

4.1 Collapsed Coordinates

We consider polynomials in d variables \(\varvec{\xi }\). The particular type of \(\varvec{\xi }\)-polynomials we consider are defined via a mapping from \(\varvec{\eta }\) space to \(\varvec{\xi }\) space, where \(\varvec{\eta } \in [-1,1]^d\) is the tensorial variable considered in Sect. 3. The essential building block that allows us to specify the \(\varvec{\eta } \leftrightarrow \varvec{\xi }\) relationship is the Duffy transformation, a variable transformation that acts on two of the d coordinates. For \(\varvec{\eta } \in [-1,1]^d\) and some fixed \(i, j \in \{1, \ldots , d\}\) with \(i \ne j\), define \(D_{i,j}\) as the Duffy transformation that “collapses” dimension i with respect to (or along) dimension j and is the identity map on all the other dimensions,

$$\begin{aligned} \varvec{\zeta } = (\zeta _1, \ldots , \zeta _d)&:=D_{i,j}(\varvec{\eta }),&\zeta _\ell&= { \left\{ \begin{array}{ll} \frac{1}{2} \left( 1 + \eta _\ell \right) \left( 1 - \eta _j \right) - 1, &{} \ell = i, \\ \eta _\ell , &{} \ell \ne i \end{array}\right. }, \end{aligned}$$
(29)

for \(\ell = 1, \ldots , d\).

Table 1 Multidimensional domains resulting from collapsed coordinates, multi-indices \(\varvec{a}\) and \(\varvec{b}\) identifying the associated multivariate Duffy map, and ancestor functions \(g_{\varvec{a},\varvec{b}}\) defined in (33)

Various domains in d dimensions can be created by composing Duffy maps. For composing \(c < d\) maps, we let \(\varvec{a} \in [d]^c\) have components that represent the dimensions that are collapsed by a Duffy transformation, and let \(\varvec{b} \in [d]^c\) have components specifying along which dimensions the collapse occurs. We place the following restrictions on the entries of \(\varvec{a}\) and \(\varvec{b}\):

  • \(a_\ell < b_\ell \) for \(\ell = 1, \ldots , c\)

  • \(a_\ell < a_{\ell +1}\) for \(\ell = 1, \ldots , c-1\).

We now define the variable \(\varvec{\xi }\) as the image of \(\varvec{\eta }\) under a composition of Duffy transformations defined by \(\varvec{a}\) and \(\varvec{b}\),

$$\begin{aligned} \varvec{\xi }&:=D_{\varvec{a}, \varvec{b}}(\varvec{\eta }),&D_{\varvec{a}, \varvec{b}}&:=D_{a_1, b_1} \circ D_{a_{2}, b_{2}} \circ \cdots \circ D_{a_c,b_c},&E(\varvec{a}, \varvec{b})&:=D_{\varvec{a}, \varvec{b}}\left( [-1,1]^d \right) . \end{aligned}$$
(30)

We are interested primarily in these domains for dimensions \(d = 2, 3\). Table 1 illustrates various standard geometric domains that are the result of particular choices of coordinate collapses.

Fig. 1 Duffy transformations between triangle and quadrilateral reference elements

A visual example with \(d=2\) is shown in Fig. 1, which gives the Duffy transformations between the reference triangle and the reference quadrilateral.
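
As a concrete sketch (ours), the single collapse (29) with \((i,j) = (1,2)\) and its inverse, which the evaluation procedure of Sect. 4.2 requires, can be written as follows; we follow (29) as printed, noting that reference-element conventions (offsets and orientation) vary between codes:

```cpp
#include <array>

// The Duffy collapse (29) with (i, j) = (1, 2), taking the reference square
// to the reference triangle, and its inverse; the inverse is valid away from
// the collapsed vertex eta_2 = xi_2 = 1.
std::array<double, 2> duffyCollapse(double eta1, double eta2)
{
    return { 0.5 * (1.0 + eta1) * (1.0 - eta2) - 1.0, eta2 };
}

std::array<double, 2> duffyExpand(double xi1, double xi2)
{
    return { 2.0 * (1.0 + xi1) / (1.0 - xi2) - 1.0, xi2 };
}
```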

4.2 Evaluation of Polynomials

The previous section illustrates how various standard domains that are used to tesselate space in finite element simulations are constructed. This section considers how we can employ barycentric evaluation in \(\varvec{\eta }\) space to accomplish evaluation of polynomials in \(\varvec{\xi }\) space. In what follows we assume that the dimension d, the grid size multi-index \(\varvec{k}\), and the Duffy transformation parameters c, \(\varvec{a}\), and \(\varvec{b}\) are all given and fixed.

Recall that on the tensorial domain \(\varvec{\eta } \in [-1,1]^d\) we have a tensorial grid \(Z_{\varvec{k}}\) comprising \(k_q+1\) points in dimension q, resulting in a total of \(\prod _{q=1}^d (k_q+1)\) points in \(Z_{\varvec{k}}\). The image of this grid in \(\varvec{\xi }\) space is the result of applying the Duffy transformation:

$$\begin{aligned} \varvec{y}_{\varvec{j}}&:=D_{\varvec{a},\varvec{b}}\left( \varvec{z}_{\varvec{j}} \right) ,&\varvec{j}&\le \varvec{k}. \end{aligned}$$
(31a)

Assume that data values are furnished from a given function p,

$$\begin{aligned} P_{\varvec{k}}&:=\left( p_{\varvec{j}} \right) _{\varvec{j} \le \varvec{k}},&p_{\varvec{j}}&:=p\left( \varvec{y}_{\varvec{j}}\right) , \end{aligned}$$
(31b)

and are provided on the grid \(\varvec{y}_{\varvec{j}}\). Naturally, these values can be considered as data values in \(\varvec{\eta }\) space under the (inverse) Duffy transformation,

$$\begin{aligned} \left\{ \left( \varvec{z}_{\varvec{j}}, p_{\varvec{j}} \right) \right\} _{\varvec{j} \le \varvec{k}}, \end{aligned}$$

and therefore the barycentric routines developed in tensorial form in Sect. 3 can be applied. The main result of this section is the provision of conditions on p in \(\varvec{\xi }\) space under which applying the tensorial barycentric interpolation procedure in \(\varvec{\eta }\) space results in an exact evaluation. More precisely, we consider the following algorithmic set of steps given a point \(\varvec{\xi }\) in \(E(\varvec{a}, \varvec{b})\), and a function p:

$$\begin{aligned} \left( \varvec{\xi }, p \right) \xrightarrow {(31)} \left( \varvec{\xi }, Z_{\varvec{k}}, P_{\varvec{k}} \right) \xrightarrow {\varvec{\eta } = D_{\varvec{a},\varvec{b}}^{-1}(\varvec{\xi })} \left( \varvec{\eta }, Z_{\varvec{k}}, P_{\varvec{k}} \right) \xrightarrow {\text {Section 3}} p(\varvec{\xi }). \end{aligned}$$
(32)

Thus, our main result below gives conditions on p so that the output on the right of (32) equals the correct evaluation \(p(\varvec{\xi })\). To proceed, we need a more involved deconstruction of the element identifier multi-indices \(\varvec{a}\) and \(\varvec{b}\). The particular rules in Sect. 4.1 that define the possible values of \(\varvec{a}\) and \(\varvec{b}\) ensure that a collection of tree structures can be constructed from \(\varvec{a}\) and \(\varvec{b}\). Let each dimension \(1, 2, \ldots , d\) correspond to a node. The directed edges correspond to drawing an arrow from node \(a_j\) to node \(b_j\) for each \(j = 1, \ldots , c\). Since the indices \(\{a_j\}_{j=1}^c\) are all distinct, at most one arrow emanates from each node, and therefore this construction forms a collection of trees. We say that i is an ancestor of q if there is a directed path from node i to node q, i.e., if dimension i is (possibly transitively) collapsed along dimension q. With this structure, we now define an ‘ancestor function’ on the set of dimensions, which identifies which indices are ancestors of any dimension,

$$\begin{aligned} g_{\varvec{a},\varvec{b}}&: [d] \rightarrow 2^{[d]},&g_{\varvec{a},\varvec{b}}(q)&:=\{q\} \bigcup \left\{ i \in [d] \;\big |\; i \text { is an ancestor of } q \right\} , \end{aligned}$$
(33)

where \(2^{[d]}\) denotes the power set (set of subsets) of [d]. Note that we have also included the index q in \(g_{\varvec{a},\varvec{b}}(q)\), so that \(g_{\varvec{a},\varvec{b}}(q)\) is always non-empty. The identification of \(\varvec{a}\) and \(\varvec{b}\) for typical geometries in \(d = 2, 3\) is given in Table 1. Finally, through the identification of the relation \(g_{\varvec{a},\varvec{b}}\), we can articulate which functions in \(\varvec{\xi }\) space are exactly evaluated via the barycentric form in \(\varvec{\eta }\) space.

Theorem 4.1

With d, \(\varvec{k}\), c, \(\varvec{a}\), and \(\varvec{b}\) all given and fixed, define the following multi-index set:

$$\begin{aligned} A :=\left\{ \varvec{\alpha } \in \mathbb {N}_0^d \;\big |\; \sum _{j \in g_{\varvec{a},\varvec{b}}(q)} \alpha _j \le k_q \text { for every } q \in [d] \right\} , \end{aligned}$$
(34)

which defines a polynomial space,

$$\begin{aligned} P = P(A) :=\mathrm {span}\left\{ \varvec{\xi }^{\varvec{\alpha }} \;\big |\; \varvec{\alpha } \in A \right\} \subset Q_{\varvec{k}}. \end{aligned}$$
(35)

Then, for every \(p \in P\) (a polynomial in \(\varvec{\xi }\) space), the procedure in (32) that utilizes the barycentric evaluation algorithm of Sect. 3 exactly evaluates \(p(\varvec{\xi })\).

Proof

Let \(p \in P\), and let \(\varvec{\alpha } \in \mathbb {N}_0^d\) denote the degree of p, i.e., the polynomial degree of p in dimension q is \(\alpha _q\). Since \(p(\varvec{\xi }) = p(D_{\varvec{a},\varvec{b}}(\varvec{\eta }))\), the result is proven if we can show that \({\hat{p}} :=p \circ D_{\varvec{a},\varvec{b}} \in Q_{\varvec{k}}\), since the barycentric form in \(\varvec{\eta }\) space is exact on this space of polynomials. Note that the Duffy transformation \(D_{\varvec{a}, \varvec{b}}\) defined through (29) and (30) is a (multivariate) polynomial, so that \({\hat{p}} = p \circ D_{\varvec{a},\varvec{b}}\) is a polynomial, and we need only show that its maximum degree in dimension q is less than or equal to \(k_q\) for every \(q = 1, \ldots , d\).

Fixing \(q \in [d]\), the degree of \({\hat{p}}\) in dimension q is discernible from the ancestor function \(g_{\varvec{a},\varvec{b}}\). Let \(\deg _q(f)\) denote the dimension-q degree of a polynomial f. Then the Duffy transformation definition (29) implies that for any two distinct dimensions i, j,

$$\begin{aligned} \deg _j \left( f \circ D_{i,j} \right)&= \deg _i(f) + \deg _j(f),&\deg _q \left( f \circ D_{i,j} \right)&= \deg _q(f),&q&\in [d] \backslash \{j\}. \end{aligned}$$
(36)

Since \(D_{\varvec{a},\varvec{b}}\) is a composition of univariate Duffy maps, the degree of \({\hat{p}}\) along any dimension q can be determined by tracing the history of which dimensions collapse onto q, i.e., is determined by \(g_{\varvec{a},\varvec{b}}(q)\). Thus,

$$\begin{aligned} \deg _q \left( {\hat{p}} \right) = \deg _q(p) + \sum _{i \text { is an ancestor of } q} \deg _i(p) = \sum _{i \in g_{\varvec{a},\varvec{b}}(q)} \deg _i(p) = \sum _{i \in g_{\varvec{a},\varvec{b}}(q)} \alpha _i. \end{aligned}$$
(37)

By assumption on the index set A to which \(\varvec{\alpha }\) belongs, this last term is bounded by \(k_q\). \(\square \)

Given a tensorial grid \(Z_{\varvec{k}}\) in \(\varvec{\eta }\) space, Theorem 4.1 precisely describes what type of \(\varvec{\xi }\)-polynomial space membership p should have so that the procedure (32) exactly evaluates p.
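
Combining the pieces, a hedged sketch of the full procedure (32) for a triangle, reusing the hypothetical duffyExpand and baryEval2D helpers from the earlier sketches:

```cpp
#include <array>
#include <vector>

// Procedure (32) for a triangle: map xi back to eta via the inverse Duffy
// transformation, then apply the tensorial barycentric evaluation of Sect. 3.
// Exact for p in P(A) per Theorem 4.1; data values are given at the mapped
// grid points y_j = D(z_j).
double evalTriangle(const std::vector<double>& values,
                    const std::vector<double>& w1, const std::vector<double>& z1,
                    const std::vector<double>& w2, const std::vector<double>& z2,
                    double xi1, double xi2)
{
    std::array<double, 2> eta = duffyExpand(xi1, xi2);   // xi -> eta
    return baryEval2D(values, w1, z1, w2, z2, eta[0], eta[1]);
}
```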

4.3 Specializations

This section describes certain specializations of the apparatus in the previous section. Our specializations will be the two- and three-dimensional domains shown in Table 1. The goal is to show how the exactness condition of Theorem 4.1 manifests on these domains, in particular, to articulate the polynomial space P defined in (35) on which the barycentric evaluation procedure (32) is exact. We will describe P for a given degree index \(\varvec{k}\), and will also present a special “isotropic” case when the number of points is the same in every dimension, i.e., when \(\varvec{k} = (k, k, \ldots , k)\) for some non-negative scalar integer k.

4.3.1 Quadrilaterals

We consider \(d = 2\), with \(c = 0\) Duffy maps. In this case, we have \(\varvec{\eta } = \varvec{\xi }\), and both variables take values on \([-1,1]^2\). Then, given degree \(\varvec{k} = (k_1, k_2)\), the set A in (34) corresponds to all multi-indices \(\varvec{j}\) satisfying \(\varvec{j} \le \varvec{k}\). Therefore, the polynomial space P in (35) is equal to \(Q_{\varvec{k}}\),

$$\begin{aligned} \begin{aligned} P = \mathrm {span} \left\{ \xi _1^{j_1} \xi _2^{j_2} \;\big |\; \varvec{j} \le \varvec{k} \right\}&= Q_{\varvec{k}}, \\ P = \mathrm {span} \left\{ \xi _1^{j_1} \xi _2^{j_2} \;\big |\; \varvec{j} \le \varvec{k} \right\}&= Q_{(k,k)}, (k = k_1 = k_2). \end{aligned} \end{aligned}$$
(38)

4.3.2 Triangles

As with quadrilateral elements we have \(d = 2\), but we now take \(c = 1\), and a Duffy transformation collapse defined by \(\varvec{a} = 1\), \(\varvec{b} = 2\). Then, the \(\varvec{\eta } \leftrightarrow \varvec{\xi }\) map is given by

$$\begin{aligned} \xi _1&= \frac{1}{2} \left( 1 + \eta _1\right) \left( 1 - \eta _2\right) ,&\xi _2&= \eta _2. \end{aligned}$$
(39)

The constraints in the definition of A are given by \(\alpha _1 \le k_1\) and \(\alpha _1 + \alpha _2 \le k_2\), so that the space P is

$$\begin{aligned} \begin{aligned} P&= \mathrm {span} \left\{ \xi _1^{j_1} \xi _2^{j_2} \;\big |\; j_1 \le k_1, \;\; j_1 + j_2 \le k_2 \right\} , \\ P&= \mathrm {span} \left\{ \xi _1^{j_1} \xi _2^{j_2} \;\big |\; j_1 + j_2 \le k \right\} = P_{k}, (k = k_1 = k_2), \end{aligned} \end{aligned}$$
(40)

where we have used \(P_k\) to denote the set of bivariate polynomials of total degree at most k.

4.3.3 Hexahedrons

We now move to three dimensions, so \(d = 3\), and take \(c = 0\) Duffy maps, again implying that \(\varvec{\xi } = \varvec{\eta }\). Therefore, the space P on which the barycentric procedure is exact is \(Q_{\varvec{k}}\):

$$\begin{aligned} \begin{aligned} P = \mathrm {span} \left\{ \xi _1^{j_1} \xi _2^{j_2} \xi _3^{j_3} \;\big |\; \varvec{j} \le \varvec{k} \right\}&= Q_{\varvec{k}}, \\ P = \mathrm {span} \left\{ \xi _1^{j_1} \xi _2^{j_2} \xi _3^{j_3} \;\big |\; j_q \le k \;\; \forall \;\; q \in [3] \right\}&= Q_{(k,k,k)}, k = k_1 = k_2 = k_3. \end{aligned} \end{aligned}$$
(41)

4.3.4 Prisms

As with hexahedral elements we have \(d = 3\), but we now take \(c = 1\), and a Duffy transformation collapse defined by \(\varvec{a} = 1\), \(\varvec{b} = 2\). Then, the \(\varvec{\eta } \leftrightarrow \varvec{\xi }\) map is given by

$$\begin{aligned} \xi _1&= \frac{1}{2} \left( 1 + \eta _1\right) \left( 1 - \eta _2\right) ,&\xi _2&= \eta _2,&\xi _3&= \eta _3. \end{aligned}$$
(42)

The constraints in the definition of A are given by \(\alpha _1 \le k_1\), \(\alpha _1 + \alpha _2 \le k_2\), and \(\alpha _3 \le k_3\), so that the space P is

$$\begin{aligned} \begin{aligned} P&= \mathrm {span} \left\{ \varvec{\xi }^{\varvec{j}} \;\big |\; j_1 \le k_1, \;\; j_1 + j_2 \le k_2, \;\; j_3 \le k_3 \right\} , \\ P&= \mathrm {span} \left\{ \varvec{\xi }^{\varvec{j}} \;\big |\; j_1 + j_2 \le k, \;\; j_3 \le k \right\} , (k = k_1 = k_2 = k_3). \end{aligned} \end{aligned}$$
(43)

4.3.5 Tetrahedrons

Again with \(d = 3\), but now with \(c=2\) and Duffy transformation collapses defined by \(\varvec{a} = (1, 2)\) and \(\varvec{b} = (2, 3)\), the \(\varvec{\eta } \leftrightarrow \varvec{\xi }\) map is given by

$$\begin{aligned} \xi _1&= \frac{1}{2} \left( 1 + \eta _1\right) \left( 1 - \frac{1}{2} \left( 1 + \eta _2\right) \left( 1 - \eta _3\right) \right) ,&\xi _2&= \frac{1}{2} \left( 1 + \eta _2\right) \left( 1 - \eta _3 \right) ,&\xi _3&= \eta _3. \end{aligned}$$
(44)

The constraints in the definition of A are given by \(\alpha _1 \le k_1\), \(\alpha _1 + \alpha _2 \le k_2\), and \(\alpha _1 + \alpha _2 + \alpha _3 \le k_3\), so that the space P is

$$\begin{aligned} \begin{aligned} P&= \mathrm {span} \left\{ \varvec{\xi }^{\varvec{j}} \;\big |\; j_1 \le k_1, \;\; j_1 + j_2 \le k_2, \;\; j_1 + j_2 + j_3 \le k_3 \right\} , \\ P&= \mathrm {span} \left\{ \varvec{\xi }^{\varvec{j}} \;\big |\; j_1 + j_2 + j_3 \le k \right\} = P_k, \quad (k = k_1 = k_2 = k_3), \end{aligned} \end{aligned}$$
(45)

where we have used \(P_k\) to denote the set of trivariate polynomials of total degree at most k.

4.3.6 Pyramids

Finally, we again take \(d = 3\) and \(c=2\), with Duffy transformation collapses defined by \(\varvec{a} = (1, 2)\) and \(\varvec{b} = (3, 3)\), i.e., dimensions 1 and 2 are both collapsed along dimension 3. Then, the \(\varvec{\eta } \leftrightarrow \varvec{\xi }\) map is given by

$$\begin{aligned} \xi _1&= \frac{1}{2} \left( 1 + \eta _1\right) \left( 1 - \eta _3\right) ,&\xi _2&= \frac{1}{2} \left( 1 + \eta _2\right) \left( 1 - \eta _3 \right) ,&\xi _3&= \eta _3. \end{aligned}$$
(46)

The constraints in the definition of A are given by \(\alpha _1 \le k_1\), \(\alpha _2 \le k_2\), and \(\alpha _1 + \alpha _2 + \alpha _3 \le k_3\), so that the space P is

$$\begin{aligned} \begin{aligned} P&= \mathrm {span} \left\{ \varvec{\xi }^{\varvec{j}} \;\big |\; j_1 \le k_1, \;\; j_2 \le k_2, \;\; j_1 + j_2 + j_3 \le k_3 \right\} , \\ P&= \mathrm {span} \left\{ \varvec{\xi }^{\varvec{j}} \;\big |\; j_1 + j_2 + j_3 \le k \right\} = P_k, \quad (k = k_1 = k_2 = k_3). \end{aligned} \end{aligned}$$
(47)

4.4 Derivatives and Gradients

The results of Sect. 4.2 lead naturally to derivative evaluations. In particular, by defining the function \({\hat{p}} :=p \circ D_{\varvec{a},\varvec{b}}\) in \(\varvec{\eta }\) space, and writing \(p(\varvec{\xi }) = {\hat{p}}(\varvec{\eta }(\varvec{\xi }))\), we can translate derivatives of p to those of \({\hat{p}}\), which can be efficiently evaluated using the results from previous sections.

Using the chain rule, the gradient of p can be written in terms of the gradient of \({\hat{p}}\),

$$\begin{aligned} \nabla _{\varvec{\xi }} p\left( \varvec{\xi }\right) = \left( \frac{\mathrm {D} \varvec{\eta }}{\mathrm {D} \varvec{\xi }} \right) ^T \nabla _{\varvec{\eta }} {\hat{p}}\left( \varvec{\eta }\right) , \end{aligned}$$
(48)

where \(\nabla _{\varvec{\eta }}\) is the standard d-variate gradient operator with respect to the Euclidean variables \(\varvec{\eta }\), and we have defined the \(d \times d\) Jacobian matrix of the \(\varvec{\xi } \mapsto \varvec{\eta }\) map,

$$\begin{aligned} \left( \frac{\mathrm {D} \varvec{\eta }}{\mathrm {D} \varvec{\xi }} \right) _{i,j} = \left( \frac{\mathrm {D} D^{-1}_{\varvec{a},\varvec{b}}(\varvec{\xi })}{\mathrm {D} \varvec{\xi }} \right) _{i,j} = \frac{\partial \eta _i}{\partial \xi _j}. \end{aligned}$$
(49)

Note that the individual Duffy maps \(D_{a,b}\) defined in (29) that collapse dimension a onto dimension b are invertible whenever \(\eta _b \ne 1\), so that the Jacobian above is well defined away from these points. The formula above shows that since the gradient of \({\hat{p}}\) can be evaluated efficiently through the procedures in Sect. 3.3, so, too, can the gradient of p. In particular, this procedure exactly evaluates gradients (away from singularities of the Duffy transformation) if \(p \in P(A)\), where P(A) is given in (35).

Similarly, components of the Hessian of p can be evaluated as,

$$\begin{aligned} \frac{\partial ^{2} p}{\partial \xi _i \partial \xi _j} = \left( \frac{\partial ^{2} \varvec{\eta }}{\partial \xi _i \partial \xi _j}\right) ^T \nabla _{\varvec{\eta }} {\hat{p}} + \left( \frac{\partial \varvec{\eta }}{\partial \xi _i} \right) ^T \varvec{H}_{\varvec{\eta }}({\hat{p}}) \left( \frac{\partial \varvec{\eta }}{\partial \xi _j} \right) , \end{aligned}$$
(50)

where \(\frac{\partial ^{2} \varvec{\eta }}{\partial \xi _i \partial \xi _j} \in \mathbb {R}^d\) and \(\frac{\partial \varvec{\eta }}{\partial \xi _i} \in \mathbb {R}^d\) are componentwise derivatives, and \(\varvec{H}_{\varvec{\eta }}({\hat{p}})\) is the \(d \times d\) Hessian of \({\hat{p}}\). Again, since the Hessian of \({\widehat{p}}\) can be efficiently evaluated through the procedures in Sect. 3.3, the Hessian of p also inherits this asymptotic efficiency. This procedure again exactly evaluates Hessians (away from singularities of the Duffy transformation) if \(p \in P(A)\). If \(p \in P(A)\), higher-order derivatives of p may likewise be computed exactly from those of \({\hat{p}}\) using Faà di Bruno’s formula, with \({\mathcal {O}}\left( \prod _{j=1}^d k_j\right) \) complexity stemming from the multivariate barycentric procedures described earlier.

5 Algorithmic and Implementation Details

In this section, we present the implementation details of the barycentric Lagrange interpolation in terms of the data structures and algorithms involved. The implementation follows the high-level algorithms described in Sects. 3 and 4, but some details differ in the service of optimizing the computational routines. The implementation of these concepts can be accessed in the open-source spectral/hp element library Nektar++ [2, 15].

5.1 Algorithm

The foundation of the implementation is the kernel that performs the barycentric interpolation itself as given in Eq. (1); that is, it takes the coordinate of a single arbitrary point and the stored physical polynomial values at each quadrature point in the expansion, and returns the interpolated value at the arbitrary point. This kernel has been templated to perform the interpolation only in a specific direction based on the integer template parameter DIR, and also to return the derivative and second-derivative values by reference based on the boolean template parameters DERIV and DERIV2. Templating here is meant in the sense of C++ templates; i.e., these expressions are evaluated at compile time to reduce branching overheads and enable compiler inlining. For example, when DERIV and DERIV2 are not required, setting these template variables to false allows for performance gains by removing the if branch tests from the generated object code. The reasoning behind unifying the physical value evaluation and the derivative interpolations is that we can reuse terms computed in the physical evaluation within the derivative interpolations, saving repeated calculations (cf. (10) and (11)). An example kernel for physical, first- and second-derivative values is shown in Algorithm 1.
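
Since Algorithm 1 is rendered as a figure, the following condensed C++ sketch conveys its structure; the names and signature are ours and simplified relative to the Nektar++ source:

```cpp
// Condensed sketch of the kernel: DIR selects the stored quadrature
// direction in the real code (unused in this standalone version), while
// DERIV/DERIV2 enable the derivative outputs; branches on these template
// constants are removed by the compiler. Assumes coord is not a node.
template <int DIR, bool DERIV, bool DERIV2>
double BaryEvaluate(double coord, const double* phys, const double* z,
                    const double* w, int n, double& deriv, double& deriv2)
{
    double s1p = 0.0, s1 = 0.0;  // S_1(p), S_1(1), cf. (5)
    double s2p = 0.0, s2 = 0.0;  // S_2(p), S_2(1), cf. (10)
    double s3p = 0.0, s3 = 0.0;  // S_3(p), S_3(1), cf. (11)
    for (int j = 0; j < n; ++j)
    {
        double t1 = w[j] / (coord - z[j]);
        s1p += t1 * phys[j];
        s1  += t1;
        if (DERIV || DERIV2)
        {
            double t2 = t1 / (coord - z[j]);   // reuse t1, cf. Sect. 5.2
            s2p += t2 * phys[j];
            s2  += t2;
            if (DERIV2)
            {
                double t3 = t2 / (coord - z[j]);
                s3p += t3 * phys[j];
                s3  += t3;
            }
        }
    }
    double val = s1p / s1;                                      // (5)
    if (DERIV || DERIV2)
        deriv = (val * s2 - s2p) / s1;                          // (10)
    if (DERIV2)
        deriv2 = 2.0 * (deriv * s2 - (val * s3 - s3p)) / s1;    // (11)
    return val;
}
```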

The next important method is the tensor-product function, which constructs the tensor line/square by calling the barycentric interpolation kernel on quadrature points, and is therefore dimension dependent and operates on the reference element in the appropriate form. The 1D version performs the barycentric interpolation directly on the provided point and returns the physical, first- and second-derivative values in the \(\xi _{1}\) direction. In 2D and 3D, we chose to implement only the first derivative to reduce the overall complexity of the tensor-product method. The 2D version constructs an interpolation in the \(\xi _{1}\) direction to give an intermediate set of physical values and derivative values at the expansion quadrature points in the same \(\xi _{1}\) direction. The quadrature derivative values in the \(\xi _{1}\) direction can then be evaluated in the \(\xi _{2}\) direction to produce the single derivative value in the \(\xi _{1}\) direction at the point provided. Likewise, the quadrature physical values in the \(\xi _{1}\) direction can then be evaluated in the \(\xi _{2}\) direction with the derivative output enabled to return both the single derivative value in the \(\xi _{2}\) direction and the physical value at the provided point. The tensor product in 3D is similar, except we now also consider the \(\xi _{3}\) direction, and so our intermediary steps consist of constructing the tensor square. Structuring it in this manner allows for a minimum number of calls to the barycentric interpolation kernel. An example tensor-product function in 2D is shown in Algorithm 2.
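
A simplified sketch in the spirit of Algorithm 2, composing the kernel above dimension-by-dimension (again with hypothetical names):

```cpp
#include <vector>

// 2D tensor-product sketch: interpolate along xi_1 on each quadrature line,
// then evaluate the intermediate values (and the xi_1 derivatives) along
// xi_2, exactly as described in the text.
void TensorEval2D(double eta1, double eta2, const double* phys,
                  const double* z1, const double* w1, int n1,
                  const double* z2, const double* w2, int n2,
                  double& val, double& d1, double& d2)
{
    std::vector<double> line(n2), lineD1(n2);
    double dTmp = 0.0, dUnused = 0.0;
    for (int q = 0; q < n2; ++q)   // intermediate values at (eta1, z2[q])
    {
        line[q] = BaryEvaluate<0, true, false>(eta1, phys + q * n1, z1, w1, n1,
                                               dTmp, dUnused);
        lineD1[q] = dTmp;          // d/dxi_1 at (eta1, z2[q])
    }
    // Physical value and xi_2 derivative from one derivative-enabled call ...
    val = BaryEvaluate<1, true, false>(eta2, line.data(), z2, w2, n2, d2, dUnused);
    // ... and the xi_1 derivative by interpolating the derivative line.
    d1 = BaryEvaluate<1, false, false>(eta2, lineD1.data(), z2, w2, n2, dTmp,
                                       dUnused);
}
```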

As this tensor-product function operates on the reference element, which in 2D is a quadrilateral and in 3D a hexahedron, it can additionally be extended to the other shape types by collapsing coordinates and performing the correct quadrature point mapping as described in Sect. 4.1. This is achieved by overriding the existing reference element interpolation function, which evaluates the expansion at a single (arbitrary) point of the domain, to also give it the capability to evaluate the derivative in each direction as needed. This function is a wrapper around a virtual function that is defined for each shape type and therefore allows for the shape-dependent coordinate collapsing. Example structures of these functions for a triangular shape type are shown in Algorithms 3 and 4.

Algorithm 1 Barycentric interpolation kernel returning physical, first- and second-derivative values
Algorithm 2 Tensor-product function in 2D
Algorithms 3 and 4 Evaluation functions for a triangular shape type

5.2 Complexity Analysis

Given a function \(p(\eta _{\texttt {DIR}})\) evaluated at \(k+1\) quadrature points \(Q = \{z_0, z_1, \cdots , z_{k}\}_{\texttt {DIR}}\), we perform the following steps to find the interpolated values of \(p(\eta )\) and \(\frac{\partial p }{\partial \eta _{\texttt {DIR}}}\) at a given point \(\eta \notin Q\):

  (a) Calculate and store the weights \(\{w_j\}_{j=0}^{k}\) as per (2), which requires storage of size \(k+1\). The number of flops for this operation is \((k+1)^2 +1\), and the complexity is \(O(k^2)\), which is a one-time setup cost.

  (b) Calculate \(p(\eta )\) using step 32 of Algorithm 1, for which we use the precomputed weights w from the previous step and calculate the terms A and F. The storage requirement for each of these terms is of size \(k+1\), and the number of flops required to calculate them is \(3(k+1)\) and \(2(k+1)\), respectively. (If we reuse the term \(t_1\) from step 18, calculating F needs only \(k+1\) flops.) Thus, the total complexity of finding \(p(\eta )\) using the barycentric method is O(k), which is consistent with the evaluation in [1].

  (c) Calculate \(\frac{\partial }{\partial \eta _{\texttt {DIR}}}p(\eta )\) as shown in step 36 of Algorithm 1, which uses the precomputed weights w and the terms A and F from the previous step. Additional terms B and C are evaluated as per steps 23 and 24 of Algorithm 1. The storage requirement for these terms is of size \(k+1\) each. The calculation of term B requires \(4(k+1)\) flops (or \(3(k+1)\) using the precomputed \(t_1\)). Similarly, calculating term C requires \(2(k+1)\) flops (or \(k+1\) if we reuse the precomputed \(t_2\)). Therefore, the total complexity of evaluating \(\frac{\partial }{\partial \eta _{\texttt {DIR}}}p(\eta )\) is O(k).

  (d) Applying a similar analysis to \(\frac{\mathrm {d}^2}{\mathrm {d}\eta ^2_{\texttt {DIR}}}p(\eta )\), we need to evaluate the additional terms D and E as shown in steps 27 and 28 of Algorithm 1. We need additional storage of size \(k+1\) for each of these terms. The computational complexity of evaluating both D and E is O(k).

Note that the analysis presented above applies to a single direction DIR. For higher dimensions, we follow the same procedure in each individual direction, in the dimension-by-dimension fashion of Sect. 3.1. For example, in 2D we require evaluations of p along \(\texttt {DIR}=1\) for each quadrature line, followed by a single evaluation along \(\texttt {DIR} = 2\), so that the total work follows (25) while each individual kernel invocation remains O(k). The same structure holds for the first-derivative evaluations \(\partial p/\partial \eta _1\) and \(\partial p/\partial \eta _2\), and, by extension, for the second derivative: each individual kernel call retains O(k) complexity.

6 Evaluation and Comparison

6.1 Baseline Evaluations

To investigate how this implementation of the barycentric interpolation affects the efficiency of evaluation for physical and derivative values, a number of tests were run across various cases. The barycentric interpolation method has been implemented within the Nektar++ spectral/hp element framework [2, 15] in a discontinuous Galerkin (DG) setting and is compared with two variants of the already existing standard Lagrange interpolation method: the first, in which the interpolation matrix is recalculated every iteration; and the second, in which the interpolation matrix is stored across iterations, mimicking a scenario involving history points. All test cases below were carried out on a single core of a dual-socket Intel Xeon Gold 5120 system equipped with 256 GB of RAM, with the solver pinned to a specific core in order to reduce the influence of core and socket reassignment by the kernel mid-process.

6.1.1 Construction of the Baseline Tests

These baseline tests are constructed for the desired elemental shape using the hierarchical modified basis of Karniadakis & Sherwin [10] of order P with tensor products of \(P+2\) points in each direction. We make use of Gauss-Lobatto-Legendre points in noncollapsed directions and Gauss-Radau points in collapsed directions to avoid evaluation at singularities. The physical values at these points are provided by the polynomial \(p(\varvec{\xi }) = \xi _{1}^{2} + \xi _{2}^{2} - \xi _{3}^{2}\), which also allows for an analytical solution at any \(\varvec{\xi }\) for the physical and derivative values in each direction. The physical and derivative values are sampled on the constructed shape on a collocation grid that is again constructed from GLL/GLR points, like the quadrature rule. However, we use a fixed collocation grid size while varying the order P, ensuring that the collocation grid is distinct from the quadrature rule in most cases. To ensure the same number of points is sampled for all shape dimensions, we use 64 total points, exploiting the symmetry so that in 1D this is \(64^1\), in 2D \(8^2\), and in 3D \(4^3\). This creates some special considerations when the collocation grid matches exactly with the quadrature points used within the shape, which we discuss in the relevant sections below. We average the timings, in 1D from \(10^6\) evaluations and in 2D/3D from \(10^5\) evaluations, to ensure results are not affected by system noise or other external factors. The tests are performed for a range of basis orders from 2 to 20. In 1D, we calculate the physical, first- and second-derivative values, whereas in 2D and 3D, we calculate only the physical and first-derivative values.

6.1.2 1D Barycentric Interpolation and Derivatives

Figure 2 shows that recalculating the interpolation matrix every cycle is notably the slowest of the three methods, whereas the barycentric interpolation and the stored interpolation matrix method are closer in performance, with the barycentric interpolation being on average across the orders \(33\%\) slower for the solution evaluation only, \(20\%\) slower when including first derivatives, and \(18\%\) slower when including second derivatives. However, the barycentric interpolation wins in terms of storage complexity: it requires \({\mathcal {O}}(k)\) storage for the weights \(w_j\), where k and \(z_j\) are given and fixed, whereas the stored interpolation matrix has a best-case space complexity of \({\mathcal {O}}(k^2)\). An interesting feature is the minor reduction in interpolation time for the stored matrix method at orders 3, 8, 13, and 18. These basis orders correspond to the number of quadrature points being a multiple of five, which we theorize interacts with the cache line size of the CPU being used: a match between the cache line size and the quadrature point array sizes in our implementation results in memory optimizations for the interpolation matrix multiplications. This phenomenon will also be present for the recalculated matrix method; however, the effect in that case is not visible on the graph due to the larger time scale. It can be seen that the baseline computational cost increases moving from interpolating the physical values only (Fig. 2a) to also including the first derivatives (Fig. 2b), and then the second derivatives (Fig. 2c) for all methods.

Fig. 2 Baseline interpolation timings for a segment. a Only physical values, b Physical and first-derivative values, c Physical, first- and second-derivative values

6.1.3 Extension to Traditional Tensor-Product Expansions

For the traditional tensor-product expansions, we now consider a quadrilateral element in 2D and a hexahedron in 3D. Figure 3 shows the results for the quadrilateral element. Generally, results are similar to those for the segment. An obvious unique feature is the spike in the recalculated matrix timing at order 6. The spike corresponds to 8 quadrature points in each direction, which is the same as the number we are sampling on, and therefore the points are collocated. Consequently, the routine in which the interpolation matrix is constructed has to handle this collocation, which results in the increased cost. We can also see the barycentric interpolation method handling this collocation, resulting in a speed-up compared to neighboring orders, evident in Fig. 3a. On average, the barycentric interpolation method is \(30\%\) slower across the orders for the solution evaluation only when compared to the stored interpolation matrix method. Figure 3b, which includes the first-derivative evaluations, shows that the barycentric interpolation method is on average \(15\%\) faster across the orders than the stored matrix variant of the Lagrangian method, indicating the computational cost savings from unifying the derivative call.

Fig. 3 Baseline interpolation timings for a quadrilateral. a Only physical values, b Physical and first-derivative values

The same collocation trend is present in the hexahedral element, shown in Fig. 4, with the spike now present at order 2, which corresponds to 4 quadrature points in each direction. The trends are again similar to the 1D and 2D results. On average across the orders, for the solution evaluation only, the barycentric interpolation method is \(48\%\) slower than the stored matrix interpolation method. The most notable difference compared to the 2D results is in the first-derivative timings (Fig. 4b), which, disregarding order 2, show the barycentric interpolation method to be on average \(10\%\) slower at orders \(\le 11\) and on average \(9\%\) faster at orders \(>11\).

Fig. 4 Baseline interpolation timings for a hexahedron. a Only physical values, b Physical and first-derivative values

6.1.4 Extension to General Expansions

We now compare the most complicated of the available shape types in two and three dimensions, the triangle and the tetrahedron, which require the use of Duffy transformations. The results shown in Figs. 5 and 6 align closely with their tensor-product expansion counterparts, the quadrilateral and hexahedron, respectively. A unique feature can now be seen in the recalculated matrix interpolation method, which appears to show an odd/even cyclical trend. We believe this is again due to collocated points, this time as a consequence of the collapsing of the element and of the routine used to calculate the interpolation matrix making use of a floor function.

Fig. 5 Baseline interpolation timings for a triangle: a Only physical values and b Physical and first-derivative values

Fig. 6 Baseline interpolation timings for a tetrahedron: a Only physical values and b Physical and first-derivative values

6.1.5 Speed-Up Factor

To further compare the methods, Fig. 7 shows the speed-up factor when going from the recalculated matrix variant of the Lagrangian interpolation method to the barycentric interpolation method for segments, quadrilaterals, and hexahedra. This shows that in 1D the speed-up increases as the order increases, in 2D it stays approximately the same, and in 3D it decreases. We can see that including the first derivatives (Fig. 7b) in 1D causes the speed-up factor to reduce when compared with the evaluation-only version (Fig. 7a). In 2D, the speed-up factor remains approximately consistent between the two versions, and in 3D it increases. A minimum speed-up factor of approximately 7 is observed across all tests, occurring for hexahedra above order 17 when calculating the solution evaluation only.

Fig. 7 Speed-up factors calculated for the segment, quadrilateral, and hexahedron: a Only physical values and b Physical and first-derivative values

6.2 Real-World Usage Example

To investigate the performance of the barycentric interpolation method in a less artificial setting, we now consider a real-world problem containing a nonconformal interface, again posed in a DG setting within the Nektar++ spectral/hp element framework. In general, we can imagine two scenarios: the first, in which the nonconformal interface is fixed in time, and the second, in which the interface changes at each timestep to account for, e.g., grid rotation or translation. We investigate both settings in this section.

6.2.1 Handling the Nonconformal Interface

To handle the transfer of information across a nonconformal interface, we adopt a point-to-point interpolation approach as outlined in [12], which involves minimizing an objective function, utilizing the inverse of a parametric mapping to the reference element, in order to locate an arbitrary point on a curved element edge. In our implementation, this minimization problem is solved via a gradient-descent method with a quasi-Newton search direction and a backtracking line search, which makes repeated calls for the physical, first- and second-derivative values within the loop, as sketched below. For a stationary interface, the location of this arbitrary point in the reference element can be cached once calculated; however, to mimic a moving interface, where the minimization routine must be run every timestep, we disable this caching in order to evaluate the performance impact of the new barycentric interpolation method. This is the equivalent of the comparison to the first Lagrange interpolation method discussed above, in which the interpolation matrix is recalculated every iteration.
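
The structure of this loop can be illustrated with a heavily simplified scalar model (our sketch with hypothetical names, not the Nektar++ routine): each iteration obtains the value and its first two derivatives from the barycentric kernel, forms a Newton-type search direction, and backtracks until the objective decreases.

```cpp
#include <cmath>
#include <functional>

// Scalar model of the projection loop: minimize f(xi) = (chi(xi) - xstar)^2
// over the reference coordinate xi of an edge, where chi supplies the value
// and its first two derivatives via repeated barycentric evaluations.
double projectToEdge(const std::function<void(double, double&, double&, double&)>& chi,
                     double xstar, double xi)
{
    for (int it = 0; it < 50; ++it)
    {
        double c = 0.0, dc = 0.0, d2c = 0.0;
        chi(xi, c, dc, d2c);                               // barycentric calls
        double g = 2.0 * (c - xstar) * dc;                 // f'(xi)
        double h = 2.0 * (dc * dc + (c - xstar) * d2c);    // f''(xi)
        if (std::fabs(g) < 1e-12) break;                   // converged
        double step = (std::fabs(h) > 1e-14) ? -g / h : -g;  // Newton-type
        double f0 = (c - xstar) * (c - xstar);
        double alpha = 1.0;                                // backtracking
        while (alpha > 1e-8)
        {
            double v = 0.0, du1 = 0.0, du2 = 0.0;
            chi(xi + alpha * step, v, du1, du2);
            if ((v - xstar) * (v - xstar) <= f0) break;
            alpha *= 0.5;
        }
        xi += alpha * step;
    }
    return xi;
}
```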

6.2.2 Test Case

We select a standard linear transport equation \(u_t + \nabla \cdot \varvec{F}(u) = 0\) within a domain \(\Omega =[-1,1]^2\), so that \({\varvec{F}}(u) = {\varvec{v}}u\) for a constant velocity \({\varvec{v}} = (1,0)\), and a nonpolynomial initial condition \(u({\varvec{x}},0) = \sin (2\pi x)\cos (2\pi y)\). The domain consists of a single nonconformal interface with unstructured quadrilateral subdomains on either side, as visualized in Fig. 8 together with the initial condition for u. This means that the interpolation is being performed on the trace edges of the elements located at the nonconformal interface, which in this 2D example are segments. A polynomial order of \(P=8\) is considered, and we select \(Q=P+2 = 10\) quadrature points in each coordinate direction. We select a timestep size of \(\Delta t = 10^{-3}\) and run for 10 cycles (i.e., to \(t=10\)), the equivalent of \(10^4\) timesteps.

Fig. 8 The nonconformal mesh for the real-world usage example with the initial projection of the u field overlaid

Table 2 Timings for the real-world case using the barycentric and Lagrangian methods

We initially obtain a baseline time for both interpolation methods with the cache enabled. The cache is then disabled and both methods are run again, allowing us to compare the cached (static) version with the non-cached (moving) version as a demonstration of the high computational cost incurred by calling this minimization routine every timestep. The results for the cached and non-cached versions are shown in Table 2. With the cache enabled, the timings for both methods are practically identical, because the minimization occurs only in the first timestep, which incurs a negligible cost over this timescale. However, the non-cached results (where the minimization procedure is run every timestep) show a slowdown of around \(13\times \) for the Lagrangian method, but only \(3\times \) for the barycentric method when compared with the cached results. We can then calculate the performance impact of these methods on the minimization routine alone, which shows the routine using the barycentric method to be around \(6\times \) faster than the equivalent routine using the Lagrangian method. This is a significant speed-up, as the minimization routine accounts for a large proportion of the total computational time: for the Lagrangian method, this routine occupies 92% of the total time, whereas the barycentric approach reduces this to 66%. The speed-up is realized in the total time taken for all \(10^4\) timesteps, which is reduced from 216 s to 48 s.

7 Conclusions

In the context of spectral/hp and high-order finite elements, solution expansion evaluation at arbitrary points in the domain has been a core capability needed for postprocessing operations such as visualization (streamlines/streaklines and isosurfaces) as well as for interfacing methods such as mortaring. The process of evaluating a high-order expansion at an arbitrary point in the domain consists of two parts: determining in which particular element the point lies, and evaluating the expansion within that element. This work focuses on efficient solution expansion evaluation at arbitrary points within an element. We extend barycentric interpolation techniques developed on an interval to 2D (triangles and quadrilaterals) and 3D (tetrahedra, prisms, pyramids, and hexahedra) spectral/hp element methods. We provide efficient algorithms for their implementation, and demonstrate their effectiveness using the spectral/hp element library Nektar++. The barycentric method shows a minimum speed-up factor of 7 when compared with the non-cached interpolation matrix version of the Lagrangian method across all tests, demonstrating a good performance uplift, culminating in an approximately \(6\times \) computational time speed-up for the real-world example of advection across a nonconformal interface. In the artificial tests, the barycentric method exhibits slightly worse performance than the stored interpolation matrix version of the Lagrangian method when evaluating purely physical values, with slowdowns of between \(10\%\) and \(50\%\) across all orders depending on element type. However, if first derivatives are also required, the barycentric method can outperform the stored interpolation matrix method by up to \(35\%\).