1 Introduction

Interpolation plays an essential role in many applications in the field of numerical analysis. Numerous papers address this issue in the literature, and one of the most cited methods is undoubtedly Lagrange polynomial interpolation.

However, Lagrange interpolation is not without drawbacks, the two most important of which are the well-known Gibbs and Runge phenomena. The latter is related to the distribution of the interpolation points and can be avoided by choosing them appropriately, see Sect. 2 and [1, Chapters 11–13]. We will focus on strategies to solve the former, which is usually associated with functions of low regularity and occurs when the function to be interpolated has a jump or a steep derivative.

One of the most straightforward strategies consists in using piecewise linear interpolation. This interpolation is free of spurious oscillations, but the convergence is very slow. The idea can be extended to piecewise interpolation as in the finite element framework, i.e., the function’s domain is partitioned, and the interpolation is performed on the partition elements using polynomials of a given degree. In the finite element framework, the degree of the polynomials is usually less than 10; indeed, when the finite element space consists of polynomials of degree greater than 2 in each element, the method is often referred to as a higher order finite element method [2, 3]. This is the application we have in mind, and it constitutes the setting of the present work. Nevertheless, the interpolation procedure presented here does not depend on its application to the finite element method and could be used to advantage in any context where only a small number of interpolation points is available.

Once installed in this scenario, spurious oscillations can be avoided using, for instance, monotone interpolation. Good examples are the so-called monotone piecewise cubic interpolation, which is a modification of the cubic spline interpolation [4], and the modified Akima interpolation, which also performs piecewise \(C^{1}\)-cubic interpolation [5, 6]. Monotone interpolation in the finite element framework has also been performed in semi-Lagrangian schemes [7]. This kind of interpolation is very efficient in avoiding the appearance of overshoots, but it is known to frequently result in diffusive numerical schemes for PDEs, see Figs. 9 and 12.

Much attention has lately been paid to the barycentric Lagrange interpolation formula—a symmetric rational representation of the Lagrange polynomial [8, 9]—and several related rational variants [10, 11]. The barycentric representation gives rise to fast and stable procedures allowing thousands of interpolation nodes. The idea has proven fruitful, spreading into a broad range of applications, see, for instance, [12,13,14]. However, such a number of nodes is impractical in our setting: the degree of the interpolating polynomials used later in the numerical examples is at most 16 in the univariate case. In the finite element method, a relatively small mesh of a standard 2D problem has at least \(10^4\) elements. Thus, using an interpolating polynomial of degree \(m=10\) in each element results in a total number of interpolation points of order \(10^6\). Now, the finite element method involves dealing with matrices of order \(10^6\) that are sparse if m is small; see (20). Otherwise, the matrices are numerically intractable. Therefore, in practice, \(m=10\) represents an upper bound on the degree of interpolation we can perform when solving two-dimensional problems, which rules out most currently available methods. Three-dimensional problems are even more numerically demanding. On another note, although our method can be extended to rational interpolation without any problem, we prefer to explore polynomial interpolation first since it is the default approximation procedure employed in the finite element method.

Recent studies are also devoted to solving the Gibbs phenomenon using the so-called mapped bases, first introduced for polynomial interpolation in [15]. There, the authors introduce a map to perform the interpolation in the mapped domain, which significantly reduces the Gibbs phenomenon. The idea has been applied to barycentric rational interpolation [16] and bivariate interpolation [17]. Although the results are encouraging, the authors assume that the location of the possible discontinuities is known. In one dimension, existing algorithms can find such discontinuities. There are also several algorithms for locating these discontinuities in higher dimensions, such as TVM-based methods; see [18] and references therein. However, these methods are expensive and may require mesh refinements and function evaluations at new points, which is impossible when interpolating a finite element function because the function belongs to a finite element space, and the only possible points for interpolation are those associated with the original mesh; see Sect. 5.

We propose a different approach to overcome these difficulties. The main idea is to define a simple rational transformation g so that the values of the transform \(\hat{f}=g\circ f\) used to build the interpolation are numerically smooth, where f is the function to be interpolated. Once g is defined (with \(g^{-1}\) analytically computable), the function \(\hat{f}\) is interpolated by a Lagrange polynomial \(\hat{p}\), and then \(g^{-1}\circ \hat{p}\), which is no longer a polynomial, serves as an interpolant of f. Of course, the critical step is how to define the transformation g. We choose g by a least-squares method that uses the values of f at the interpolation points as input and aims to make the derivative of \(\hat{f}\) as uniform as possible. Notice that, in this approach, g depends on the function f. It turns out that the interpolant \(g^{-1}\circ \hat{p}\) usually detects the jump or boundary layer of f, and this procedure can be readily extended to multidimensional interpolation.

Using a transformation g as a tool is certainly not a new idea and has been employed in various contexts; see, for example, [19,20,21] and [1, Chapter 22]. Nevertheless, the main contribution of our work is to consider a transformation g that acts in the form \(g\circ f\) instead of \(f\circ g\) (as is the case of the mapped bases). Although simple, the idea is new in the literature as far as we know, and it has great advantages, such as its direct extension to higher dimensions, since g maps the one-dimensional image of the function f instead of its multidimensional domain. In this sense, the particular form of the transformation g is rather secondary, and obviously, many different choices from the one considered here are possible.

The structure of the paper is as follows. In the next section, we review the basic ideas about Lagrangian interpolation, focusing on how the choice of interpolation points affects the error formulas. Section 3 is devoted to explaining our algorithm, of which we give two versions: a naive one that may work in many cases and a more elaborate one based on the selection of g by minimizing a functional. We also provide ample numerical evidence supporting the reliability of the method. This section ends with considerations about convergence and how the boundary layers of g help decrease the interpolation error. In Sect. 4, we show that extending our method to multidimensional interpolation is possible with almost no further modifications. Several numerical examples covering various types of functions of two variables are provided. In the last section, we briefly introduce the finite element method for the univariate case and show how it is improved by the interpolation procedure presented here.

Finally, we would like to add a caveat regarding the terminology used. The Gibbs phenomenon is understood as those oscillations produced when trying to approximate a function with jumps. First discovered in the context of Fourier series, the phenomenon appears in many other situations where one tries to pinpoint local information using some global data [22]. When few interpolation nodes are used, a smooth function with a steep gradient causes a similar effect to that of a jump function since both functions are indistinguishable from the numerical point of view. We will use the expression “Gibbs phenomenon” in this extended sense.

2 Lagrange interpolation

Let us briefly review Lagrange interpolation, cf. [23, 24]. Our goal in this section is to give some error estimates that will be helpful in Sect. 3.4 and present the interpolation nodes used in the paper for the numerical examples.

Let f be a bounded real function defined on the interval \([-1,1]\). We assume that \(f(\left[ -1,1\right] )\subset \left[ -1,1\right] \). Note that if the domain or the image of f are not the interval \(\left[ -1,1\right] \), then the function f can be easily transformed to fulfil both assumptions. The expression \(\Vert f\Vert \) stands for the sup norm of the function f on the interval \([-1,1].\)

Given a set of interpolation nodes \(X=\left\{ x_{i},\, 0\le i\le n\right\} \subset \left[ -1,1\right] \), with \(x_{i}\ne x_{j}\) if \(i\ne j\), it is well known that there exists a unique polynomial \(p_{n}\) of degree less than or equal to n that satisfies \(p_{n}(x_{i})=f(x_{i})\) for all \(0\le i\le n\). Throughout the paper, we will consider that \(x_i<x_j\) if \(i<j.\) The polynomial \(p_{n}\) can be written as

$$ p_{n}(x)=\sum _{i=0}^{n}f_{i}\ell _{i}(x), $$

where \(f_{i}=f(x_{i})\) and

$$\begin{aligned} \ell _{i}(x)=\prod _{j=0,j\ne i}^{n}\frac{x-x_{j}}{x_{i}-x_{j}},\quad 0\le i\le n, \end{aligned}$$
(1)

are the Lagrange fundamental polynomials. Set

$$\begin{aligned} \Lambda _n(X)= \left\| \sum _{i=0}^n\left| \ell _i\right| \right\| . \end{aligned}$$

The quantity \(\Lambda _n(X) \) is called the Lebesgue constant associated with the interpolation points X. Note that \(\Lambda _n(X) \) depends on X, but is independent of f. The Lebesgue constant measures how good the polynomial interpolation associated with the nodes X is compared to the best uniform polynomial approximation. In fact, it is very easy to prove the inequality

$$\begin{aligned} \Vert f-p_n\Vert \le \Big (1+\Lambda _n(X)\Big ) E_n[f], \end{aligned}$$
(2)

where \(E_n[f]\) is the error of best uniform polynomial approximation, i.e.,

$$\begin{aligned} E_n[f]=\min \left\{ \Vert f-q_n\Vert \,:\, q_n\in \mathcal {P}_n\right\} , \end{aligned}$$

where \(\mathcal {P}_n\) is the space of polynomials of degree less than or equal to n.

The number \(E_n[f]\) is closely connected to the regularity of the function f: the more regular f is, the smaller \(E_n[f]\) becomes, see [24, Section 4.6].

The Lebesgue constant \(\Lambda _n(X) \) also quantifies the conditioning of the polynomial interpolation problem in the Lagrange basis \(\{\ell _i,\, i=0,1,\ldots ,n\}\), since the equality

$$\begin{aligned} \Lambda _n(X)=\sup \left\{ \Vert p_n\Vert :\, \Vert f\Vert \le 1,\, f\in C([-1,1])\right\} \end{aligned}$$

holds, where \(p_n\) is the Lagrange polynomial interpolating f at the nodes X.

It is therefore of the utmost importance to choose interpolation points X that make the Lebesgue constant as small as possible. Unfortunately, \(\Lambda _n(X) \) always grows as n increases, since, according to [25],

$$\begin{aligned} \Lambda _n( X)\ge \frac{2}{\pi }\log (n+1)-c_1,\quad c_1>0, \end{aligned}$$

for any set X,  a fact that is responsible for the divergence phenomena affecting polynomial interpolation.

If we denote by \(X^*\) a set of interpolation points such that

$$\begin{aligned} \Lambda _n(X^*)=\min _X \Lambda _n(X), \end{aligned}$$

then, cf. [26], we have (\(\gamma \) is Euler’s constant)

$$\begin{aligned} \Lambda _n( X^*)=\frac{2}{\pi }\left( \log (n+1)+\gamma +\log \frac{4}{\pi } \right) +o(1), \end{aligned}$$
(3)

although the set \(X^*\) is unknown except for a few particular values of n.

In order to guess a good candidate X with a small Lebesgue constant, we can consider another well-known estimate of the interpolation error. Given a set \(X=\left\{ x_{i},\, 0\le i\le n\right\} \), we write

$$\begin{aligned} w_{n+1}(x)=\prod _{i=0}^n (x-x_i), \end{aligned}$$

and then it holds that

$$\begin{aligned} \Vert f-p_n\Vert \le \frac{\Vert w_{n+1}\Vert }{(n+1)!}\, \Vert f^{(n+1)}\Vert , \end{aligned}$$
(4)

provided \(f\in C^{n+1}(\left[ -1,1\right] )\), see [24, Section 3.2]. It is well known that the quantity \(\Vert w_{n+1}\Vert \) attains its minimum when \(w_{n+1}=2^{-n}T_{n+1}\), where \(T_n\) is the n-th Chebyshev polynomial of the first kind defined by \(T_{n}(\cos \theta )=\cos (n\theta )\), see again [24, Section 3.2]. Thus, if we take the set of points

$$ T=\left\{ -\cos \left( \frac{(2i+1)\pi }{2n+2}\right) ,\,i=0,1,\ldots , n\right\} $$

as the interpolation points, then \(\Vert w_{n+1}\Vert =2^{-n}\), and we obtain from (4) the optimal error estimate

$$\begin{aligned} \Vert f-p_n\Vert \le \frac{ \Vert f^{(n+1)}\Vert }{2^n(n+1)!}. \end{aligned}$$

It turns out that the set T is also a good choice concerning the Lebesgue constant, since it was proven in [27] that

$$\begin{aligned} \Lambda _n( T)=\frac{2}{\pi }\left( \log (n+1)+\gamma +\log \frac{8}{\pi } \right) +o(1). \end{aligned}$$
(5)

It follows from (3) and (5) that T is very close to being optimal as regards \(\Lambda _n\). Furthermore, (2) and (5) imply that the interpolation process based on the nodes T will converge as n tends to \(+\infty \) for functions of low regularity (e.g., for Hölder continuous functions) and will do so at a geometric rate for functions analytic on a neighborhood of the interval \([-1,1]\). In particular, Runge’s phenomenon cannot occur.

Although our method works with any set of interpolation points, we will use the same set of nodes in all the examples throughout the paper, namely,

$$\begin{aligned} U=\left\{ -\cos \left( \frac{i\pi }{n}\right) ,\, i=0,1,\ldots ,n\right\} , \end{aligned}$$
(6)

which are the zeros of the monic polynomial

$$\begin{aligned} (x^2-1) \frac{U_{n-1}(x)}{2^{n-1}}, \end{aligned}$$

where \(U_n\) is the n-th Chebyshev polynomial of the second kind defined by

$$\begin{aligned} U_n(\cos \theta )=\frac{\sin ((n+1)\theta )}{\sin \theta }. \end{aligned}$$

The points U are used extensively in numerical analysis as interpolation and quadrature nodes, see [1, Chapter 2] and [28, Definition 5.3.1]. They are closely related to the T points and share their good properties, with the added advantage for many purposes of including the extremes of the interval \([-1,1]\). Indeed, their Lebesgue constant is even smaller than the one of T since it was proven in [29] that

$$\begin{aligned} \Lambda _n(U)\le \Lambda _{n-1}(T),\quad n\in \mathbb {N}. \end{aligned}$$

As for the error estimate (4), note that

$$\begin{aligned} w_{n+1}(x)=\prod _{i=0}^{n}(x-x_{i})=(x^2-1)\frac{U_{n-1}(x)}{2^{n-1}} =\frac{T_{n+1}(x)-T_{n-1}(x)}{2^{n}}. \end{aligned}$$

Therefore, we have

$$\begin{aligned} \Vert w_{n+1}\Vert \le \frac{1}{2^{n-1}} \end{aligned}$$

for the set U, which implies again that U is a good selection of interpolation points.
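As a quick illustration of these quantities, the nodes (6) and a numerical estimate of their Lebesgue constant can be computed in a few lines. The following Python sketch (the helper names are ours, not from the paper) evaluates \(\sum _{i}|\ell _i(x)|\) on a fine grid:

```python
import numpy as np

def chebyshev_U_nodes(n):
    """Nodes (6): extrema of T_n, including the endpoints -1 and 1."""
    return -np.cos(np.arange(n + 1) * np.pi / n)

def lebesgue_constant(nodes, n_grid=10_000):
    """Estimate Lambda_n(X) = max_x sum_i |l_i(x)| on a fine grid."""
    x = np.linspace(-1.0, 1.0, n_grid)
    lam = np.zeros_like(x)
    for i, xi in enumerate(nodes):
        others = np.delete(nodes, i)
        li = np.prod((x[:, None] - others) / (xi - others), axis=1)
        lam += np.abs(li)
    return lam.max()

for n in (4, 8, 16):
    print(n, lebesgue_constant(chebyshev_U_nodes(n)))
# the values grow slowly, like (2/pi) log(n+1), in line with (5)
```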

Fig. 1 Lagrange interpolation of the function \(f_1\) using points U with \(n=8\) in (a) and \(n=16\) in (b)

In the remainder of the paper, we will focus on interpolating functions with a steep derivative, such as the one plotted in Fig. 1, which is defined by

$$\begin{aligned} f_1(x)= \frac{2}{\pi }\arctan (50(x-x_{0})),\quad x_{0}=0.28. \end{aligned}$$
(7)

Fig. 1 shows the Lagrange interpolation of the function \(f_1\) at the nodes U with \(n=8\) and \(n=16.\) Note the oscillations of the interpolating polynomial \(p_n\) due to the low number of interpolation points. This effect is what we call the Gibbs phenomenon, as mentioned in the introduction, and our work aims to find a method to reduce it as much as possible.
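The oscillations in Fig. 1 are easy to reproduce numerically; here is a minimal Python sketch (variable names are ours) that interpolates \(f_1\) at the nodes U and measures the overshoot:

```python
import numpy as np
from scipy.interpolate import lagrange  # adequate for the small n used here

def f1(x, x0=0.28):
    return (2 / np.pi) * np.arctan(50 * (x - x0))

n = 8
nodes = -np.cos(np.arange(n + 1) * np.pi / n)   # the points U in (6)
p_n = lagrange(nodes, f1(nodes))                # classical Lagrange interpolant

x = np.linspace(-1, 1, 1001)
print(np.abs(f1(x) - p_n(x)).max())             # far from 0: the Gibbs-type overshoot
```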

3 The algorithm

3.1 First steps

In general, the Lagrange polynomial at the Chebyshev nodes U constitutes a good approximation of the function f when f is smooth with mild derivatives. So, we seek a transformation \(g:\left[ -1,1\right] \longrightarrow \left[ -1,1\right] \) such that \(\hat{f}=g\circ f\) fulfils such conditions. Then we compute the Lagrange polynomial \(\hat{p}_{n}\) of \(\hat{f}\) and expect \(\hat{p}_{n}\) to be oscillation-free if the derivative of \(\hat{f}\) is mild enough. Once \(\hat{p}_{n}\) is calculated, we can easily obtain an interpolating function of f by considering \(q_{n}=g^{-1}\circ \hat{p}_{n}\).

For the method to work, the transformation g must be smooth with an inverse \(g^{-1}\) analytically computable and also smooth. We propose a rational transformation g with a fast variation zone near \(y=1\), \(y=-1\), both or neither (see Fig. 5):

$$\begin{aligned} g(y)=a\frac{(y-z_{1})(y-z_{2})}{(y-z_{3})(y-z_{4})}=\frac{(ay-z_1^*)(y-z_{2})}{(y-z_{3})(y-z_{4})}, \end{aligned}$$
(8)

where \(z_{3}<-1\) and \(z_{4}>1\), and such that \(g(-1)=-1\) and \(g(1)=1\). We have used the obvious notation \(z_1^*=az_1.\) To implement the conditions at \(y=\pm 1\) in the function g, it will be more convenient to use the expression appearing on the far right-hand side of (8), see Proposition 3.1 below.

The transformation g depends on three parameters after imposing the two constraints at \(y=\pm 1\). Some authors have proposed other transformations that depend on fewer parameters [19]. However, the resulting transformations are less versatile.

To guarantee the existence of \(g^{-1}\), the function g must be a strictly increasing function on \(\left[ -1,1\right] \).

Proposition 3.1

Let \(g:\mathbb {R}\rightarrow \mathbb {R}\) be defined by (8) together with the restrictions assumed therein. If \(z_{2}\in (1/z_{3},1/z_{4})\), then g is strictly increasing on the interval \((z_{3},z_{4})\), and \(g^{-1}\) is explicitly computable.

Proof

Since \(g(1)=1\) and \(g(-1)=-1\), we can write

$$\begin{aligned} a-z^*_1=\frac{(1-z_3)(1-z_4)}{1-z_2}, \quad -a-z_1^*=\frac{(1+z_3)(1+z_4)}{1+z_2}. \end{aligned}$$

Subtracting both expressions, we arrive at

$$\begin{aligned} a =\frac{z_{2}(1+z_{3}z_{4})-(z_{3}+z_{4})}{1-z_{2}^{2}}, \end{aligned}$$
(9)

whereas adding them, we obtain

$$\begin{aligned} z_{1}^* =\frac{z_{2}(z_{3}+z_{4})-(1+z_{3}z_{4})}{1-z_2^2}. \end{aligned}$$
(10)

Since \(z_{2}\in (1/z_{3},1/z_{4})\subset (-1,1)\), formulas (9) and (10) are well defined. Let us consider the number

$$\begin{aligned} z^{*}=\frac{z_{3}+z_{4}}{1+z_{3}z_{4}}. \end{aligned}$$

As \(z_3<-1\), we have \(z_3^2>1\), which, in turn, implies \(1+z_3z_4<z_3(z_3+z_4)\), and thus \(1/z_3<z^*.\) Analogously, from \(z_4>1\) it follows that \(z^*<1/z_4.\) In particular, \(\vert z^{*}\vert <1\).

Now, we distinguish two possibilities. Suppose first that \(z_2=z^*\), which is equivalent to \( a=0\). Then \(-z_{1}^{*}=1+z_{3}z_{4}\) and, in turn, we have

$$\begin{aligned} g(y)=(1+z_{3}z_{4})\frac{(y-z^*)}{(y-z_{3})(y-z_{4})}. \end{aligned}$$

The equation \(g(y)=c\), with \(c\in \mathbb {R}\setminus \{0\}\), has at most two solutions, but one of them is attained outside the interval \([z_3,z_4]\), which implies that g is monotone on \((z_3,z_4)\) and, since \(g(-1)=-1\) and \(g(1)=1\), necessarily g is strictly increasing on \((z_{3},z_{4})\).

Next, if we set \(t=g(y),\, y\in (z_3,z_4),\) we can write \(y=g^{-1}(t),\, t\in \mathbb {R},\) to obtain

$$\begin{aligned} t(y-z_3)(y-z_4)=-z_1^*(y-z_2) \end{aligned}$$

or, equivalently,

$$\begin{aligned} t\left[ g^{-1}(t)\right] ^2+h_1(t) g^{-1}(t)-(z_1^*z_2-z_3 z_4 t)=0, \end{aligned}$$

where \(h_1(t)=z_1^*-(z_3+z_4)t\). Note that \(t=0\) corresponds to \(y=z_2\) and that \(z_1^*>0\). Therefore, we arrive at the formula

$$\begin{aligned} g^{-1}(t)=\frac{h_1(t)-\sqrt{h_1^2(t)+4t(z_1^*z_2-z_3 z_4 t)}}{-2t}. \end{aligned}$$
(11)

Suppose now that \(a\not =0.\) Then we may write

$$\begin{aligned} g(y)=a\frac{(y-z_{1})(y-z_{2})}{(y-z_{3})(y-z_{4})} \end{aligned}$$

with \(z_1=z_1^*/a.\) We can repeat the same arguments as before and, therefore, reach the same conclusion as long as \(z_1\not \in (z_3,z_4)\). Using (9) and (10), we have

$$\begin{aligned} z_1=\frac{z_{2}(z_{3}+z_{4})-(1+z_{3}z_{4})}{a(1-z_2^2)}=\frac{z_{2}(z_{3}+z_{4})-(1+z_{3}z_{4})}{z_{2}(1+z_{3}z_{4})-(z_{3}+z_{4})}=\frac{z^{*}z_{2}-1}{z_{2}-z^{*}}. \end{aligned}$$

As \(\vert z^{*}\vert <1\), \(z_{1}\) is an increasing function when considered as a function in the variable \(z_{2}\), with a vertical asymptote at \(z_2=z^*\). From the formula for \(z_1,\) it is easy to check that if \(z_{2}=1/z_{3}\), then \(z_{1}=z_{4}\) and if \(z_{2}=1/z_{4}\), then \(z_{1}=z_{3}\). Thus, if \(z_{2}\in (1/z_{3},z^{*})\), then \(z_{1}>z_{4}\) and if \(z_{2}\in (z^{*},1/z_{4})\), then \(z_{1}<z_{3}\). That is, \(z_1\not \in (z_3,z_4)\) if \(z_2\in (1/z_3,1/z_4)\), as we wanted to prove.

Finally, following analogous calculations as for (11), we obtain

$$\begin{aligned} g^{-1}(t)=\frac{h_2(t)-\sqrt{h_2^2(t)-4(a-t)(a z_1 z_2-z_3 z_4 t)}}{2(a-t)}, \end{aligned}$$
(12)

where \(h_2(t)=a(z_1+z_2)-(z_3+z_4)t.\) \(\square \)
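For reference, (8)–(12) translate directly into code. The following Python sketch (our own naming, not from the paper) builds g from \((z_2,z_3,z_4)\) via (9)–(10) and inverts it with (11) or (12):

```python
import numpy as np

def make_g(z2, z3, z4):
    """g and g^{-1} from (8), with a and z1* fixed by g(-1) = -1, g(1) = 1."""
    a = (z2 * (1 + z3 * z4) - (z3 + z4)) / (1 - z2**2)            # formula (9)
    z1s = (z2 * (z3 + z4) - (1 + z3 * z4)) / (1 - z2**2)          # formula (10)

    def g(y):
        return (a * y - z1s) * (y - z2) / ((y - z3) * (y - z4))

    def g_inv(t):
        t = np.asarray(t, dtype=float)
        if abs(a) < 1e-14:                   # case a = 0, formula (11); t = 0 maps to z2
            h1 = z1s - (z3 + z4) * t
            return (h1 - np.sqrt(h1**2 + 4 * t * (z1s * z2 - z3 * z4 * t))) / (-2 * t)
        z1 = z1s / a                         # case a != 0, formula (12)
        h2 = a * (z1 + z2) - (z3 + z4) * t
        disc = h2**2 - 4 * (a - t) * (a * z1 * z2 - z3 * z4 * t)
        return (h2 - np.sqrt(disc)) / (2 * (a - t))

    return g, g_inv

# sanity check with z3 < -1 < 1 < z4 and z2 in (1/z3, 1/z4)
g, g_inv = make_g(z2=-0.1, z3=-1.5, z4=1.5)
y = np.linspace(-0.99, 0.99, 5)
print(np.max(np.abs(g_inv(g(y)) - y)))       # round-off only
```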

Fig. 2 Example of transformation g

Since the transformation g depends on five parameters and we have already imposed the conditions \( g(-1)=-1\) and \(g(1)=1\), we can fix three control points to determine g. Let \(-1<y_{l}<0<y_{r}<1\) with \(g(y_{l})=t_{l}\), \(g(y_{r})=t_{r}\), and \(g(0)=t_{c}\). Obviously, \(t_{l}<t_{c}<t_{r}\) if g is a strictly increasing function on \([-1,1]\); see Fig. 2. We now discuss how to choose these control points.

Let us recall that in our context the function f to be interpolated is supposed to have a steep derivative. So we may assume that, in the array of data \(f_{i},\, 0\le i\le n,\) there is a gap in the distribution over the image \([-1,1]\), i.e., these data are mainly concentrated in the intervals \(\left[ -1,-1+\varepsilon _{1}\right] \) and \(\left[ 1-\varepsilon _{2},1\right] \) for some relatively small \(\varepsilon _1, \varepsilon _2>0\). To identify these intervals, we first sort the values \(\left\{ f_{i},\, 0\le i\le n\right\} \). Let \(K_{n}=\left\{ i \in \mathbb Z: \, 0\le i\le n\right\} \) and let the permutation \(s:K_{n}\rightarrow K_{n}\) be such that \(f_{s(i)}\le f_{s(i+1)}\) for all \(0\le i<n\). Then, we compute \(\delta _i=f_{s(i+1)}-f_{s(i)}\) for all \(0\le i<n\), and choose \(I\in K_n\) such that

$$\begin{aligned} \delta _I=\max _{0\le i<n}\delta _i. \end{aligned}$$
(13)

So, we split the values \(\left\{ f_{i},\, 0\le i\le n\right\} \) into two data sets: \(\left\{ f_{s(i)},\, 0\le i\le I\right\} \) and \(\left\{ f_{s(i)},\, I< i\le n\right\} \) that lie in \(\left[ -1,-1+\varepsilon _{1}\right] \) and \(\left[ 1-\varepsilon _{2},1\right] \), respectively, for some \(\varepsilon _{1},\varepsilon _{2}>0\). Now, we take

$$\begin{aligned} y_{l}&=f_{s(I)},&y_{r}&= f_{s(I+1)},&y_c&=0, \\ t_{l}&=\frac{2I}{n}-1,&t_{r}&=\frac{2(I+1)}{n}-1,&t_{c}&=\frac{1}{2}(y_{l}+y_{r}). \end{aligned}$$

These choices aim to make the values \(g(f_i),\, 0\le i\le n,\) as evenly distributed as possible over the interval \([-1,1]\). From these numbers we analytically calculate the parameters that determine g.
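In code, the gap detection (13) and the control-point choice above amount to one sort and one argmax. A sketch in Python (the function name is ours):

```python
import numpy as np

def naive_control_points(f_vals):
    """Split the data at the largest gap (13) and return the three control points."""
    n = len(f_vals) - 1
    sorted_f = np.sort(f_vals)            # the permutation s: f_{s(i)} increasing
    deltas = np.diff(sorted_f)            # delta_i = f_{s(i+1)} - f_{s(i)}
    I = int(np.argmax(deltas))            # index of the largest gap
    y_l, y_r = sorted_f[I], sorted_f[I + 1]
    t_l, t_r = 2 * I / n - 1, 2 * (I + 1) / n - 1
    y_c, t_c = 0.0, 0.5 * (y_l + y_r)     # center control point: g(0) = t_c
    return (y_l, t_l), (y_c, t_c), (y_r, t_r)
```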

Fig. 3 Interpolation for \(f_{1}\): (a) transformation g, (b) functions \(\hat{f}\) and \(\hat{p}_{n}\), (c) functions \(f_{1}\), \(q_{n}\), and classical Lagrange polynomial \(p_{n}\)

Of course, the whole procedure relies on the fact that the region where the function’s derivative is steep can be detected correctly by (13), which is not always the case. Here are two particular functions showing a different behavior in this respect.

Figure 3 displays the interpolation results obtained for the function \(f_1\), see (7), when using \(n=8\), that is, 9 interpolation points. Figure 3 also shows the transformation g and the function \(\hat{f}=g\circ f\). In this example, \(\hat{f}\) results in a smooth function with a mild derivative, so the interpolation \(\hat{p}_{n}\) does not exhibit any oscillation, and thus the interpolation \(q_{n}=g^{-1}\circ \hat{p}_{n}\) does not exhibit any oscillation either. For comparison, we also include the classical Lagrange polynomial \(p_{n}\) that exhibits large oscillations. As mentioned in Sect. 2, we have used the Chebyshev interpolation points (6) for the computation of both \(\hat{p}_{n}\) and \(q_{n}\).

In Fig. 4, we want to get some insight into the convergence properties of this kind of interpolation process as the number n increases. We compute the interpolation of two different functions for \(n=4,8, 16\). For the function \(f_{1}\) in (7), the interpolation error decreases as n increases. Even for a small number of nodes, the interpolation function \(q_{n}\) does not exhibit significant oscillations. However, for the function

$$\begin{aligned} f_{2}(x)=\tanh \left( 11 x-\sqrt{100 x^{2}+1}\right) , \end{aligned}$$

some oscillations appear when n increases. Not only does the function \(f_2\) have a very sharp change in the derivative, but also its lack of symmetry at the extremes of the interval \([-1,1] \) prevents formula (13) from detecting the region where the derivative of \(f_2 \) is largest. As a result, the function \(\hat{f}_2\) does not have a sufficiently mild derivative. Still, we would like to point out that the image of the interpolant \(q_{n}\) remains close to the interval \(\left[ -1,1\right] \).

Fig. 4 Interpolation results for \(f_{1}\) (first row) and \(f_{2}\) (second row)

In the next section, we refine our interpolation procedure to solve situations as those presented here by the function \(f_{2}\).

3.2 The elaborate version

It is not difficult to find some other functions with the same behavior as \(f_2\) concerning our interpolation process. These cases are essentially characterized by the fact that the function g is not correctly chosen because the data \(\left\{ f_{i},\,0\le i\le n\right\} \) cannot be split into two subsets, one with values close to \(-1\) and the other with values close to 1. For instance, if \(f_{s(i)}=1\) for all \(i>I\) (or \(f_{s(i)}=-1\) for all \(i\le I\)), or if \(I=0\) (or if \(I=n-1\)), the interpolation will produce oscillations.

However, finding a transformation g that serves better is still possible. The idea is to build g such that \(\hat{f}=g \circ f\) is—except for a slight change of scale—as close to the identity function \(id(x)=x\) as possible, which will be carried out by minimizing an adequately chosen functional. Of course, this goal meets obvious limitations when f is not a monotone function since the very transformation g must be monotone. Even if this objective cannot always be met, it will be helpful as a guide for designing the procedure. The algorithm runs as follows (a code sketch of its core steps is given after the list).

  1.

    Define a transformation \(g:\left[ -1,1\right] \rightarrow \left[ -1,1\right] \) by means of (8) with the restrictions \(g(-1)=-1\) and \(g(1)=1\), which can be done analytically taking a and \(z_1^*\) as in formulas (9) and (10). In the next steps we want to fix the values \(z_{2}\), \(z_{3}\), and \(z_{4}\). Since the transformation g depends only on \(z_{2}\), \(z_{3}\), and \(z_{4}\), it will henceforth be denoted by \(g_{\varvec{z}}\), where \(\varvec{z}=(z_{2},z_{3},z_{4})\).

  2.

    Define the numbers

    $$ \theta _{i}=-1+2\frac{f_{i}-m}{M-m},\quad 0\le i\le n, $$

    where

    $$\begin{aligned} M={\displaystyle \max _{0\le i\le n}}f_{i},\qquad m={\displaystyle \min _{0\le i\le n}}f_{i}. \end{aligned}$$
    (14)

    It is clear that

    $$\begin{aligned} \max _{0\le i\le n}\theta _{i}=1,\qquad {\displaystyle \min _{0\le i\le n}}\theta _{i}=-1. \end{aligned}$$
  3.

    Let \(K_{n}=\left\{ i\in \mathbb {Z}:\, 0\le i\le n\right\} \) and let the permutation \(s:K_{n}\rightarrow K_{n}\) be such that \(\theta _{s(i)}\le \theta _{s(i+1)}\) for all \(0\le i<n\). Thus, \(\theta _{s(0)}=-1\) and \(\theta _{s(n)}=1\). The purpose of steps 2 and 3 is to normalize and order, respectively, the values \(f_i,\, 0\le i\le n.\)

  4.

    To ensure that \(z_{3}<-1\), \(z_{4}>1\), and \(z_{2}\in (1/z_{3},1/z_{4})\), we write

    $$\begin{aligned} \left\{ \begin{aligned} z_{3}&=-a_{1}+a_{2}\tanh \beta _{1},\\ z_{4}&=a_{1}+a_{2}\tanh \beta _{2},\\ z_{2}&=\frac{1}{2}\left( \frac{1}{z_{3}}+\frac{1}{z_{4}}\right) +\frac{\gamma }{2}\left( \frac{1}{z_{4}}-\frac{1}{z_{3}}\right) \tanh \beta _{3}. \end{aligned}\right. \end{aligned}$$

    Obviously, it holds that

    $$\begin{aligned} z_{3}\in \left( -a_{1}-a_{2},-a_{1}+a_{2}\right) ,\quad z_{4}\in \left( a_{1}-a_{2},a_{1}+a_{2}\right) . \end{aligned}$$

    Typically, we choose \(a_1>1\) and a tolerance \(\varepsilon >0,\) with \( \varepsilon \ll 1\), and define \(a_{2}=a_{1}-1-\varepsilon >0\) and \(\gamma =1-\varepsilon \), so that

    $$\begin{aligned} z_{2}\in \left( \frac{1}{z_{3}}+\varepsilon l,\frac{1}{z_{4}}-\varepsilon l\right) , \end{aligned}$$

    with

    $$\begin{aligned} l=\frac{1}{2}\left( \frac{1}{z_{4}}-\frac{1}{z_{3}}\right) ,\qquad z_{3}<-1-\varepsilon ,\qquad z_{4}>1+\varepsilon . \end{aligned}$$

    Note that when \(z_3\) and \(z_4\) are both large, then \(g_{\varvec{z}}(x)\sim id(x)\) for \(\left| x\right| <1\). Thus, the value of \(a_1\) is chosen large enough that g can approach the identity function when \(z_3\) and \(z_4\) approach the far ends of their intervals, so that no spurious constraints are introduced in the procedure.

  5.

    Define the functional \(F:\mathbb {R}^{3}\rightarrow \mathbb {R}\) by

    $$ F(\beta _{1},\beta _{2},\beta _{3})=\sum _{i=0}^{n} \omega _i\left( g_{\varvec{z}}(\theta _{s(i)})-x_{i}\right) ^{2}, $$

    where \(\varvec{z}=(z_{2},z_{3},z_{4})\) is given by step 4, and \(\omega _i\ge 0, \, i=0,1,\ldots ,n,\) are the weights associated with the functional F, whose form we will specify later. Notice that the functional F intends to make \(\hat{f}=g \circ f\) as close to id as possible on the interpolation points \(x_i,\, i=0,\ldots ,n\).

  6.

    Compute a numerical minimum of F. The functional F might have several local minima, so we will take various initial points in the minimization procedure, see Remark 3.2 below.

  7.

    If \(\varvec{\hat{\beta }}=(\hat{\beta }_{1},\hat{\beta }_{2},\hat{\beta }_{3})\) is the local minimum of F calculated in step 6, then compute \(\varvec{\hat{z}}=(\hat{z}_{2},\hat{z}_{3},\hat{z}_{4})\) and \(g_{\varvec{\hat{z}}}\) through the formulas introduced in step 4.

  8.

    Compute \(\hat{f}_{i}=g_{\varvec{\hat{z}}}(\theta _{i})\) for all \(i=0,1,\dots ,n\).

  9.

    Compute the Lagrange polynomial \(\hat{p}_{n}\) such that \(\hat{p}_{n}(x_{i})=\hat{f}_{i}\) for all \(i=0,1,\dots ,n\).

  10.

    Compute \(Q_{n}=g_{\varvec{\hat{z}}}^{-1}\circ \hat{p}_{n}\) using (11) or (12).

  11.

    Compute \(q_{n}=m+(M-m)(Q_{n}+1)/2\).
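A compact Python sketch of steps 2–7 follows; it is a minimal rendition under our own naming (using SciPy's Nelder–Mead, cf. Remark 3.2 below), not the authors' implementation:

```python
import numpy as np
from scipy.optimize import minimize

def g_of(z2, z3, z4):
    """The transformation (8) with a and z1* fixed by (9)-(10)."""
    a = (z2 * (1 + z3 * z4) - (z3 + z4)) / (1 - z2**2)
    z1s = (z2 * (z3 + z4) - (1 + z3 * z4)) / (1 - z2**2)
    return lambda y: (a * y - z1s) * (y - z2) / ((y - z3) * (y - z4))

def fit_transform(f_vals, x_nodes, a1=5.0, eps=1e-4):
    """Sketch of steps 2-7: pick z = (z2, z3, z4) by minimizing the functional F."""
    M, m = f_vals.max(), f_vals.min()
    theta = -1 + 2 * (f_vals - m) / (M - m)            # step 2: normalize to [-1, 1]
    theta_s = np.sort(theta)                           # step 3: order the data
    a2, gamma = a1 - 1 - eps, 1 - eps
    w = 1.0 / (1.01 - x_nodes**2)                      # weights chosen in the text below

    def z_of_beta(beta):                               # step 4: built-in constraints
        z3 = -a1 + a2 * np.tanh(beta[0])
        z4 = a1 + a2 * np.tanh(beta[1])
        z2 = 0.5 * (1 / z3 + 1 / z4) + 0.5 * gamma * (1 / z4 - 1 / z3) * np.tanh(beta[2])
        return z2, z3, z4

    def F(beta):                                       # step 5: weighted least squares
        g = g_of(*z_of_beta(beta))
        return np.sum(w * (g(theta_s) - x_nodes) ** 2)

    starts = [(-2, 2, 0), (2, -2, 0), (2, -2, -2), (2, -2, 2)]   # cf. Remark 3.2
    best = min((minimize(F, b0, method='Nelder-Mead') for b0 in starts),
               key=lambda r: r.fun)                    # step 6: multi-start Nelder-Mead
    return z_of_beta(best.x), theta                    # step 7
```

Steps 8–11 then amount to evaluating \(g_{\varvec{\hat{z}}}\) at the \(\theta _i\), computing the Lagrange interpolant \(\hat{p}_n\), applying \(g_{\varvec{\hat{z}}}^{-1}\), and undoing the normalization.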

Several remarks are in order.

Remark 3.1

If \(\left| f_{i}\right| =1\) for all \(i=0,1,\dots ,n\), then \(\left| g(\theta _{i})\right| =\left| g(f_{i})\right| =1\) for all \(i=0,1,\dots ,n\). In this case, the functional F is constant and no minimization is possible. Since the idea of the method is that the values \(g(\theta _{s(i)})\) remain close to \(x_{i}\) for all \(i=0,\ldots , n\), we define

$$ \theta _{i}^{*}=\frac{1}{1+\delta }(\theta _{i}+\delta \bar{\theta }_{i}), $$

where \(\delta >0\), and

$$\begin{aligned} \bar{\theta }_{i}=\theta _{0}\frac{1-x_{i}}{2}+\theta _{n}\frac{1+x_{i}}{2}. \end{aligned}$$

Then we minimize F using \(\left\{ \theta _{i}^{*},\, 0\le i\le n\right\} \) instead of \(\left\{ \theta _{i},\, 0\le i\le n\right\} \). The function \(Q_n\) obtained in step 10 satisfies \(Q_{n}(x_{i})=\theta _{i}^{*},\, i=0,\ldots ,n\), so we change the notation, writing \(Q_{n}^{*}=g_{\varvec{\hat{z}}}^{-1}\circ \hat{p}_{n}\), and we modify step 10 by adding the operation

$$ Q_{n}=(1+\delta )Q_{n}^{*}-\delta \bar{\theta }, $$

where

$$\begin{aligned} \bar{\theta }(x)=\theta _{0}\frac{1-x}{2}+\theta _{n}\frac{1+x}{2}. \end{aligned}$$

This new definition of \(Q_{n}\) ensures that \(Q_{n}(x_{i})=\theta _{i}\) and thus \(q_{n}(x_{i})=f_{i}\) for all \(i=0,1,\dots ,n\). The results displayed in the rest of the section are computed using \(\delta =10^{-3}\). Though the numerical examples show that the precise value of \(\delta \) is not critical, when \(\delta \ll 1\) (about \(10^{-5}\)) the minimum of F cannot be properly calculated, whereas large values of \(\delta \) (about \(10^{-1}\)) produce somewhat larger oscillations in the interpolation process.

Fig. 5 The transformation \(g_{\varvec{z}}\) for various values of \(\varvec{\beta _0}\): (a) \((-2,2,0)\), (b) \((2,-2,0)\), (c) \((2,-2,-2)\), (d) \((2,-2,2)\)

Remark 3.2

In the following examples, we use the Nelder–Mead simplex method [30] to minimize the functional F defined in step 5 of our algorithm. Since the method could converge to a local minimum or even not converge at all, we apply it with various initial points \(\varvec{\beta _0}=(\beta _1,\beta _2,\beta _3)\). Specifically, we run the algorithm with the initial points \((-2,2,0)\), \((2,-2,0)\), \((2,-2,-2)\) and \((2,-2,2)\). These four points give rise to four qualitatively distinct transformations \(g_{\varvec{z}}\), covering most possible situations, see Fig. 5. Thus, in the first case, \(g_{\varvec{z}}\) is close to id, the second one might be our initial guess if the values \(\left\{ f_{i},\, 0\le i\le n\right\} \) are all close to \(\pm 1\), whereas the last two might be the initial transformations in the case where most of the values \(\left\{ f_{i},\, 0\le i\le n\right\} \) are close to either 1 or \(-1\), but not to both.

Remark 3.3

The interpolation method introduced here is useful for a wide range of functions, regardless of whether the function’s derivative is steep. However, if the dataset \(\left\{ \theta _{i},\, 0\le i\le n\right\} \) satisfies \(\left| \theta _{i}\right| >1-\tau ,\, i=0,\ldots ,n,\) for some \(\tau >0\), with \(\tau \ll 1\), then the method developed in Sect. 3.1 may be sufficient to successfully interpolate the function. Recall that the values \(\left\{ \theta _{i},\, 0\le i\le n\right\} \) are the values \(\left\{ f_{i},\, 0\le i\le n\right\} \) normalized and ordered. We would also like to point out that the simpler variant is faster since it does not require minimizing the functional F. Thus, in all the numerical examples, we adopt the following procedure:

  (i)

    If \(\left| \theta _{i}\right| >1-\tau \) for all \(i=0,1,\dots ,n\), with \(\tau =10^{-2}\), then we use the simpler interpolation method developed in Sect. 3.1.

  (ii)

    Otherwise, we use the more elaborate method developed in this section.

Now, we provide numerical results concerning the elaborate version of the algorithm. The selected parameters are \(a_1=5\), \(\varepsilon =10^{-4}\), and \(\omega _i=(1.01-x_i^2)^{-1}\) for \(i=0,1,\dots ,n\), for the following reasons. On the one hand, as \(z_3\in (-2a_1+1+\varepsilon ,-1-\varepsilon )\) and \(z_4\in (1+\varepsilon ,2a_1-1-\varepsilon )\), the choice of \(a_1\) should be large enough so that \(z_3\) and \(z_4\) can be large. Note that \(g_{\varvec{z}}\rightarrow id\) when \(\vert z_3\vert , z_4\rightarrow +\infty \). On the other hand, \(\varepsilon \) measures the boundary layer of \(g_{\varvec{z}}\) when \(z_3 \rightarrow -1\) or \(z_4 \rightarrow 1\). The smaller \(\varepsilon \), the narrower the boundary layer; therefore, the interpolation can have a steeper gradient. Finally, the weights \(\omega _i\) are larger near \(\pm 1\) since spurious oscillations appear more frequently near the boundary.

Figure 6 displays the interpolation results obtained with this method for \(f_{1}\), see Fig. 3 for comparison. We can see that the interpolant \(q_{n}\) performs better than with the previous method since, in this case, the function \(\hat{f}_{1}\) is very close to id.

Fig. 6 Interpolation results as in Fig. 3 using the more elaborate variant of the algorithm

Fig. 7 Interpolation results as in Fig. 4 using the more elaborate variant of the algorithm

In Fig. 7 we show the new interpolation results for \(f_{1}\) and \(f_{2}\) for \(n=4, 8\), and 16. In this case, we see that for \(f_{2}\) the oscillations are gone compared to the previous method, see Fig. 4.

Fig. 8 Interpolation results for \(f_{3}\) (first row), \(f_{4}\) (second row), \(f_{5}\) (third row), \(f_{6}\) (fourth row), and \(f_{7}\) (fifth row)

Now, let us consider some additional numerical examples. Define the functions

$$\begin{aligned} f_{3}(x)&=-1+2\cos ^{1/10}\left( \frac{\pi }{4}(x-1)\right) ,\quad f_{4}(x)=\frac{1}{1+25(x-x_{0})^{2}},\\ f_{5}(x)&=\left| x-x_{0}\right| ,\quad f_{6}(x)=\cos \left( \frac{\pi }{2}(x-x_{0})\right) ,\quad f_{7}(x)=-H(x-x_{0}), \end{aligned}$$

where H is the Heaviside function and \(x_{0}=0.28\) in all cases except for \(f_6\), in which case \(x_0=0.21\).

The performance of the method for these functions is displayed in Fig. 8. We can see that only for some cases corresponding to \(n=4\) do the interpolants (both \(p_{n}\) and \(q_{n}\)) deviate substantially from the exact solution, whereas for \(n=8\) and \(n=16\), the interpolant \(q_{n}\) does not visibly differ from the exact solution in any of the examples shown. However, the Lagrange polynomial \(p_n\) still exhibits oscillations in some cases. Note that the interpolant \(q_n\) performs well even in those cases for which the algorithm was not intended: more regular functions without a boundary layer.

In the last set of numerical examples in this section, we compare the performance of our method with the well-established modified Akima method. We have used the command makima from MATLAB\(^{\small \circledR }\) to compute it.

Figure 9 displays the interpolation results with the functions we have defined so far, except for \(f_6\), for which no differences are observed between the two methods. The piecewise cubic Hermite interpolant corresponding to the Akima method is denoted by \(s_n\), while, as usual, \(q_n\) stands for the interpolant resulting from our method. The numerical experiments show that \(q_n\) is either equivalent to \(s_n\) or behaves better, especially for those functions with a jump or a steep derivative.

Fig. 9 Comparison with the modified Akima method (interpolant \(s_n\)). Results for \(f_1\) (a), \(f_2\) (b), \(f_3\) (c), \(f_4\) (d), \(f_5\) (e), and \(f_7\) (f). In all cases \(n=8.\)

3.3 Monotonicity and maxima

One of the main concerns in interpolation is monotonicity. The method developed in the present work is not monotone with respect to the interpolation data \(\left\{ f_{i},\,0\le i\le n\right\} \). However, monotone interpolation is not always required or desired. For example, in finite element applications, monotone interpolation may be helpful in some specific applications, but the resulting numerical solutions often suffer from numerical diffusion, as we can appreciate in Fig. 9. Note the mild derivative of the modified Akima approximant, especially in cases (a), (b), and (f), as well as the decrease of the maximum in case (d).

Although not monotonic, we want to show that our method behaves well with respect to the maximum of the function to be interpolated. In Table 1 we show the normalized maximum of the functions appearing in the previous examples and the corresponding normalized maximum of the interpolants \(p_{n}\) and \(q_{n}\). These maxima are computed using the formula

$$ \tilde{f}(x)=-1+\frac{2}{M-m}(f(x)-m) $$

and then evaluating \(\tilde{f}\) at the nodes \(\tilde{x}_{i}=-1+\tilde{h}i,\, 0\le i\le 2000\), with \(\tilde{h}=10^{-3}\) and M, m defined as in (14). The analogous procedure applies to \(p_{n}\) and \(q_{n}\). In Table 1, we use the notation

$$ M_{f}=\max _{0\le i\le 2000}\left| \tilde{f}(\tilde{x}_{i})\right| , $$

which serves analogously for \(p_{n}\) and \(q_{n}\).
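For concreteness, the grid evaluation behind Table 1 reads as follows in Python (a small sketch with our own naming):

```python
import numpy as np

def normalized_max(f, M, m, h_tilde=1e-3):
    """M_f = max_i |f~(x~_i)| on the grid x~_i = -1 + h~*i, 0 <= i <= 2000."""
    x = -1 + h_tilde * np.arange(2001)
    f_tilde = -1 + 2 / (M - m) * (f(x) - m)   # the normalization above
    return np.abs(f_tilde).max()
```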

Table 1 Maximum of the normalized functions f, \(p_{n}\) and \(q_{n}\) for the functions \(f_{i},\, i=1,\ldots , 7,\) and \(n=4, 8, 16\)

It is noteworthy that the maximum of the interpolant \(q_{n}\) is taken at 1 with an error less than \(10^{-4}\) in all functions where \(M_{f}=1\). Moreover, when \(M_{f}>1\), the maximum of \(f_4\) is better approximated by \(q_{n}\) than by \(p_{n}\), whereas for \(f_{5}\) and \(f_{6}\), the error is comparable.

3.4 Convergence

Next, we perform a brief convergence study to gain some insight into the error committed by our interpolation method. Specifically, we want to illuminate the role played by the boundary layers of g in the absence of the interpolant’s oscillations.

We keep the notations used throughout the paper. Recall that \(\Vert f\Vert \) stands for the sup norm of the function f on the interval \([-1,1].\)

As \(\hat{p}_{n}\) is the Lagrange interpolating polynomial of \(\hat{f}=g\circ f\), following (2), we obtain

$$ \left| \hat{f}(x)-\hat{p}_{n}(x)\right| \le \Big (1+\Lambda _n(U)\Big ) E_n[\hat{f}] $$

for all \(x\in \left[ -1,1\right] \). Furthermore, since g is a strictly monotone function and possesses an inverse function \(g^{-1}\), we can define

$$\begin{aligned} L(x)=\sup _{y\in \hat{I}_{x}}\left| (g^{-1})'(y)\right| =\sup _{y\in I_{x}}\frac{1}{\left| g'(y)\right| }, \end{aligned}$$
(15)

where \(\hat{I}_{x}\) is the smallest closed interval containing both \(\hat{f}(x)\) and \(\hat{p}_{n}(x),\) and \(I_{x}\) is the smallest closed interval containing both f(x) and \(q_{n}(x)\). It is clear that L(x) might be arbitrarily large, even \(+\infty .\) Using (15), we arrive at the formula

$$\begin{aligned} \left| f(x)-q_{n}(x)\right| &=\left| g^{-1}\circ \hat{f}(x)-g^{-1}\circ \hat{p}_{n}(x)\right| \le L(x)\left| \hat{f}(x)-\hat{p}_{n}(x)\right| \\ &\le L(x)\Big (1+\Lambda _n(U)\Big ) E_n[\hat{f}] \end{aligned}$$
(16)

for all \(x\in \left[ -1,1\right] \). If the transformation g is close to id, then L is close to 1, and we recover the classical interpolation error bound. This is the reason why the method also works well in the case of regular functions without a boundary layer. However, if g’s shape is similar to that shown in Fig. 3(a) and, for some \(x\in \left[ -1,1\right] \), f(x) and \(q_{n}(x)\) lie close to 1 or \(-1\), then L(x) becomes very small. Note that the narrower the boundary layer of g, the smaller L(x). So, in those regions, the error bound (16) is much smaller than the one in classical interpolation, and this kills the Gibbs phenomenon. Additionally, since the function \(\hat{f}\) is obtained as a regularization of f,  the error of best polynomial approximation of \(\hat{f}\) is not expected to be larger than that of f.

If \(f\in C^{n+1}([-1,1])\), we could have reasoned along the same lines with the error estimate (4) to obtain

$$\begin{aligned} \left| f(x)-q_{n}(x)\right| \le L(x) \,\frac{\Vert \hat{f}^{(n+1)}\Vert }{2^{n-1}(n+1)!} \end{aligned}$$

for all \(x\in \left[ -1,1\right] \). The same considerations as for (16) then apply.
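To make the role of L(x) tangible, one can evaluate \(1/|g'(y)|\) for a transformation g with a boundary layer, as in Fig. 3(a). A quick numerical check in Python (the parameter values are purely illustrative, chosen by us so that \(z_2\in (1/z_3,1/z_4)\)):

```python
import numpy as np

# g from (8) with z3 close to -1, hence a fast variation zone near y = -1
z2, z3, z4 = -0.6, -1.05, 5.0
a = (z2 * (1 + z3 * z4) - (z3 + z4)) / (1 - z2**2)       # formula (9)
z1s = (z2 * (z3 + z4) - (1 + z3 * z4)) / (1 - z2**2)     # formula (10)
g = lambda y: (a * y - z1s) * (y - z2) / ((y - z3) * (y - z4))

def inv_gprime(y, h=1e-6):
    """1/|g'(y)| by central differences: the local factor L in (15)-(16)."""
    return 2 * h / abs(g(y + h) - g(y - h))

print(inv_gprime(-0.99), inv_gprime(0.0))  # small near -1, O(1) in the middle
```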

4 Extension to higher dimensions

One of the main advantages of our interpolation method is that its extension to higher dimensions is straightforward. In higher dimensions, as we will interpolate over quadrilaterals, the number N of elements of the data set \(\left\{ f_{i},\, 1\le i\le N\right\} \), where now \(N=(n+1)^{d}\) (d represents the dimension of the space and n the degree of the interpolating polynomial), is much larger. This eases the minimization process, which becomes somewhat slower in practice but, in general, more accurate.

The only modification necessary to extend the algorithm developed in Sect. 3.2 concerns the definition of the functional F in step 5. The data set elements are now values of the function f taken on a multidimensional grid, while the transformation g depends on a single variable. Thus, there is no direct analogy to the construction of F made earlier in Sect. 3.2, and we have chosen to minimize the functional F using equispaced nodes. Several tests suggest that using other possible types of nodes—Chebyshev nodes, for example—does not improve the results. Therefore, in the multidimensional case, we define F as

$$ F(\beta _{1},\beta _{2},\beta _{3})=\sum _{i=1}^{N}\omega _i\left( g_{\varvec{z}}(\theta _{s(i)} )-x'_{i}\right) ^{2}, $$

where now \(x_{i}'=-1+h'(i-1),\, i=1,2,\dots ,N\), with \(h'=2/(N-1)\). Note that we use the Chebyshev nodes (6) in each variable for the computation of the interpolant \(\hat{p}_{n}\), as in the rest of the paper.

To be more precise, the calculation of \(\hat{p}_n\) is performed using the standard procedure in the finite element framework over quadrilaterals; see [31, p. 55] for instance. Namely, given an arbitrary function \(f:[-1,1]\times [-1,1]\,\longrightarrow \,\mathbb R\) and the set of nodes

$$ U\times U=\left\{ (x_i,y_j), \, x_i,y_j\in U,\,i,j=0,1,\dots ,n\right\} , $$

the interpolant \(p_n\) of f is defined as

$$ p_n(x,y)=\sum _{i,j=0}^{n}f_{ij}l_i(x)l_j(y), $$

where \(l_i,\, i=0,1,\ldots , n,\) are the Lagrange fundamental polynomials (1) and \(f_{ij}=f(x_i,y_j).\)

Note that the polynomial \(p_n\in \mathcal {Q}_n\), where \(\mathcal {Q}_n\) denotes the space of all polynomials of degree less than or equal to n in each variable, i.e.,

$$ \mathcal {Q}_n=\left\{ p(x,y)=\sum _{\alpha _i\le n,\,i=1,2}\gamma _{\alpha _1 \alpha _2}x^{\alpha _1} y^{\alpha _2}\,:\,\gamma _{\alpha _1 \alpha _2}\in \mathbb {R}\right\} . $$

From this definition, it is straightforward to prove that \(\dim \mathcal {Q}_n=(n+1)^2\) and \(\mathcal {P}_n\subset \mathcal {Q}_n\subset \mathcal {P}_{2n}\) hold for the two-dimensional case.
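The tensor-product construction above takes only a few lines. A Python sketch (naming is ours) that evaluates \(p_n\) on a grid from the nodes \(U\times U\):

```python
import numpy as np

def lagrange_basis(nodes, x):
    """Matrix L with L[i, k] = l_i(x_k), the fundamental polynomials (1)."""
    L = np.ones((len(nodes), len(x)))
    for i, xi in enumerate(nodes):
        for j, xj in enumerate(nodes):
            if j != i:
                L[i] *= (x - xj) / (xi - xj)
    return L

def tensor_interp(f, n, x, y):
    """p_n(x, y) = sum_{i,j} f(x_i, y_j) l_i(x) l_j(y) on the nodes U x U."""
    u = -np.cos(np.arange(n + 1) * np.pi / n)     # the nodes (6) in each variable
    F = f(u[:, None], u[None, :])                 # f_ij = f(x_i, y_j)
    return lagrange_basis(u, x).T @ F @ lagrange_basis(u, y)
```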

Also, Remark 3.1 must be modified, though, again, in a straightforward way. For instance, if \(d=2\) and \(\left\{ (x_{i},y_{i}),\, 1\le i\le N\right\} \) is the set of interpolation nodes in \(\left[ -1,1\right] \times \left[ -1,1\right] \), then the function \(\bar{\theta }\) is defined as follows. First, we consider the functions

$$\begin{aligned} \bar{\theta }_{i}&=\frac{\theta _{j_{1}}}{4}(1-x_{i})(1-y_{i})+\frac{\theta _{j_{2}}}{4}(1-x_{i})(1+y_{i})\\&\quad +\frac{\theta _{j_{3}}}{4}(1+x_{i})(1+y_{i})+\frac{\theta _{j_{4}}}{4}(1+x_{i})(1-y_{i}) \end{aligned}$$

for all \(i=1,2,\dots ,N\), where

$$\begin{aligned} (x_{j_{1}},y_{j_{1}})&=(-1,-1),&(x_{j_{2}},y_{j_{2}})&=(-1,1), \\ (x_{j_{3}}, y_{j_{3}})&=(1,1),&(x_{j_{4}},y_{j_{4}})&=(1,-1), \end{aligned}$$

and then we define \(\bar{\theta }\) through the formula

$$\begin{aligned} \begin{aligned} \bar{\theta }(x,y)&=\frac{\theta _{j_{1}}}{4}(1-x)(1-y)+\frac{\theta _{j_{2}}}{4}(1-x)(1+y)\\&+\frac{\theta _{j_{3}}}{4}(1+x)(1+y)+\frac{\theta _{j_{4}}}{4}(1+x)(1-y). \end{aligned} \end{aligned}$$
Fig. 10 Results for \(f_{1}\) and \(n=8\): (a) transformation g, (b) polynomial \(\hat{p}_{n}\), (c) function \(f_{1}\), and (d) interpolant \(q_{n}\). The interpolation points \(U\times U\) are displayed in (b), (c), and (d)

To finish, let us see some numerical examples. First, we consider the function with a steep gradient

$$\begin{aligned} f_{1}(x,y)=\frac{2}{\pi }\arctan \left( d_{1}(x,y)\right) , \end{aligned}$$

where \(d_{1}(x,y)=50 ((x+1)^{2}+(y+1)^{2}-r^{2})\) and \(r=1.1\).

Figure 10 displays the results of our interpolation method for \(f_{1}\). Notice the similarity of the transformation g to the ones appearing in the one-dimensional case, see Figs. 3 and 6. We also show the Lagrange polynomial \(\hat{p}_{n}\) and the interpolant \(q_{n}\). Again, the interpolation does not exhibit any significant oscillation.

Additionally, let us consider the regular function

$$\begin{aligned} f_{2}(x,y)=y\left[ 2\sin \left( \frac{\pi }{4}(x+1)\right) -1\right] \end{aligned}$$

and the jump function

$$\begin{aligned} f_{3}(x,y)=H\left( y-\left( 0.11+\frac{1}{2}\sin (\pi x)\right) \right) , \end{aligned}$$

where H is the Heaviside function. Finally, take

$$\begin{aligned} f_{4}(x,y)=\frac{2}{1+25\left( (x-x_{0})^{2}+y^{2}\right) }-1,\quad x_{0}=0.28, \end{aligned}$$

and the function with a flat region

$$\begin{aligned} f_{5}(x,y)=\tanh \left( 11d_{2}(x,y)-\sqrt{1+81d_{2}(x,y)^{2}}\right) , \end{aligned}$$

where \(d_{2}(x,y)=\sqrt{(x+1)^{2}+(y+1)^{2}}-1.28\).

Figure 11 shows the interpolation results of all these functions compared to the Lagrangian interpolation. In all cases, the interpolant \(q_{n}\) performs better than its Lagrangian counterpart \(p_n\), and its oscillations are not noticeable.

Fig. 11 Interpolation points and results for \(f_{i}\) (i-th row), \(i=1,2,3,4,5\): (a) the function f, (b) the polynomial \(p_{n}\), and (c) the interpolant \(q_{n}\). In all cases \(n=8\)

Moreover, in Table 2, we quantify the extrema as we did in the one-dimensional case. As there, the oscillations remain below \(10^{-4}\) and, in contrast to \(p_n\), \(q_n\) captures the extrema well.

5 Application in the finite element framework

This last section aims to illustrate through a meaningful example how our interpolation method is used within the finite element framework.

Let us consider a one-dimensional version of the classical convection-diffusion equation, that is,

$$\begin{aligned} {\left\{ \begin{array}{ll} u_{t}(x,t)+ \lambda u_{x}(x,t)-\nu u_{xx}(x,t)=0, &{} (x,t)\in \Omega \times (0,T); \\ u_{x}(x,t)=0,&{} (x,t)\in \partial \Omega \times (0,T); \\ u(x,0)=u_{0}(x), &{} x\in \Omega ; \end{array}\right. } \end{aligned}$$
(17)

where \(\Omega =(a,b)\subset \mathbb {R}\), \(\lambda \in \mathbb {R}\), \(\nu >0,\) and \(u_{0}\in L^{2}(\Omega )\).

This problem, with a small diffusion coefficient and a non-smooth initial condition, contains all the ingredients to challenge numerical schemes and, at the same time, is simple enough to be appreciated by the lay reader. The extension to higher dimensions encounters many of the difficulties appearing here, and a more exhaustive study in that setting will be conducted in a future publication. Here we intend to gain insight into how the new interpolation method, in conjunction with the finite element method, behaves in convection-diffusion problems.

The numerical scheme we propose is a semi-Lagrangian finite element method. These schemes combine the finite element method for space discretization with an Eulerian-Lagrangian approach. For further details than what we give here, see [32,33,34,35].

Table 2 Maximum of the normalized functions f, \(p_{n}\) and \(q_{n}\) for \(f_i,\, i=1,\ldots ,5,\) and \(n=4, 8,16\)

We first introduce some generalities concerning the finite element method. Let \(D_h=\left\{ I_{i}\right\} _{1\le i\le N_{e}}\) be a partition of \(\bar{\Omega }\), where \(I_{i}=[x_{i-1}^{e},x_{i}^{e}],\, i=1,\ldots ,N_e,\) are called the elements of the partition \(D_h\), with \(a=x_{0}^{e}<x_{1}^{e}<\dots <x_{N_{e}}^{e}=b\), and \(N_e\) is the number of elements of \(D_h\). The subscript h relates to the size of the subintervals \(I_i\); that is, \(h=\max \{x_{i}^{e}-x_{i-1}^{e}, i=1,\ldots ,N_e\}\). It will not be relevant in what follows, but it is a standard notation in the finite element method, and we would rather stick to it.

We define the finite element space associated with the partition \(D_h\) by

$$ V_h=\left\{ v_h\in C(\bar{\Omega })\,:\,{v_h}_{\mid _{I_{i}}}\in \mathcal {P}_m,\,1\le i\le N_{e}\right\} , $$

where \(\mathcal {P}_m\) is the space of polynomials of degree less than or equal to m.

Associated with the space \(V_h\), we fix an extended set of nodes \(\left\{ x_{i}\right\} _{1\le i\le N}\), with \(a=x_{1}<x_{2}<\dots <x_{N}=b\), where \(m+1\) nodes are located in every element of \(D_{h}\). The nodes \(\left\{ x_{i}\right\} _{1\le i\le N}\) contain the points \(\{ x^e_{i}\} _{1\le i\le N_{e}}\), and thus it holds that \(N=mN_e+1 \). These nodes usually correspond to the points defined in (6) after an affine transformation of \(\left[ -1,1\right] \) onto each element of \(D_{h}\).

Let \(\left\{ \phi _{i}\right\} _{1\le i\le N}\) be a nodal basis for the space \(V_{h}\). Then any function \(v_{h}\in V_{h}\) can be written as

$$ v_{h}(x)=\sum _{i=1}^{N}v_{i}\phi _{i}(x), $$

where \(\phi _{i}(x_{j})=\delta _{ij},\,1\le i,j\le N\).

Regarding the time discretization, let us consider a uniform partition \(\left\{ t_{n}\right\} _{0\le n\le N_{T}}\) of the interval (0, T), and denote \(t_{n+1}-t_{n}\) by \(\Delta t\). For each \(x\in \Omega \) and each \(t_{n+1}\), we introduce the so-called characteristic curves defined by

$$ {\left\{ \begin{array}{ll} \displaystyle {\frac{\partial X}{\partial t}}(x,t_{n+1};t)=\lambda ,\\[3mm] X(x,t_{n+1};t_{n+1})=x. \end{array}\right. } $$

In this case, the computation of X is straightforward since \(\lambda \) is constant, and we obtain

$$\begin{aligned} X(x,t_{n+1};t)=\lambda (t-t_{n+1})+x. \end{aligned}$$

Specifically, \(X(x,t_{n+1};t)\) is the position at time \(t\le t_{n+1}\) of the particle that reaches the point x at time \(t_{n+1}.\) In particular, \(X(x,t_{n+1};t_n)\) is the value of the characteristic curve at time \(t_n\) that passes through x at time \(t_{n+1}\), and we refer to it as the “foot” of the characteristic curve. In this respect, we will use the notation

$$\begin{aligned} u^{n}(x)&=u(x,t_{n}),\\ u^{n*}(x)&=u(X(x,t_{n+1};t_{n}),t_{n})=u^{n}(X(x,t_{n+1};t_{n})). \end{aligned}$$

Thus, \(u^n(x)\) is the value of u at the point x at time \(t_n\) and \(u^{n*}(x)\) denotes the value of u at time \(t_n\) at the “foot” of the characteristic curve that passes through x at time \(t_{n+1}.\) The function \(u^{n*}\) plays an essential role in the numerical approximation of (17), and its computation triggers the need to use an interpolation method.

Indeed, by virtue of (17), it holds that

$$\begin{aligned} \frac{\partial }{\partial t}\left( u(X(x,t_{n+1};t),t)\right)&=\frac{\partial u}{\partial t}(X(x,t_{n+1};t),t)+ \lambda \frac{\partial u}{\partial x}(X(x,t_{n+1};t),t)\\&=\nu \frac{\partial ^2 u}{\partial x^2}(X(x,t_{n+1};t),t). \end{aligned}$$
(18)

We now discretize (18)—which is equivalent to the first line of (17)—in the variable t by integrating (18) over the interval \([t_n,t_{n+1}]\) and applying the trapezoidal rule to the right-hand side of (18). So, we obtain the formula

$$\begin{aligned} u^{n+1}(x)-u^{n*}(x)=\frac{\nu \Delta t}{2}\left( \frac{d^2 u^{n+1}}{d x^2}(x)+ \frac{d^2 u^{n}}{d x^2}(X(x,t_{n+1};t_{n}))\right) +O(\Delta t^{3}). \end{aligned}$$
(19)

Note that in this case, we have

$$ \frac{d^2 u^n}{d x^2}(X(x,t_{n+1};t_{n}))=\frac{d^{2}}{d x^{2}}\Big (u^{n}(X(x,t_{n+1};t_{n}))\Big )=\frac{d^2 u^{n*}}{dx^2}(x). $$

To compute the finite element solution \(u_{h}^{n+1}\in V_{h}\), we first neglect the error term in (19), multiply this equation by a test function \(v_h\in V_h\), integrate over the domain \(\Omega =(a,b)\), and then perform integration by parts. Thus, \(u_{h}^{n+1}\) verifies

$$\begin{aligned} \int _{a}^{b}u_{h}^{n+1}(x)v_{h}(x)\, dx+\frac{\nu \Delta t}{2}\int _{a}^{b} \frac{du^{n+1}_h}{dx} (x) \frac{dv_{h}}{dx} (x)\, dx\\ =\int _{a}^{b}u_{h}^{n*}(x)v_{h}(x)\, dx-\frac{\nu \Delta t}{2}\int _{a}^{b}\frac{d u^{n*}_h}{d x}(x)\frac{d v_{h}}{d x}(x)\,dx \end{aligned}$$

for all \(v_{h}\in V_{h}\).

Since, in general, \(u_{h}^{n*}\notin V_{h}\), the computation of the terms on the right-hand side of the above formula can be complicated. In the literature, two main methods exist to overcome this drawback: Lagrange-Galerkin and semi-Lagrangian [32,33,34,35]. In the Lagrange-Galerkin method the computation of these integrals is performed using high-order quadrature formulas, whereas in the semi-Lagrangian method, \(u_{h}^{n*}\) is computed by interpolation at the nodes associated with \(V_{h}\), obtaining a function \(I_{h}[u_{h}^{n*}]\in V_{h}\). In either case, an interpolation procedure must be performed to compute \(u_{h}^{n}(X(x,t_{n+1};t_{n}))\) for some \(x\in \Omega \): either at the quadrature nodes in the Lagrange-Galerkin method or at the nodes associated with \(V_{h}\) in the semi-Lagrangian method. This interpolation is one source (but not the only one) of the spurious oscillations appearing in the numerical solution of convection-diffusion problems when the solution presents a steep gradient.

In what follows, we use the semi-Lagrangian scheme, which can be written in the matrix form

$$\begin{aligned} \left( M+\frac{\nu \Delta t}{2}S\right) \varvec{u}^{n+1}=\left( M-\frac{\nu \Delta t}{2}S\right) \varvec{u}^{n*}, \end{aligned}$$
(20)

where M and S are the so-called mass and stiffness matrices, respectively, the entries of which are given by

$$ m_{ij}=\int _{a}^{b}\phi _{i}(x)\phi _{j}(x)\, dx,\quad s_{ij}=\int _{a}^{b}\frac{d\phi _{i}}{dx}(x)\frac{d\phi _{j}}{dx}(x)\, dx,\quad 1\le i,j\le N. $$

Here, \(\varvec{u}^{n+1}=(u_{1}^{n+1},u_{2}^{n+1},\dots ,u_{N}^{n+1})^{T},\, \varvec{u}^{n*}=(u_{1}^{n*},u_{2}^{n*},\dots ,u_{N}^{n*})^{T}\), and so

$$\begin{aligned} u_{h}^{n+1}(x)&=\sum _{i=1}^{N}u_{i}^{n+1}\phi _{i}(x),\\ I_{h}[u_{h}^{n*}](x)&=\sum _{i=1}^{N}u_{i}^{n*}\phi _{i}(x). \end{aligned}$$

Note that

$$\begin{aligned} u_{i}^{n*}=I_{h}[u_{h}^{n*}](x_{i})=u_{h}^{n*}(x_{i})=u_{h}^{n}(X(x_i,t_{n+1};t_{n})), \end{aligned}$$

so that the computation of \(u_{i}^{n*}\) requires one interpolation step.
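To fix ideas, one time step of (20) can be sketched as follows in Python, assuming for simplicity piecewise-linear elements (\(m=1\); the experiments below use \(m=8\)) and with np.interp standing in for the interpolation operator \(I_h\), which is exactly where the paper's interpolation method, or plain Lagrange interpolation, would enter:

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import spsolve

def sl_fem_step(u, x, lam, nu, dt):
    """One step of (20) on a uniform grid with piecewise-linear elements."""
    h = x[1] - x[0]
    N = len(x)
    # standard P1 mass and stiffness matrices (tridiagonal, natural Neumann BC)
    M = diags([h / 6 * np.ones(N - 1),
               np.r_[h / 3, 2 * h / 3 * np.ones(N - 2), h / 3],
               h / 6 * np.ones(N - 1)], offsets=[-1, 0, 1], format='csc')
    S = diags([-np.ones(N - 1) / h,
               np.r_[1 / h, 2 / h * np.ones(N - 2), 1 / h],
               -np.ones(N - 1) / h], offsets=[-1, 0, 1], format='csc')
    feet = np.clip(x - lam * dt, x[0], x[-1])   # X(x_i, t_{n+1}; t_n)
    u_star = np.interp(feet, x, u)              # u_i^{n*}: the interpolation step
    return spsolve(M + 0.5 * nu * dt * S, (M - 0.5 * nu * dt * S) @ u_star)
```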

This interpolation step is usually performed through Lagrangian interpolation, which provides an accurate numerical solution for smooth solutions, especially when using higher order finite elements (\(2< m\le 10\)), see [33]. Furthermore, it can be shown that this method is unconditionally stable, in contrast to the more popular Eulerian schemes, which usually treat the convective term explicitly but are subject to a stability constraint. In either case, when the analytical solution has a steep derivative, the numerical solution exhibits spurious oscillations that eventually lead to an unstable numerical solution and loss of monotonicity.

In the literature, there is a wide range of methods for overcoming these difficulties, applied in Eulerian, Lagrangian, and Eulerian-Lagrangian settings, such as, among others, monotone/quasi-monotone methods [7], subgrid viscosity methods [36, 37], flux limiters [38], SUPG (Streamline-Upwind-Petrov-Galerkin) [39], Galerkin/least-squares algorithms [40], or the characteristics streamline diffusion method [41, 42]. In this work, we propose to use the interpolation method described in Sect. 3.2 to reduce the spurious oscillations coming from the interpolation step.

To conclude, we would like to present some numerical experiments showing how the solution to the problem (17) obtained by the finite element method using our interpolation procedure performs compared to some of the methods mentioned above.

Fig. 12 Semi-Lagrangian finite element scheme using Lagrange interpolation (first row), monotone semi-Lagrangian interpolation (second row), and the interpolation proposed in this work (third row)

Figure 12 shows the exact solution and the numerical results of the problem (17) for \(t=3\) and two different values of \(\nu \). In both cases, we take \(\Omega =\left( 0,5\right) \), \(\lambda =1\), and

$$\begin{aligned} u_{0}\left( x\right) =e^{\displaystyle -\frac{\left( x-\bar{x}_{0}\right) ^{2}}{4\sigma ^{2}}}+\frac{1}{2}\left( H\left( \left( x-x_{1}^{H}\right) \left( x_{2}^{H}-x\right) \right) +1\right) , \end{aligned}$$

where \(\bar{x}_{0}=0.7,\,x_{1}^{H}=1.33,\,x_{2}^{H}=1.76,\,\sigma =0.1\), and H is the Heaviside function. We consider \(\nu =6.7680\times 10^{-5}\) (first column in Fig. 12) and \(\nu =7.8189\times 10^{-4}\) (second column in Fig. 12). With these values of \(\nu \), the maximum in the Gaussian hill is 0.99 and 0.9, respectively, for \(t=3\). Thus, we can gain some insight into the behavior of the numerical solution in two different diffusion regimes: low diffusion, where the solution exhibits steep gradients, and high diffusion, where the solution is flatter.

The numerical solution is computed using a uniform partition \(D_{h}\), with \(h_{i}=x_{i}^{e}-x_{i-1}^{e}=0.2\) for all i, and polynomial degree \(m=8\). This corresponds to the so-called higher order finite element method, which rarely exceeds \(m=10\) in applications, as we mentioned above. The number of elements is \(N_{e}=25\), and the number of nodes associated with the partition is \(N=201\). The time step is \(\Delta t=1/30\), so that the computations require \(N_{T}=90\) time steps. The simulations carried out with our interpolation method use the values \(a_1=5\) and \(\varepsilon =0.1\) (see step 4 in Sect. 3.2). The value of \(\varepsilon \) is significantly greater than the one chosen in the previous examples because the repeated application of the algorithm to the evolution problem (17) demands a more stable numerical scheme, which is achieved by preventing the interpolants from developing overly steep gradients; see the paragraph immediately after Remark 3.3.

We also consider a monotone semi-Lagrangian method as a representative of the methods developed to overcome spurious oscillations in convection-diffusion problems. This method consists in performing a monotone interpolation using the maximum and minimum values of the numerical solution at the nodes of each element as upper and lower bounds, see [7] for further details. Since this method was developed for finite element spaces with polynomial degree \(m=2\), we retain this polynomial degree in the simulation, but to make a fair comparison, we use a finite element mesh with the same number of nodes as before, so that the number of elements now is \(N_e=100\).

As for the results in Fig. 12, the spurious oscillations obtained using the standard Lagrangian interpolation with low diffusion are worth noting. In contrast, the solution of the monotone semi-Lagrangian scheme is monotone, without spurious oscillations, but exhibits numerical diffusion that decreases the maximum on the Gaussian hill and flattens the gradient where the analytical solution is steep. The solution obtained using the interpolation proposed in this work shows much smaller oscillations, and the steep gradient is very well captured. The behavior of the three methods with high diffusion is similar.

6 Conclusions

The numerical solution of a PDE problem through the finite element method usually requires an interpolation scheme with a low number of interpolation points in each element. When the function f to be interpolated has a jump or a steep derivative, the interpolation leads to oscillations produced by the so-called Gibbs phenomenon. This paper is devoted to minimizing such oscillations when the jump’s location is unknown. For this purpose, we perform a Lagrangian interpolation of the transform \(\hat{f}=g\circ f\), where g is a rational transformation chosen to minimize a suitable functional that depends on the values of f. The extension of the procedure to higher dimensions is straightforward. Numerical experiments support the reliability and accuracy of our method in both the univariate and multivariate cases.

Finally, we have verified how our interpolation procedure adapts advantageously to the finite element method in the univariate case. For the multivariate case, it will be applied in future work to a new scheme for solving convection-dominated diffusion PDE problems using the finite element method.