1 Introduction

1.1 Background

The classical Chebyshev approximation problem is to construct a polynomial of a given degree that has the smallest possible absolute deviation from some continuous function on a given interval. For univariate polynomials of degree \(d\ge 0\) the solution is unique and satisfies an elegant alternation condition: there exist \(d+2\) points of alternating minimal and maximal deviation of the function from the approximating polynomial [7] (see Fig. 1).

Fig. 1

A typical distribution of the points of minimal and maximal deviation of a continuous function (f, shown in orange) from its best Chebyshev approximation by a polynomial of degree at most 5 (denoted by q, shown in blue) on a bounded interval [a, b]
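To make the alternation condition concrete, here is a small numerical illustration of our own (not part of the original text): for \(f(x)=x^5\) on \([-1,1]\), the error of the best approximation by polynomials of degree at most \(d=4\) is \(2^{-4}T_5(x)\), which attains its extreme values with alternating signs at the \(d+2=6\) extrema of the Chebyshev polynomial \(T_5\).

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb

# T_5 in the Chebyshev basis: coefficients (0, 0, 0, 0, 0, 1).
T5 = [0, 0, 0, 0, 0, 1]

# Extrema of T_5 on [-1, 1]: x_k = cos(k*pi/5), k = 0, ..., 5.
extrema = np.cos(np.arange(6) * np.pi / 5)

# T_5(cos(k*pi/5)) = cos(k*pi) = (-1)^k: six points of alternating deviation.
values = cheb.chebval(extrema, T5)
```

Up to the factor \(2^{-4}\), `values` gives the error of the best approximation at the six alternation points.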

Once we depart from the classical case and consider approximating a continuous function on a compact subset X of \(\mathbb {R}^n\) by multivariate polynomials, the uniqueness is lost: the result of Mairhuber [10] demonstrates that a multivariate Chebyshev approximation problem has a unique solution generically (for all continuous functions on a given compact subset of \(\mathbb {R}^n\)) if and only if the underlying set X is homeomorphic to a closed subset of a circle. In particular, if \(X\subset \mathbb {R}^n\) with \(n \ge 2\) contains an interior point, then there is no Haar (Chebyshev) space of dimension \(\ge 2\) for X (i.e. there is no finite system of continuous functions such that every continuous function on X has a unique Chebyshev approximation in the span of this system). An example of such nonunique approximation is shown in Fig. 2.

Fig. 2

The function \(f(x,y) = x^6 +y^6+ 3 x^4 y^2 + 3 x^2 y^4 + 6 x y^2 - 2 x^3\) has several best quadratic approximations on the disk \(x^2+y^2\le 1\). The plot of the function in orange colour is shown together with two different best approximations in blue: \(q_0(x,y) = 1\) (on the left) and \(q_1(x,y) = 3 x^2 + 3 y^2-2\) (on the right)

Even though the uniqueness of solutions is lost in the multivariate case, the alternation result holds in the form of algebraic separation. It was first shown in [15] that a polynomial approximation of degree d is optimal if and only if the sets of points of minimal and maximal deviation cannot be strictly separated by a polynomial of degree at most d. This result can be reproduced using the standard tools of modern convex analysis, as demonstrated in [19]. Another approach to generalising the notion of alternation to multivariate problems is based on the alternating signs of certain determinants [8].

1.2 Motivation

The classical alternation result was obtained by Chebyshev in 1854 [7], but little is known about the shape of the solutions of the more general multivariate problem. In particular, related work [4], which studies a version of this problem for polynomials with integer coefficients, mentions that the multivariate problem is ‘virtually untouched’. Even though the solutions to the multivariate problem satisfy a form of an alternation condition, the structure of the solutions and the location of the points of maximal and minimal deviation are more complex than in the univariate case, which results in many interesting challenges.

From the point of view of classical approximation theory, multivariate polynomial approximation is relatively inefficient: for a range of key applications other approaches, such as radial basis functions [6], provide superior results. However, modern optimisation is increasingly fusing with computational algebraic geometry, successfully tackling problems that were insurmountable in the past. In this context polynomial approximation emerges as valuable not only for solving computationally challenging problems, but also as an analytic tool that, together with Gröbner basis methods, may lead to algorithmic solutions for finding extrema in nonconvex problems. Another potential application is a generalisation of trust-region methods, where more versatile higher-order polynomial approximations may be used in place of local quadratic models.

It is also important to mention rational approximation [1]. Rational functions are able to approximate nonsmooth and abruptly changing functions, and there are a number of efficient methods for univariate rational approximation [3, 12]. Some of these methods have been extended to multivariate function approximation [2], but the choice is not as extensive. Rational functions are ratios of two polynomials, and therefore advances in rational approximation require a better understanding of polynomial approximation. All of this motivates us to study polynomial approximation in detail.

Consider the space \(\mathbb {P}_d(\mathbb {R}^n)\) of real polynomials in n variables of degree at most d. Let \(f:X\rightarrow \mathbb {R}\) be a continuous function defined on a compact set \(X\subset \mathbb {R}^n\). A polynomial \(q^*\in \mathbb {P}_d(\mathbb {R}^n)\) solves the multivariate Chebyshev approximation problem for f on X if

$$\begin{aligned} \max _{x\in X}|f(x) - q^*(x)|\le \max _{x\in X} |f(x)-q(x)|\quad \forall q\in \mathbb {P}_d(\mathbb {R}^n). \end{aligned}$$

We are interested in the set \(Q\subset \mathbb {P}_d(\mathbb {R}^n)\) of all such solutions. In some special cases the solution to the multivariate Chebyshev approximation problem is known explicitly. For instance, the best approximation by monomials on a unit cube is obtained from the products of classical Chebyshev polynomials (see [21] and a more recent overview [22]). This is related to another generalisation of Chebyshev’s results, in which the problem of best approximation of zero by polynomials with a fixed highest-degree coefficient is considered: in some special cases, solutions on the unit cube are known from [17], and solutions for the unit ball were obtained in [13].

There is a different approach to generalising Chebyshev polynomials, based on extending the relation \(T_k(\cos x) = \cos k x\) to the multivariate case. In [11, 16] more general functions \(h:\mathbb {R}^n\rightarrow \mathbb {R}^n\) periodic with respect to fundamental domains of affine Weyl groups are considered, and the aforementioned relation is replaced by \(P_k(h(x)) = h(kx)\). Such generalised Chebyshev polynomials are in fact systems of polynomials, as \(P_k:\mathbb {R}^n\rightarrow \mathbb {R}^n\). We note here that the aforementioned work, as well as other approximation techniques based on Chebyshev polynomials (common in numerical PDEs), use nodal interpolation with Chebyshev polynomials. This is a conceptually different framework compared to our optimisation setting; in particular, this approach requires a careful choice of interpolation nodes on the domain to ensure the quality of approximation.

1.3 Challenges

For the univariate problem the optimal solutions to the Chebyshev approximation problem can be obtained using numerical techniques that fit in the context of linear programming and the simplex method; the exchange algorithm pioneered by Remez [14] is perhaps the most well-known technique. Even though the multivariate problem can be solved approximately by linear programming, the problem rapidly becomes intractable as the degree and the number of variables increase, and hence there is much need for more efficient methods. This is another exciting research direction, as the rich structure of the problem is likely to yield specialised methods which surpass the performance of direct linear programming discretisation. The general framework for a potential generalisation of the exchange approach was laid out in [18,19,20]. In these papers, the authors (partially) extended the de la Vallée-Poussin procedure, which is the core of the Remez method. However, several implementation issues need to be resolved for a practically viable version of the method. It is also possible that some of these techniques cannot be extended to the case of multivariate polynomials due to the loss of uniqueness of the optimal solution, which is precisely the subject of this paper.
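As a point of reference, the direct linear programming discretisation mentioned above can be sketched in a few lines. This is our own minimal illustration (the helper `chebyshev_lp`, the grid size, and the test function are arbitrary choices), not the specialised exchange-type methods of [18,19,20]: on a finite grid, the minimax problem becomes a linear program in the coefficients together with one bound variable t.

```python
import numpy as np
from scipy.optimize import linprog

def chebyshev_lp(f_vals, basis_vals):
    """Best uniform approximation on a finite grid via linear programming.

    f_vals:     (m,) function values on the grid
    basis_vals: (m, k) values of the k basis polynomials on the grid
    Minimise t subject to |f(x_i) - sum_j c_j g_j(x_i)| <= t for all i.
    """
    m, k = basis_vals.shape
    c = np.zeros(k + 1)
    c[-1] = 1.0  # variables (c_1, ..., c_k, t); objective: minimise t
    # f - G c <= t  ->  -G c - t <= -f ;   G c - f <= t  ->  G c - t <= f
    A = np.block([[-basis_vals, -np.ones((m, 1))],
                  [ basis_vals, -np.ones((m, 1))]])
    b = np.concatenate([-f_vals, f_vals])
    res = linprog(c, A_ub=A, b_ub=b,
                  bounds=[(None, None)] * k + [(0, None)])
    return res.x[:k], res.x[-1]

# Degree-1 approximation of f(x) = x^2 on a grid over [-1, 1]:
x = np.linspace(-1.0, 1.0, 201)
G = np.column_stack([np.ones_like(x), x])   # basis {1, x}
coeffs, err = chebyshev_lp(x**2, G)
# classical answer: q(x) = 1/2, minimax error 1/2
```

This direct approach already shows the scaling problem: the number of constraints grows with the grid, which itself grows exponentially with the number of variables n.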

For any polynomial q we can define the sets of points of minimal and maximal deviation, i.e. such \(x\in X\) for which the values \(q(x)-f(x)\) and \(f(x)-q(x)\) respectively coincide with the maximum \(\max _{x'\in X}|f(x')-q(x')|\). These sets may be different for different polynomials in the optimal set Q. We show that it is possible to identify an intrinsic pair of such subsets pertaining to all polynomials in Q (see Theorem 3); moreover the location of these points determines the maximal possible dimension of the solution set (see Lemma 1). We also show that for any prescribed arrangement of points of minimal and maximal deviation and any choice of the maximal degree there exists a continuous function and a relevant approximating polynomial for which these points are precisely the points of minimal and maximal deviation; moreover, the set of all best approximations has the largest possible dimension, for any choice of domain X (Lemma 2). Finally, we show that the set of best Chebyshev approximations is always of the maximal possible dimension if the domain X is finite (Lemma 3). All these constructions are essential for designing computational algorithms for multivariate polynomial approximations. In particular, since the basis functions do not form a Chebyshev system in multivariate cases, the proofs of convergence are much more challenging due to the necessity to trace several possibilities.

We begin with some preliminaries and examples in Sect. 2, focussing on the well-known separation characterisation of optimality and Mairhuber’s uniqueness result. In Sect. 3 we present our new results. We then summarise our findings and present some open problems in Sect. 4.

2 Preliminaries and examples

2.1 Multivariate polynomials

A multivariate polynomial of degree d with real coefficients can be represented as

$$\begin{aligned} q(x) = \sum _{|\alpha |\le d} a_\alpha x^\alpha , \end{aligned}$$

where \(\alpha = (\alpha _1,\dots , \alpha _n)\) is an n-tuple of nonnegative integers,

$$\begin{aligned}x^\alpha = x_1^{\alpha _1} x_2^{\alpha _2}\cdots x_n^{\alpha _n}, \end{aligned}$$

\(|\alpha | = \alpha _1+\alpha _2+\cdots + \alpha _n\), and \(a_\alpha \in \mathbb {R}\) are the coefficients. All polynomials of degree not exceeding d constitute a vector space \(\mathbb {P}_d(\mathbb {R}^n)= \mathop {\textrm{span}}\limits \{x^\alpha \,|\, |\alpha |\le d\}\) of dimension \(\left( {\begin{array}{c}n+d\\ d\end{array}}\right)\).
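The monomial count behind this dimension formula is easy to verify computationally; the following short sketch of ours (the helper name `monomial_exponents` is illustrative) enumerates the exponent tuples \(\alpha\) directly:

```python
from itertools import product
from math import comb

def monomial_exponents(n, d):
    """All exponent tuples alpha with alpha_1 + ... + alpha_n <= d."""
    return [a for a in product(range(d + 1), repeat=n) if sum(a) <= d]

# dim P_d(R^n) = C(n + d, d); e.g. n = 3, d = 4 gives C(7, 4) = 35 monomials.
n, d = 3, 4
dim = len(monomial_exponents(n, d))
assert dim == comb(n + d, d)
```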

Note that, generally speaking, we can consider any finite set of (linearly independent) polynomials in n variables, \(G = (g_1,\dots , g_N)\) and instead of the space \(\mathbb {P}_d(\mathbb {R}^n)\) consider the linear span V of G, i.e.

$$\begin{aligned} V = \mathop {\textrm{span}}\limits \{g_i\,|\, i \in \{1,\dots , N\}\}. \end{aligned}$$
(1)

Then the solution set \(Q\subseteq V\) to the Chebyshev approximation problem for a given continuous function f defined on a compact set \(X\subseteq \mathbb {R}^n\) is

$$\begin{aligned} Q{:}{=} \mathop {\mathrm {Arg\,min}}\limits _{q\in V}\Vert f-q\Vert _\infty , \end{aligned}$$
(2)

where

$$\begin{aligned} \Vert f-q\Vert _{\infty } = \max _{x\in X}|f(x) - q(x)|. \end{aligned}$$

Fixing a continuous function \(f:X\rightarrow \mathbb {R}\), for every polynomial \(q\in V\) we define the sets of points of minimal and maximal deviation explicitly as

$$\begin{aligned} \mathcal {N}(q)&{:}{=} \{x\in X\,|\, q(x)-f(x) = \Vert f-q\Vert _\infty \},\nonumber \\ \mathcal {P}(q)&{:}{=} \{x\in X\,|\, f(x)-q(x) = \Vert f-q\Vert _\infty \}. \end{aligned}$$
(3)

Observe that for any given polynomial q at least one of these sets is nonempty, and for any \(q^*\in Q\) both of them are nonempty (otherwise one can add an appropriate small constant to \(q^*\) and decrease the value of the maximal absolute deviation). Also observe that the sets \(\mathcal {N}(q)\) and \(\mathcal {P}(q)\) are disjoint unless \(q\equiv f\) on X (in this case \(\mathcal {N}(q) = \mathcal {P}(q) = X\)).

The minimisation problem of (2) is an unconstrained convex optimisation problem: the objective function \(\Vert f-q\Vert _\infty\) can be interpreted as the maximum over two families of linear functions parametrised by the domain variable \(x\in X\), i.e.

$$\begin{aligned} \Vert f-q\Vert _\infty = \max _{x\in X}|f(x)-q(x)|= \max _{\begin{array}{c} x\in X\\ \sigma \in \{-1,1\} \end{array}}\sigma (f(x)-q(x)). \end{aligned}$$
(4)

The solution set Q is nonempty, since it represents the metric projection of f onto a finite-dimensional linear subspace V of the normed linear space of functions bounded on X. It is also easy to see from the continuity of f that this set is closed. Moreover, since a maximum function over a family of linear functions is convex, Q is convex (e.g. see [9, Proposition 2.1.2]).

Example 1

(Solution set is unbounded) We consider a degenerate case of the problem: find the best linear approximation to \(f(x,y) = x^2\) on \(X=[-1,1]\times \{0\}\). Since the domain is effectively restricted to the line segment \([-1,1]\), the solution reduces to the classical univariate case: there is a unique best approximation, which happens to be constant, \(\frac{1}{2}\). Observe however that in the true two-dimensional setting any linear polynomial of the form \(q(x,y) = \frac{1}{2}+\alpha y\) is also a best approximation of f on X. This means that the solution set of best approximations is unbounded, \(Q = \{\frac{1}{2}+ \alpha y, \, \alpha \in \mathbb {R}\}\), even though all such optimal solutions coincide on X, and effectively—on the set X—provide the same unique best approximation.

2.2 Optimality conditions

Definition 1

We say that a polynomial \(p\in V\) separates two sets \(N,P\subset \mathbb {R}^n\) if

$$\begin{aligned} p(x)\cdot p(y) \le 0 \quad \forall x\in N, y\in P; \end{aligned}$$
(5)

we say that the separation is strict if the inequality in (5) is strict, i.e.

$$\begin{aligned} p(x)\cdot p(y) < 0 \quad \forall x\in N, y\in P. \end{aligned}$$
(6)
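For finite sets, Definition 1 can be checked by direct enumeration over all pairs. A small sketch of our own (the helper `separates` and the test points are illustrative, not from the text):

```python
def separates(p, N, P, strict=False):
    """Check Definition 1: p separates the finite sets N and P if
    p(x) * p(y) <= 0 for all x in N, y in P (< 0 for strict separation)."""
    for x in N:
        for y in P:
            prod = p(*x) * p(*y)
            if prod > 0 or (strict and prod >= 0):
                return False
    return True

# The linear polynomial p(x, y) = x strictly separates two points lying
# on opposite sides of the y-axis:
p = lambda x, y: x
assert separates(p, [(-1.0, 0.0)], [(1.0, 0.0)], strict=True)
# If p vanishes at a point of N, separation still holds, but not strictly:
assert separates(p, [(0.0, 1.0)], [(1.0, 0.0)])
assert not separates(p, [(0.0, 1.0)], [(1.0, 0.0)], strict=True)
```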

Recall the well-known characterisation of optimality (see [15] for the original result and [19] for modern proofs); this result was highlighted in Sect. 1 as part of the background of the problem.

Theorem 1

Let X be a compact subset of \(\mathbb {R}^n\), and assume that \(f: X\rightarrow \mathbb {R}\) is a continuous function. A polynomial \(q\in V\) is an optimal solution to the Chebyshev approximation problem (2) if and only if there exists no \(p\in V\) that strictly separates the sets \(\mathcal {N}(q)\) and \(\mathcal {P}(q)\).

Example 2

(Best quadratic approximation is not unique) We focus on the function \(f(x,y) = x^6 +y^6+ 3 x^4 y^2 + 3 x^2 y^4 + 6 x y^2 - 2 x^3\) discussed in the Introduction and demonstrate that it does indeed have multiple best quadratic approximations on the disk \(x^2+y^2\le 1\) (see Fig. 2).

For two different polynomials \(q_0(x,y) = 1\) and \(q_1(x,y) = 3 x^2 + 3 y^2-2\) the points of maximal negative and positive deviation of f from these polynomials are

$$\begin{aligned} \mathcal {N}(q_0) = \mathcal {N}(q_1) = \{z_1,z_3,z_5\}, \quad \mathcal {P}(q_0) = \{z_2,z_4,z_6\}, \quad \mathcal {P}(q_1) = \mathcal {P}(q_0)\cup \{z_0\}, \end{aligned}$$

where

$$\begin{aligned} z_0 = (0,0),\; z_1 = (1,0),\; z_2 = \left( \frac{1}{2},\frac{\sqrt{3}}{2}\right) ,\; z_3 = \left( -\frac{1}{2},\frac{\sqrt{3}}{2}\right) , \end{aligned}$$
$$\begin{aligned} z_4 = (-1,0), \; z_5 = \left( -\frac{1}{2},-\frac{\sqrt{3}}{2}\right) ,\; z_6 = \left( \frac{1}{2},-\frac{\sqrt{3}}{2}\right) . \end{aligned}$$
(7)

This is not difficult to verify using standard calculus techniques (see appendix).
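These claims can also be probed numerically. Below is a grid-based check of our own (a sketch, not part of the original argument): both polynomials attain the same uniform deviation 2 on the disk; consistently with definition (3), \(f-q_0\) equals \(-2\) at \(z_1,z_3,z_5\) and \(+2\) at \(z_2,z_4,z_6\), while for \(q_1\) the origin \(z_0\) is an additional point with \(f-q_1 = +2\).

```python
import numpy as np

def f(x, y):
    return x**6 + y**6 + 3*x**4*y**2 + 3*x**2*y**4 + 6*x*y**2 - 2*x**3

def q1(x, y):
    return 3*x**2 + 3*y**2 - 2

# dense polar grid of the unit disk
r, t = np.meshgrid(np.linspace(0.0, 1.0, 400), np.linspace(0.0, 2*np.pi, 1200))
x, y = r*np.cos(t), r*np.sin(t)

norm_q0 = np.abs(f(x, y) - 1.0).max()        # deviation from q0 = 1
norm_q1 = np.abs(f(x, y) - q1(x, y)).max()   # both are (numerically) 2

# hexagon points z_1, ..., z_6 from (7)
z = [(np.cos(k*np.pi/3), np.sin(k*np.pi/3)) for k in range(6)]
dev_q0 = [f(a, b) - 1.0 for a, b in z]        # -2, +2, -2, +2, -2, +2
dev_q1_origin = f(0.0, 0.0) - q1(0.0, 0.0)    # = +2 at z_0 = (0, 0)
```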

2.3 Location of maximal and minimal deviation points

Observe that the points \(z_1\), \(z_2,\dots , z_6\) lie on the unit circle. By the Bézout theorem, this circle can have at most 4 intersections with any other quadratic curve. However if we could find a quadratic polynomial that strictly separates the points of maximal and minimal deviation, the relevant curve would intersect the circle in at least six points, as shown in Fig. 3.

Fig. 3

On the left: the intersection of two quadratic curves at six points contradicts the Bézout theorem; on the right: a subset of the unit disk homeomorphic to a circle

Hence such separation is impossible, so both \(q_0\) and \(q_1\) are optimal.
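The impossibility of strict separation can also be confirmed computationally: since the sets are finite, a strict separator can be rescaled to satisfy \(p\ge 1\) on one set and \(p\le -1\) on the other, which is a linear feasibility problem in the six coefficients of a quadratic polynomial. A sketch of ours using `scipy.optimize.linprog` (by symmetry under \(p\mapsto -p\), testing one sign orientation suffices):

```python
import numpy as np
from scipy.optimize import linprog

# hexagon points z_1, ..., z_6 on the unit circle, as in (7)
z = np.array([(np.cos(k*np.pi/3), np.sin(k*np.pi/3)) for k in range(6)])
N, P = z[0::2], z[1::2]  # {z_1, z_3, z_5} and {z_2, z_4, z_6}

def quad_features(pts):
    """Values of the quadratic basis {1, x, y, x^2, xy, y^2} at given points."""
    x, y = pts[:, 0], pts[:, 1]
    return np.column_stack([np.ones_like(x), x, y, x**2, x*y, y**2])

# Feasibility LP: p >= 1 on N and p <= -1 on P, in the 6 coefficients of p.
A = np.vstack([-quad_features(N), quad_features(P)])
b = -np.ones(6)
res = linprog(np.zeros(6), A_ub=A, b_ub=b, bounds=[(None, None)] * 6)
# res.success is False: no strict quadratic separator exists
```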

We conclude this section with the well-known result of Mairhuber [10] (generalised to compact Hausdorff spaces by Brown [5]). These results contributed to the motivation for this paper, since they show that the uniqueness of the optimal solution is generally lost in our setting (see Sect. 1).

Theorem 2

(Mairhuber) A compact subset X of \(\mathbb {R}^n\) containing at least \(k \ge 2\) points may serve as the domain of definition of a set of real continuous functions \(f_1(x),\dots ,f_k(x)\) that provide a unique Chebyshev approximation to any continuous function f on the set X, if and only if X is homeomorphic to a closed subset of the circumference of a circle.

In relation to our setting, Mairhuber’s result provides only a necessary condition for uniqueness, since our choice of the system of functions is restricted to multivariate polynomials. Hence it is possible to identify a compact set X homeomorphic to a circle and a set of polynomials linearly independent on X that do not provide a unique multivariate approximation to some continuous function on X.

Example 3

Observe that any best approximation to f from Example 2 on the disk is also the best approximation to f on any subset of the disk that contains the sets \(\mathcal {N}(q_0)\) and \(\mathcal {P}(q_0)\). Even though the two different best approximations \(q_0\) and \(q_1\) coincide on the boundary of the disk, they take different values everywhere in the interior, and hence we can choose another subset of the unit disk that is homeomorphic to a circle (like the one shown in Fig. 3 on the right) to obtain two different optimal solutions. This does not contradict Mairhuber’s theorem, since in this case we have restricted ourselves to a very specific choice of the basic functions.

3 Structure of the solution set

3.1 The location of maximal and minimal deviation points for different optimal solutions

The key technical result of this section is the following theorem that establishes the existence of uniquely defined subsets of points of maximal and minimal deviation across all optimal solutions. This means that the points of maximal and minimal deviation do not wander around the domain X as we move from one optimal solution to another.

Theorem 3

Let \(f:X\rightarrow \mathbb {R}\) be a continuous function defined on a compact set \(X\subset \mathbb {R}^n\), let V be a subspace of multivariate polynomials in n variables (1), and suppose that Q is the set of optimal solutions to the relevant optimisation problem, as in (2). Then

  1. (i)

    \(\mathcal {N}(q) = \mathcal {N}(p)\), \(\mathcal {P}(q) = \mathcal {P}(p)\) \(\forall p,q\in \mathop {\textrm{ri}}\limits Q\);

  2. (ii)

    \(\mathcal {N}(q) \subseteq \mathcal {N}(p)\), \(\mathcal {P}(q) \subseteq \mathcal {P}(p)\) \(\forall q\in \mathop {\textrm{ri}}\limits Q, p \in Q\).

Here the relative interior is considered with respect to the convex sets of the coefficients in the representation of the solutions as linear combinations of polynomials in V.

For the proof of this theorem, we will need the following elementary result about max-type convex functions.

Proposition 1

Let \(F:\mathbb {R}^n\rightarrow \mathbb {R}\) be a pointwise maximum over a family of linear functions,

$$\begin{aligned} F(v) = \max _{t\in T} F_t(v), \quad F_t:\mathbb {R}^n\rightarrow \mathbb {R}\text { linear } \forall t\in T. \end{aligned}$$

Let \(I(v) = \{t\,|\, F_t(v) = F(v)\}\), \(Q{:}{=} \mathop {\mathrm {Arg\,min}}\limits _{v\in \mathbb {R}^n}F(v)\). If \(Q\ne \emptyset\), then

$$\begin{aligned} I(v) \subseteq I(u)\quad \forall v\in \mathop {\textrm{ri}}\limits Q, u\in Q. \end{aligned}$$

Proof

Let \(v\in \mathop {\textrm{ri}}\limits Q\), \(u\in Q\). Assume that there exists \(t\in T\) such that \(t\in I(v)\setminus I(u)\). Then \(F(v) = F(u) = F_t(v)>F_t(u)\), and since \(F_t\) is linear, we then have

$$\begin{aligned} F(v-\alpha (u-v) ) \ge F_t(v-\alpha (u-v)) = F_t(v) - \alpha (F_t(u)-F_t(v))> F(v) \quad \forall \alpha >0, \end{aligned}$$

hence, \(v-\alpha (u-v)\notin Q\) for \(\alpha >0\), while \(u=v+(u-v)\in Q\), which means \(v\notin \mathop {\textrm{ri}}\limits Q\). \(\square\)
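Proposition 1 can be illustrated with a tiny one-dimensional example of our own (for intuition only): take \(F(v)=\max \{0,\,v-1,\,-v-1\}\), whose minimisers form \(Q=[-1,1]\) with \(\mathop {\textrm{ri}}\limits Q=(-1,1)\); the active index set at an interior minimiser is contained in the active index set at a boundary minimiser.

```python
# Three linear functions F_0(v) = 0, F_1(v) = v - 1, F_2(v) = -v - 1.
funcs = [lambda v: 0.0, lambda v: v - 1.0, lambda v: -v - 1.0]

def F(v):
    """Pointwise maximum of the family, as in Proposition 1."""
    return max(g(v) for g in funcs)

def active(v, tol=1e-12):
    """Active index set I(v) = {t | F_t(v) = F(v)}."""
    Fv = F(v)
    return {t for t, g in enumerate(funcs) if abs(g(v) - Fv) <= tol}

# v = 0 lies in ri Q, u = 1 on the boundary of Q; I(0) is a subset of I(1).
assert active(0.0) == {0}
assert active(1.0) == {0, 1}
assert active(0.0) <= active(1.0)
```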

Proof of Theorem 3

Recall that our objective function can be represented as the maximum over a family of linear functions, as in (4). For every polynomial \(q\in V\) define the set of active indices

$$\begin{aligned} I(q) = \{(x,\sigma )\in X\times \{-1,1\}\,|\, \sigma (f(x)-q(x)) = \Vert f-q\Vert _\infty \}. \end{aligned}$$

It is evident from the definition (3) of \(\mathcal {N}(q)\) and \(\mathcal {P}(q)\) that

$$\begin{aligned} x\in \mathcal {N}(q) \; \Leftrightarrow (x,-1) \in I(q); \quad x\in \mathcal {P}(q) \; \Leftrightarrow (x,1) \in I(q). \end{aligned}$$

The result now follows from Proposition 1. \(\square\)

The following corollary of Theorem 3 characterises the structure of the location of maximal deviation points corresponding to different optimal solutions.

Corollary 1

The sets of points of minimal and maximal deviation remain constant if the optimal solutions belong to the relative interior of the solution set. Additional maximal and minimal deviation points can only occur if an optimal solution is on the relative boundary.

For any given continuous function f defined on a compact set X we can hence define the minimal or essential sets of points of minimal and maximal deviation,

$$\begin{aligned} \mathcal {P}= \mathcal {P}(q), \; \mathcal {N}= \mathcal {N}(q), \; q\in \mathop {\textrm{ri}}\limits Q, \end{aligned}$$

where \(\mathcal {P}(q)\) and \(\mathcal {N}(q)\) are defined in the standard way, as in (3). For instance, in Example 2 we have \(\mathcal {N}= \{z_1,z_3,z_5\}\) and \(\mathcal {P}= \{z_2,z_4,z_6\}\), while \(\mathcal {P}(q_1)\) contains an additional point \(z_0\).

3.2 Dimension of the solution set

We next focus on the relation between the family of separating polynomials and the dimension of the solution set.

For a fixed continuous function \(f:X\rightarrow \mathbb {R}\) and a polynomial \(q\in V\) consider the set of all polynomials in V that separate the points of minimal and maximal deviation,

$$\begin{aligned} S(q) = \{s\in V\,|\, s(x)\cdot s(y)\le 0 \, \forall x\in \mathcal {P}(q),y\in \mathcal {N}(q)\}. \end{aligned}$$

Notice that the zero polynomial is always in S(q), and for polynomials in the optimal solution set we may have a nontrivial set of separating functions. This happens in particular when all points of minimal and maximal deviation lie on the zero set of some nontrivial polynomial in V.

Since the pair of sets of minimal and maximal deviation is minimal on the relative interior of Q, and such minimal pair is unique according to Theorem 3, we can define the maximal set of separating polynomials as \(S = S(q)\) for \(q\in \mathop {\textrm{ri}}\limits Q\).

For the rest of the section, we work with an arbitrary fixed continuous real-valued function f defined on a compact set \(X\subset \mathbb {R}^n\), so we do not repeat this assumption in each statement, and simply refer to the solution set Q of the corresponding Chebyshev approximation problem.

Lemma 1

For the solution set Q we have \(\dim Q\le \dim S\); moreover, for any \(q\in \mathop {\textrm{ri}}\limits Q\), \(p\in Q\) we have \(p-q\in S(p)\subseteq S\).

Proof

Observe that it is enough to show that for any \(q\in \mathop {\textrm{ri}}\limits Q\) and any \(p\in Q\) we have \(p-q\in S\). It then follows that \(\mathop {\textrm{aff}}\limits Q \subseteq S+q\), and hence \(\dim Q \le \dim S\).

Let \(q\in \mathop {\textrm{ri}}\limits Q\) and assume \(p\in Q\). Then \(\Vert f-q\Vert _\infty =\Vert f-p\Vert _\infty\). By Theorem 3 we have \(\mathcal {N}(q) \subseteq \mathcal {N}(p)\), \(\mathcal {P}(q)\subseteq \mathcal {P}(p)\), therefore:

  • if \(u\in \mathcal {N}(q)\), then \(q(u)-f(u) = p(u)-f(u)=\Vert f-p\Vert _\infty\);

  • if \(u\in \mathcal {N}(p)\setminus \mathcal {N}(q)\), then \(q(u)-f(u)<\Vert f-p\Vert _\infty = p(u)-f(u)\).

A similar relation, with reversed inequalities, applies for \(u\in \mathcal {P}(p)\). Therefore:

$$\begin{aligned} f(u) - p(u)&\le f(u) -q(u)\quad \forall u \in \mathcal {N}(p), \\ f(u) - p(u)&\ge f(u) -q(u)\quad \forall u \in \mathcal {P}(p). \end{aligned}$$

Let \(s(x) = p(x)-q(x)\). We have

$$\begin{aligned} s(u) = p(u) - q(u) \ge 0 \quad \forall u\in \mathcal {N}(p), \qquad s(u) = p(u)- q(u) \le 0 \quad \forall u\in \mathcal {P}(p), \end{aligned}$$

and so \(s \in S(p)\subseteq S(q) = S\). \(\square\)

Corollary 2

If for the solution set Q we have \(\dim Q >0\), then all essential points of minimal and maximal deviation lie on the zero set of some nontrivial polynomial \(s\in V\).

Proof

This follows directly from a modification of the proof of Lemma 1: if Q is of dimension 1 or higher, then there exist two different polynomials \(q\in \mathop {\textrm{ri}}\limits Q\) and \(p\in Q\). We have

$$\begin{aligned} f(u) - p(u)&= f(u) -q(u)\quad \forall u \in \mathcal {N}= \mathcal {N}(q), \\ f(u) - p(u)&= f(u) -q(u)\quad \forall u \in \mathcal {P}= \mathcal {P}(q). \end{aligned}$$

Hence, \(s(u) = p(u)-q(u) = 0 \quad \forall u \in \mathcal {N}\cup \mathcal {P}\). \(\square\)

The next corollary is a well-known uniqueness result.

Corollary 3

If the set S is trivial, then the optimal solution is unique.

Proof

If \(S=\{0\}\), then \(\dim S = 0\), and by Lemma 1 we have \(\dim Q = 0\). \(\square\)

3.3 Uniqueness and the location of maximal deviation points

It may happen that the dimensions of Q and S do not coincide. Consider the following example.

Example 4

Let \(f(x,y) = (x^2 - \frac{1}{2})(1 - y^2)\) and consider the problem of finding a best linear approximation of this function on the square \(X = [-1,1]\times [-1,1]\).

It is not difficult to verify that the constant function \(q_0(x,y)\equiv 0\) is an optimal solution: the points of maximal deviation are the maxima of f(x, y) on the square, attained at \(\mathcal {P}(q_0) = \{(1,0),(-1,0)\}\); the set of points of minimal deviation is a singleton \(\mathcal {N}(q_0) = \{(0,0)\}\) (we provide technical details in the appendix).

Since these three alternating points of maximal and minimal deviation lie on the straight line \(y=0\), there is no strict linear separator between them (see the left image in Fig. 5), hence this constant solution must be optimal by Theorem 1. Also notice that removing any point from either \(\mathcal {N}(q_0)\) or \(\mathcal {P}(q_0)\) destroys the optimality condition (in fact, our configuration of the points of minimal and maximal deviation is also known as a critical set in the terminology of [15]). Hence we must have \(\mathcal {N}= \mathcal {N}(q_0)\) and \(\mathcal {P}= \mathcal {P}(q_0)\), so these are the essential sets of the points of minimal and maximal deviation. These three points can be separated non-strictly by linear functions of the form \(l(x,y)= \alpha y\), \(\alpha \in \mathbb {R}\). We therefore have

$$\begin{aligned} S = \{\alpha y \,|\, \alpha \in \mathbb {R}\}. \end{aligned}$$

Even though \(\dim S = 1\), the best linear approximation is unique. It follows from Lemma 1 that \(Q\subseteq S\), and hence any best linear approximation must have the form \(q_\alpha (x,y) = \alpha y\) for some \(\alpha \in \mathbb {R}\). When \(x = \pm 1\), we have the deviation \(d_\alpha (x,y) = f(x,y) - q_\alpha (x,y) = \frac{1-y^2}{2}- \alpha y\). The maximum of \(d_\alpha (x,y)\) is attained at \(y=-\alpha\), with the value \(d_\alpha (\pm 1, -\alpha ) = \frac{1}{2}+\frac{\alpha ^2}{2}>\frac{1}{2}\) for \(\alpha \ne 0\), which means that there are no other optimal solutions in the neighbourhood of \(q_0(x,y)\equiv 0\), and hence, due to the convexity of Q, the best approximation is unique.

Now consider a modified example: let \(h(x,y) = (x^2 - \frac{1}{2})(1 - |y|)\) (see Fig. 6, left hand side). The same trivial constant function \(q_0(x,y)\equiv 0\) is a best linear approximation to h, with the same sets of points of minimal and maximal deviation (see Fig. 5, left). However, this best approximation is not unique: any function \(q_\alpha (x,y)=\alpha y\) for \(\alpha \in \left[ -\frac{1}{2},\frac{1}{2}\right]\) is also a best linear approximation of h on the square X (see appendix for technical computations). Moreover, the sets of points of maximal and minimal deviation are different at the endpoints of the optimal interval, i.e. for \(\alpha = \pm \frac{1}{2}\), see Fig. 5 (the technical computations are presented in appendix).
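Both claims are easy to probe numerically on a grid. The following sketch (our own check; the value \(\alpha =0.3\) for the smooth case is arbitrary) confirms that perturbing \(q_0\) along y strictly increases the error for f but leaves it unchanged for h on the whole interval \(\alpha \in [-\frac{1}{2},\frac{1}{2}]\):

```python
import numpy as np

x, y = np.meshgrid(np.linspace(-1, 1, 401), np.linspace(-1, 1, 401))

f = (x**2 - 0.5) * (1 - y**2)       # smooth example: unique best approximation
h = (x**2 - 0.5) * (1 - np.abs(y))  # nonsmooth modification: a segment of solutions

def sup_err(dev):
    """Discrete uniform norm of the deviation on the grid."""
    return np.abs(dev).max()

err_f0 = sup_err(f)                  # q0 = 0 gives error 1/2 for f
err_f3 = sup_err(f - 0.3 * y)        # any alpha != 0 is strictly worse for f
errs_h = [sup_err(h - a * y) for a in (-0.5, 0.0, 0.5)]  # all equal 1/2 for h
```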

Fig. 4

The function \(f(x,y) = (x^2 - \frac{1}{2})(1 - y^2)\) (on the left), the absolute deviation of f from the constant \(q_0(x,y) \equiv 0\), \(|d_0(x,y)| = |f(x,y)-q_0(x,y)|\) (middle), and the function \(g(x,y)= (\min \{ |2x|, 2-|2x|\} - 1/2)(1-y^2)\) (on the right)

Fig. 5

The points of minimal and maximal deviation for different cases: on the left for h (and also f) approximated by \(q_0\); in the middle for h and \(q_{\frac{1}{2}}(x,y) = \frac{y}{2}\); on the right for h and \(q_{-\frac{1}{2}}(x,y) = -\frac{y}{2}\)

Fig. 6

The function \(h(x,y) = (x^2 - \frac{1}{2})(1 - |y|)\) on the left, and the same function shown together with two different best approximations: \(q_{\frac{1}{2}}(x,y) = \frac{y}{2}\) and \(q_{-\frac{1}{2}}(x,y) = -\frac{y}{2}\)

3.4 Uniqueness and smoothness

Finally, we would like to point out that smoothness of the function that we are approximating is not necessary for the uniqueness of a best approximation, as one may be tempted to conclude from the study of the functions f and h. Note that for yet another modification,

$$\begin{aligned} g(x,y) {:}{=} (\min \{ |2x|, 2-|2x|\} - 1/2)(1-y^2), \end{aligned}$$

the function \(q_0(x,y) \equiv 0\) is a unique best approximation, while the points of maximal and minimal deviation are distributed in a similar fashion, along the line \(y=0\), potentially allowing for nonuniqueness. Notice that the function g(x, y) is nondifferentiable at the points of minimal and maximal deviation; it is, however, smooth in y for every fixed x. This observation is related to the open problem of connecting the specific (partial) smoothness properties of the function we are approximating with the structure of the solution set. We discuss this question in some detail in the conclusions section.

We have seen from the preceding examples that whether the Chebyshev approximation problem has a unique solution is determined not only by the location of the points of maximal and minimal deviation, but also by the properties of the function that is being approximated; in particular, the smoothness of the function at the points of minimal and maximal deviation appears to be a decisive factor.

Example 5

For the distribution of points of maximal and minimal deviation from Example 2, i.e. \(N = \{z_1,z_3,z_5\}\), \(P = \{z_2,z_4,z_6\}\), where \(z_1,z_2,\dots , z_6\) are defined by (7), we construct a nonsmooth continuous function

$$\begin{aligned} f(x) = f_1(x)-f_2(x), \end{aligned}$$

where

$$\begin{aligned} f_1(x)= & {} \min \{ 2 \Vert x-z_1\Vert , 2 \Vert x-z_3\Vert , 2 \Vert x-z_5\Vert , 1\},\\ f_2(x)= & {} \min \{ 2 \Vert x-z_2\Vert , 2 \Vert x-z_4\Vert , 2 \Vert x-z_6\Vert , 1\}, \end{aligned}$$

shown in Fig. 7 on the left.

The constant polynomial \(q_0(x,y) = 0\) is an optimal solution to the quadratic approximation problem for the function f on \(X = \{x\,|\, \Vert x\Vert \le 2\}\) (since this is exactly the same pattern of points of minimal and maximal deviation as discussed in one of the two cases in Example 2). Moreover, the polynomial

$$\begin{aligned} q_\alpha (x,y) = \alpha (x^2+y^2-1) \end{aligned}$$

is also a best approximation of f for sufficiently small values of \(\alpha\) (this may be already evident to the reader from the plot; the mathematically rigorous reasons for this will be laid out in the proof of Lemma 2).

Modifying the ‘bumps’ that define the peaks at the points of minimal and maximal deviation, so that the function becomes smooth around these points, results in uniqueness of the approximation \(q_0\). Indeed, let

$$\begin{aligned} h(x) = h_1(x)-h_2(x), \end{aligned}$$

where

$$\begin{aligned} h_1(x)&=\min \{ 4 \Vert x-z_1\Vert ^2, 4 \Vert x-z_3\Vert ^2, 4 \Vert x-z_5\Vert ^2, 1\},\\ h_2(x)&=\min \{ 4 \Vert x-z_2\Vert ^2, 4 \Vert x-z_4\Vert ^2, 4 \Vert x-z_6\Vert ^2, 1\}, \end{aligned}$$

this function is shown in Fig. 7 on the right.

The same constant polynomial \(q_0(x,y) = 0\) is optimal for h; however, this time the solution is unique. Indeed, suppose that another polynomial in S provides a best approximation. This polynomial must be of the form \(p_\alpha (x,y) = \alpha (x^2+y^2 -1)\) for some \(\alpha \ne 0\). By convexity of the solution set, \(p_{\alpha '}\) is then also optimal for any \(\alpha '\) between 0 and \(\alpha\).

In the neighbourhood of the point \(z_1\) we have \(h(x,y) = 4 \Vert x-z_1\Vert ^2-1 = 4\left[ (x-1)^2 + y^2\right] -1\). Then for a sufficiently small \(|\alpha '|\)

$$\begin{aligned} h\left( \frac{4}{4-\alpha '},0\right) -p_{\alpha '}\left( \frac{4}{4-\alpha '},0\right) = -1-\frac{(\alpha ')^2}{4-\alpha '}<-1, \end{aligned}$$

hence \(p_{\alpha '}\) is not a best approximation, a contradiction.
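The displayed computation can be verified directly. The snippet below is a sketch assuming \(z_1 = (1,0)\) (the placement consistent with the local expression \(h(x,y) = 4[(x-1)^2 + y^2]-1\) above); it evaluates the deviation of h from \(p_{\alpha '}\) at the probe point \((4/(4-\alpha '),0)\) and confirms that it equals \(-1-(\alpha ')^2/(4-\alpha ')<-1\) for several small values of \(\alpha '\).

```python
import math

def deviation_at_probe(a):
    # Probe point on the x-axis near z_1 = (1, 0); for small |a| the nearest
    # peak is z_1, so h = 4(x-1)^2 - 1 there (h_2 equals its cap 1).
    x = 4.0 / (4.0 - a)
    h = 4.0 * (x - 1.0) ** 2 - 1.0
    p = a * (x * x - 1.0)          # p_a(x, y) = a(x^2 + y^2 - 1) at y = 0
    return h - p

# The deviation matches the closed form -1 - a^2/(4 - a) and drops below -1:
for a in (0.05, -0.05, 0.01):
    print(a, deviation_at_probe(a), -1.0 - a * a / (4.0 - a))
```

Note that the check covers both signs of \(\alpha '\), in line with the convexity argument above.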

Fig. 7
figure 7

The functions f and h in Example 5

The next result provides a more general justification for the non-uniqueness of the approximation to a nonsmooth function f that we have just considered.

Lemma 2

Let V be as in (1), and let N and P be two disjoint compact subsets of \(\mathbb {R}^n\) that cannot be strictly separated by a polynomial in V. Let

$$\begin{aligned} S = \{s\in V\,|\, s(x)\cdot s(y)\le 0 \, \forall x\in P,y\in N\}. \end{aligned}$$

There exists a continuous function \(f:\mathbb {R}^n\rightarrow \mathbb {R}\) such that for any compact \(X\subset \mathbb {R}^n\) with \(N,P\subseteq X\), the optimal solution set Q of the relevant optimisation problem satisfies \(\dim Q = \dim S\). Moreover, there exists \(q_0\in Q\) such that \(\mathcal {P}(q_0) = P\), \(\mathcal {N}(q_0) = N\).

Proof

Let

$$\begin{aligned} f(x){:}{=}\max _{u\in P}\varphi _u(x)-\max _{v\in N}\varphi _v(x), \end{aligned}$$

where

$$\begin{aligned} \varphi _u(x) = \max \left\{ 1-\frac{2}{d}\Vert x-u\Vert ,0\right\} , \quad d= \min _{\begin{array}{c} u\in P\\ v\in N \end{array}}\Vert u-v\Vert . \end{aligned}$$

Fix a compact set \(X\subset \mathbb {R}^n\) such that \(P\cup N \subseteq X\). First observe that \(q_0(x,y)\equiv 0\) is an optimal solution to the Chebyshev approximation problem: the deviation \(f-q_0\) coincides with the function f, and we have for all \(x\in X\)

$$\begin{aligned} f(x)&= \max _{u\in P}\varphi _u(x)-\max _{v\in N}\varphi _v(x) \nonumber \\&\le \max _{u\in P}\varphi _u(x)\nonumber \\&= \max _{u\in P}\max \left\{ 1-\frac{2}{d}\Vert x-u\Vert ,0\right\} \nonumber \\&= \max \left\{ 1-\frac{2}{d}\min _{u\in P}\Vert x-u\Vert ,0\right\} \nonumber \\&= 1 - \frac{2}{d} \min \left\{ \min _{u\in P}\Vert x-u\Vert ,\frac{d}{2}\right\} \le 1; \end{aligned}$$
(8)

likewise

$$\begin{aligned} f(x)&\ge -1 + \frac{2}{d} \min \left\{ \min _{v\in N}\Vert x-v\Vert ,\frac{d}{2}\right\} \ge -1\quad \forall x\in X. \end{aligned}$$
(9)

Moreover, for \(x\in P\) we have \(f(x) =1\), for \(x\in N\) we have \(f(x) = -1\), and it follows from (8) and (9) that for \(x\notin P\cup N\) we have \(-1<f(x)<1\), hence, \(N= \mathcal {N}(q_0)\) and \(P= \mathcal {P}(q_0)\), so \(q_0\) satisfies the last statement of the lemma. We have assumed that N and P cannot be strictly separated by a polynomial in V, hence we deduce that \(q_0\equiv 0\) is a best Chebyshev approximation of f on X.

We will next show that for any \(p\in S\) with \(p(N)\le 0\) and \(p(P)\ge 0\) there exists a sufficiently small \(\alpha >0\) such that \(\alpha p\) is another best Chebyshev approximation of f on X. Note that this guarantees that for any linearly independent system of polynomials in S we can produce a simplex contained in Q with one vertex at zero and the remaining vertices at small nonzero multiples of these polynomials. This yields \(\dim Q = \dim S\).

Since \(p\in S\) is a polynomial, and the set X is compact, p is Lipschitz on X with some constant L, and its absolute value is bounded by some \(M> 0\) on X. Let \(\alpha {:}{=} \min \left\{ \frac{1}{M},\frac{2}{d L}\right\}\), then for \(q = \alpha p\) we have

$$\begin{aligned} |q(x)|&= |\alpha p(x)|\le \alpha \cdot M \le 1 \quad \forall x\in X, \\ |q(x) -q(y)|&= \alpha | p(x) - p(y)|\le \alpha L \Vert x-y\Vert \le \frac{2}{d}\Vert x-y\Vert \quad \forall x,y \in X, \end{aligned}$$

and consequently

$$\begin{aligned} q(y)- \frac{2}{d}\Vert x-y\Vert \le q(x) \le q(y)+\frac{2}{d}\Vert x-y\Vert \quad \forall x,y \in X. \end{aligned}$$

From \(q(N)\le 0\) and \(q(P)\ge 0\) we have for all \(x\in X\)

$$\begin{aligned} -\frac{2}{d}\min _{y\in P} \Vert x-y\Vert \le \max _{y\in P} \left( q(y)- \frac{2}{d}\Vert x-y\Vert \right) \le q(x) \le \min _{y\in N} \left( q(y)+\frac{2}{d}\Vert x-y\Vert \right) \le \frac{2}{d}\min _{y\in N} \Vert x-y\Vert . \end{aligned}$$

Hence,

$$\begin{aligned} \max \left\{ -\frac{2}{d}\min _{y\in P} \Vert x-y\Vert ,-1\right\} \le q(x) \le \min \left\{ \frac{2}{d}\min _{y\in N} \Vert x-y\Vert ,1\right\} . \end{aligned}$$

Combining this with (8) and (9), we hence have for every \(x\in X\)

$$\begin{aligned} - 1 \le f(x)-q(x)\le 1, \end{aligned}$$

therefore q is a best Chebyshev approximation of f on X. \(\square\)
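The construction in the proof can be instantiated numerically. The sketch below again assumes the hexagon configuration of Example 5 with the \(z_k\) at the sixth roots of unity (our assumption, not stated in (7) itself), takes \(p(x,y)=x^2+y^2-1\) and X the disk of radius 2, for which one may take \(M=3\) and \(L=4\) (since \(\Vert \nabla p\Vert = 2\Vert x\Vert \le 4\) on X), so that \(\alpha = \min \{1/M, 2/(dL)\} = 1/3\).

```python
import math

# Assumed configuration: z_k at the sixth roots of unity, N = {z_1,z_3,z_5},
# P = {z_2,z_4,z_6}; d is the smallest distance between P and N (hexagon side).
Z = [(math.cos(k * math.pi / 3), math.sin(k * math.pi / 3)) for k in range(6)]
N, P = Z[0::2], Z[1::2]

d = min(math.dist(u, v) for u in P for v in N)   # = 1 for this configuration
M, L = 3.0, 4.0                                  # bound and Lipschitz constant of p on X
alpha = min(1.0 / M, 2.0 / (d * L))              # step size from the proof, = 1/3

def f(x, y):
    # f = max_{u in P} phi_u - max_{v in N} phi_v, as in the proof
    phi = lambda pts: max(max(1.0 - (2.0 / d) * math.hypot(x - u, y - v), 0.0)
                          for (u, v) in pts)
    return phi(P) - phi(N)

# Maximal deviation of f from alpha*p over a polar grid containing the z_k:
m, dev = 100, 0.0
for i in range(m + 1):
    r = 2.0 * i / m
    for j in range(6 * m):
        t = 2.0 * math.pi * j / (6 * m)
        x, y = r * math.cos(t), r * math.sin(t)
        dev = max(dev, abs(f(x, y) - alpha * (x * x + y * y - 1.0)))

print(dev)   # stays at (approximately) the optimal value 1, attained at the z_k
```

The check confirms that with the explicit step size \(\alpha = 1/3\) from the proof, the perturbed polynomial \(\alpha p\) still achieves the optimal deviation 1.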

3.5 Uniqueness and the domain geometry

Finally, we turn our attention to the relation between the uniqueness of best Chebyshev approximation and the geometry of the domain. We show that on finite domains the best approximation is nonunique whenever the dimension of S allows for this (that is, whenever \(\dim S>0\)).

Lemma 3

If \(X\subset \mathbb {R}^n\) is finite, then for any \(f:X\rightarrow \mathbb {R}\) we have \(\dim Q = \dim S\).

Proof

If \(\dim S=0\), the result follows directly from Corollary 3. For the rest of the proof, assume \(\dim S>0\).

Let \(q\in \mathop {\textrm{ri}}\limits Q\), \(s\in S\). Then

$$\begin{aligned} s(x)\cdot s(y)\le 0 \, \forall x\in \mathcal {P},y\in \mathcal {N}. \end{aligned}$$

Let

$$\begin{aligned} q_t{:}{=} q+ t s. \end{aligned}$$

Without loss of generality, assume that \(s(x)\ge 0\) for \(x\in \mathcal {P}\) and \(s(x)\le 0\) for \(x\in \mathcal {N}\) (otherwise consider \(-s\)).

Let

$$\begin{aligned} \alpha {:}{=} \Vert f-q\Vert _{\infty }-\max _{x\in X\setminus (\mathcal {N}\cup \mathcal {P})} |f(x)-q(x)|, \end{aligned}$$

where we use the standard convention that the maximum over an empty set equals \(-\infty\), so \(\alpha =+\infty\) in the case when \(X=\mathcal {N}\cup \mathcal {P}\). Since X is finite, \(\alpha >0\).

Let

$$\begin{aligned} \beta {:}{=} \max _{x\in X}|s(x)|. \end{aligned}$$

We have for all \(t\ge 0\) and \(x\in \mathcal {N}\)

$$\begin{aligned} \Vert f-q\Vert _\infty = q(x) - f(x) \ge q(x)-f(x)+ts(x)\ge q(x)-f(x) - t \beta = \Vert f-q\Vert _\infty - t \beta ; \end{aligned}$$

for \(t\ge 0\) and \(x\in \mathcal {P}\)

$$\begin{aligned} \Vert f-q\Vert _\infty = f(x) - q(x) \ge f(x)-q(x)-ts(x)\ge f(x)-q(x) - t \beta = \Vert f-q\Vert _\infty - t \beta ; \end{aligned}$$

So, whenever \(t\beta \le \Vert f-q\Vert _\infty\), we find that for \(x\in \mathcal {N}\cup \mathcal {P}\),

$$\begin{aligned} |f(x)-q_t(x)|\le \Vert f-q\Vert _\infty . \end{aligned}$$

For \(x\in X\setminus (\mathcal {N}\cup \mathcal {P})\) and all \(t\ge 0\)

$$\begin{aligned} |f(x)-q_t(x)| \le | f(x) -q(x)| + t|s(x)|\le \Vert f-q\Vert _\infty -\alpha + t\beta . \end{aligned}$$

Note that \(\alpha =+\infty\) only for the case when \(X=\mathcal {N}\cup \mathcal {P}\).

Therefore, for t such that \(t \beta \le \min \{\alpha ,\Vert f-q\Vert _\infty \}\) we have

$$\begin{aligned} \Vert q_t -f\Vert _\infty \le \Vert f-q\Vert _\infty , \end{aligned}$$

and hence \(q_t\in Q\) for all sufficiently small positive t.

It remains to pick a maximal linearly independent system \(\{s_1,s_2,\dots , s_d\}\subset S\), and observe that the convex hull \(\mathop {\textrm{co}}\limits \{q,q+t_1 s_1,\dots , q+ t_d s_d\}\subseteq Q\) for some nonzero \(t_1,\dots , t_d\). Therefore, \(\dim Q \ge \dim S\). By Lemma 1 the reverse inequality also holds, and we are done. \(\square\)

It follows from the previous lemma that the uniqueness of solutions depends not only on the function itself, but also on the domain of its definition. In particular, a function defined on a continuous domain may have a unique best approximation, while a discretisation of this domain leads to nonuniqueness of the best approximation. This observation is important in practice, since most numerical methods work on finite grids rather than with continuous functions directly, and hence require a certain level of discretisation. There is therefore a potential danger of finding a solution that is optimal for the discretised problem but not for the original one.
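This effect is easy to observe numerically. In the sketch below (again under our assumption that the \(z_k\) of Example 5 sit at the sixth roots of unity), restricting the domain to the finite set \(X=\{z_1,\dots ,z_6\}\) makes every \(q_\alpha = \alpha (x^2+y^2-1)\) optimal, for arbitrarily large \(|\alpha |\), while on the disk of radius 2 a large \(\alpha\) is clearly not optimal.

```python
import math

# Assumed points: z_k at the sixth roots of unity; f takes the value -1 on
# N = {z_1, z_3, z_5} and +1 on P = {z_2, z_4, z_6}.
Z = [(math.cos(k * math.pi / 3), math.sin(k * math.pi / 3)) for k in range(6)]
fvals = [-1.0, 1.0] * 3

def q(x, y, alpha):
    return alpha * (x * x + y * y - 1.0)

def deviation_on_finite_X(alpha):
    # Maximal deviation over the FINITE domain X = {z_1,...,z_6} only
    return max(abs(fv - q(x, y, alpha)) for (x, y), fv in zip(Z, fvals))

# q_alpha vanishes at all six points, so the deviation is 1 for every alpha:
print([round(deviation_on_finite_X(a), 6) for a in (0.0, 0.1, 5.0, -100.0)])

# On the continuous disk, alpha = 5 already fails at the origin, where f = 0
# (every z_k is at distance 1 from the origin, so both bumps are capped at 1):
print(abs(0.0 - q(0.0, 0.0, 5.0)))   # deviation 5 > 1
```

In other words, on the discretised domain the solution set Q is an entire unbounded line of polynomials, whereas on the disk only small \(|\alpha |\) remain optimal.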

4 Conclusions

We have identified and discussed in detail key structural properties pertaining to the solution set of the multivariate Chebyshev approximation problem. We have clarified the relations between the points of maximal and minimal deviation for different optimal solutions, related the set of optimal solutions to the set of separating polynomials, and elucidated the relations between the geometry of the domain and smoothness of the function and uniqueness of the solutions.

However, many questions remain unanswered, some of them pertinent to potential algorithmic solutions, and more remains to be done to fully understand the relation between the uniqueness of solutions and the structure of the problem. Namely, the following questions are of paramount importance.

  1.

    Can we refine Mairhuber’s theorem for the case of multivariate Chebyshev approximation by polynomials of degree at most d? Example 2 indicates that to have a unique approximation of any continuous function on a given domain by a system of multivariate polynomials, it may not be enough to restrict the domain to a set homeomorphic to a subset of a circle. Perhaps a more algebraic condition would work, for instance, restricting the domain to sets with one-dimensional Zariski closure.

  2.

    What are the sufficient conditions for the uniqueness of the best Chebyshev approximation in terms of the function f only? Can we guarantee that for a given set of points of maximal and minimal deviation there exists a domain X that contains them and a function f for which an optimal solution is unique and has specifically this distribution of points of minimal and maximal deviation?

  3.

    Can we bridge the gap between Lemmas 1 and 3 and show that, given a distribution of points of minimal and maximal deviation, for any \(d\in \{0,\dots , \dim S\}\) there exist a function f and a domain X with \(\dim Q = d\)? This question is closely related to our discussion at the end of Example 4, where smoothness appears to matter only in the direction orthogonal to the varieties separating the points of maximal and minimal deviation.