1 Introduction

The Müntz approximation theorem, conjectured by Bernstein [4] and proved by Müntz  [12], is a beautiful result. Suppose we are interested in continuous functions on \([0,1]\), i.e., \(f\in C([0,1])\), and we want to approximate them by linear combinations of the monomials

$$\begin{aligned} x^{a_1^{}}, x^{a_2^{}}, \dots , \end{aligned}$$

where \(\{a_k^{}\}\) is a set of exponents (not necessarily integers) satisfying

$$\begin{aligned} 0 = a_0^{}< a_1^{}< a_2^{} < \cdots . \end{aligned}$$

Certainly this is possible if \(a_k^{} = k\) for each k, by the Weierstrass approximation theorem [14], but what if the set of powers is sparser? For example, is the span of the even powers

$$\begin{aligned} 1, x^2, x^4, \dots \end{aligned}$$
(1)

dense in \(C([0,1])\)? The theorem characterizes all suitable sets of exponents:

Theorem 1

(Müntz approximation theorem) The linear span of the family \(\{x^{a_k^{}}\}\) is dense in \(C([0,1])\) if and only if

$$\begin{aligned} \sum _{k=1}^\infty {1\over a_k^{}} = \infty . \end{aligned}$$
(2)

Thus the set (1) easily qualifies, as do many other collections of exponents, such as the primes:

$$\begin{aligned} 1, x^2, x^3, \dots , x^{89}, x^{97}, \dots \end{aligned}$$
(3)
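As a quick check of condition (2) on these examples (a worked illustration using standard facts, not drawn from the references cited here), compare the even exponents, the primes, and, as a contrasting case introduced only for illustration, the squares \(a_k^{} = k^2\):

$$\begin{aligned} \sum _{k=1}^\infty {1\over 2k} = {1\over 2} \sum _{k=1}^\infty {1\over k} = \infty , \qquad \sum _{p \text { prime}} {1\over p} = \infty , \qquad \sum _{k=1}^\infty {1\over k^2} = {\pi ^2\over 6} < \infty . \end{aligned}$$

The first two sums diverge (the second is Euler's classical result on reciprocals of primes), so (1) and (3) qualify, whereas the span of \(1, x, x^4, x^9, \dots \) would not be dense in \(C([0,1])\).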

For discussions of the theorem with proofs, see [2, 6, 11]. After 1914, Müntz’s theorem was generalized by Szász and others.

As a numerical analyst, I work with algorithms based on expanding functions in nonorthogonal bases, a powerful technique in certain contexts [1, 10]. This led me to consider Müntz’s theorem from a computational angle, and what emerged is startling. To make the point, it is enough to consider a particular case of what might be regarded as the most basic nontrivial Müntz approximation, which I call Problem E; the name “E” alludes to the use of even powers.

Problem E. Given \(\varepsilon >0\), find an integer \(n\ge 0\) and coefficients \(c_0^{}, \dots , c_n^{}\) such that

$$\begin{aligned} \Bigl | x - \sum _{k=0}^n c_k^{} x^{2k}\Bigr | \le \varepsilon , \qquad x \in [0,1]. \end{aligned}$$
(4)

We shall prove

Theorem 2

If \(\varepsilon < 1/2\), then any solution of Problem E has

$$\begin{aligned} n > {1\over 20\varepsilon } \end{aligned}$$
(5)

and

$$\begin{aligned} \max _k^{} |c_k^{}|> 0.75 \varepsilon 2^{1/(40 \varepsilon )} . \end{aligned}$$
(6)

Actually, I believe the following sharper bounds hold:

$$\begin{aligned} n > {1\over 8\varepsilon }, \end{aligned}$$
(7)
$$\begin{aligned} \max _k^{} |c_k^{}|> {(1+\sqrt{2} )^{2n} \over 16n^{1.5}}. \end{aligned}$$
(8)

For accuracy \(\varepsilon = 10^{-6}\), my estimate is that one needs \(n> 140{,}000\) and \(\max _k^{} |c_k^{}|> 10^{107{,}000}\). In such an expansion, the enormous coefficients have oscillating signs, so that they cancel almost exactly (namely to one part in \(10^{107{,}000}\)). On a computer in floating-point arithmetic, all information will be lost unless one works in a precision of more than 107,000 digits. (The usual precision is 16 digits.)
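To get a feel for how severe even the rigorous bounds of Theorem 2 are, the following minimal Python sketch evaluates (5) and (6), together with the conjectured bound (7). The function names are illustrative assumptions, and logarithms are used because the coefficient bound itself overflows double precision.

```python
import math

def rigorous_bounds(eps):
    """Evaluate the lower bounds (5) and (6) of Theorem 2 for a given eps.

    Returns (n_min, log10_c_min): any solution of Problem E has
    n > n_min and max|c_k| > 10**log10_c_min. The coefficient bound is
    returned as a base-10 logarithm to avoid floating-point overflow.
    """
    if not 0.0 < eps < 0.5:
        raise ValueError("Theorem 2 assumes 0 < eps < 1/2")
    n_min = 1.0 / (20.0 * eps)                      # bound (5)
    # bound (6): 0.75 * eps * 2**(1/(40*eps)), in log10 form
    log10_c_min = math.log10(0.75 * eps) + math.log10(2.0) / (40.0 * eps)
    return n_min, log10_c_min

def conjectured_n_bound(eps):
    """Evaluate the conjectured (unproved) bound (7): n > 1/(8*eps)."""
    return 1.0 / (8.0 * eps)

if __name__ == "__main__":
    n_min, log10_c_min = rigorous_bounds(1e-6)
    print(f"rigorous:    n > {n_min:.0f}, max|c_k| > 10^{log10_c_min:.0f}")
    print(f"conjectured: n > {conjectured_n_bound(1e-6):.0f}")
```

For \(\varepsilon = 10^{-6}\), the rigorous bounds alone already give \(n > 50{,}000\) and coefficients exceeding roughly \(10^{7500}\), far beyond 16-digit arithmetic.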

2 Proof

Problem E is equivalent to the more familiar problem of approximation of \(|x|\) on \([-1,1]\):

$$\begin{aligned} \Bigl | |x| - \sum _{k=0}^n c_k^{} x^{2k}\Bigr | \le \varepsilon , \qquad x \in [-1,1]. \end{aligned}$$
(9)

Since 1898, when Lebesgue, at age 23, first used approximations of \(|x|\) to prove the Weierstrass approximation theorem, a great deal has been learned about this problem, as recounted in chapter 25 of [14]. In particular, Bernstein’s 1914 paper [5] was a landmark contribution. Among many other things, Bernstein proved that \(\varepsilon \) satisfies

$$\begin{aligned} \varepsilon > {1\over 4(1+\sqrt{2})(2n-1)} \end{aligned}$$
(10)

for any \(n\ge 1\), which implies, since \(4(1+\sqrt{2}) \approx 9.66\),

$$\begin{aligned} n > {1\over 20 \varepsilon }. \end{aligned}$$
(11)

Since \(\varepsilon < 1/2\) in (9) implies \(n\ge 1\), this establishes condition (5) of Theorem 2.
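Spelled out, the step from (10) to (11) is a short rearrangement (a worked restatement of the argument above, with no new assumptions):

$$\begin{aligned} 2n - 1> {1\over 4(1+\sqrt{2})\, \varepsilon } \quad \Longrightarrow \quad n> {1\over 2} + {1\over 8(1+\sqrt{2})\, \varepsilon } \approx {1\over 2} + {1\over 19.3\, \varepsilon } > {1\over 20 \varepsilon }. \end{aligned}$$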

To establish condition (6), we make use of (11). Given \(\varepsilon \), let \(n\) and \(\{c_k^{}\}\) define a solution (4) of Problem E. If we split the series into roughly the first quarter and the last three-quarters,

$$\begin{aligned} \sum _{k=0}^{\lfloor 1/(80\varepsilon )\rfloor } c_k^{} x^{2k} + \sum _{k=1 + \lfloor 1/(80\varepsilon )\rfloor }^n c_k^{} x^{2k} , \end{aligned}$$
(12)

then by (11), the first part can approximate \(|x|\) no more closely than \(4\varepsilon \). More to our purpose, by a linear scaling, it can approximate \(|x|\) over the subinterval \([-1/2,1/2]\) no more closely than \(2\varepsilon \). Therefore, since the sum of the two series in (12) has accuracy better than \(\varepsilon \), the second series must have maximal size at least \(\varepsilon \) over \([-1/2,1/2]\). Since \(|x^{2k}| \le 2^{-2k}\) for \(x\in [-1/2,1/2]\), this implies that there must be some huge coefficients. Specifically, summing a power series involving powers of 4 shows that the second series of (12) is bounded by

$$\begin{aligned} \sum _{k=1 + \lfloor 1/(80\varepsilon )\rfloor }^n | c_k^{} x^{2k}| \le {4\over 3} \max _k^{} |c_k^{}|2^{-2 (\lfloor 1/(80 \varepsilon )\rfloor + 1)}. \end{aligned}$$
(13)

Therefore, we must have

$$\begin{aligned} {4\over 3} \max _k^{} |c_k^{}|2^{-2 (\lfloor 1/(80 \varepsilon )\rfloor + 1)} > \varepsilon , \end{aligned}$$
(14)

that is,

$$\begin{aligned} \max _k^{} |c_k^{}|> 0.75 \varepsilon 2^{2 (\lfloor 1/(80\varepsilon )\rfloor + 1)}. \end{aligned}$$
(15)

This implies (6), completing the proof of Theorem 2. \(\square \)
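For readers who want the two elementary steps written out, here is a worked version, where the abbreviation \(m = \lfloor 1/(80\varepsilon )\rfloor + 1\) is introduced only for this illustration. The bound (13) comes from a geometric series on \([-1/2,1/2]\), and the passage from (15) to (6) uses \(m > 1/(80\varepsilon )\):

$$\begin{aligned} \sum _{k=m}^n | c_k^{} x^{2k}| \le \max _k^{} |c_k^{}| \sum _{k=m}^\infty 4^{-k} = \max _k^{} |c_k^{}|\, {4^{-m}\over 1 - 1/4} = {4\over 3} \max _k^{} |c_k^{}|\, 2^{-2m}, \qquad 2^{2m} > 2^{2/(80\varepsilon )} = 2^{1/(40\varepsilon )}. \end{aligned}$$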

3 Numerical Estimates

The theorem and proof just given were all about lower bounds, but now let us look at more accurate (though unrigorous) estimates. Bernstein [5] also proved that the best degree \(2n\) maximum-norm approximation errors \(\varepsilon \) satisfy

$$\begin{aligned} \varepsilon \sim {\beta \over 2n}, \quad n\rightarrow \infty \end{aligned}$$
(16)

for some \(\beta \), and in 1985, Varga and Carpenter [16] gave the numerical estimate

$$\begin{aligned} \beta \approx 0.28016949902386913303643649\dots . \end{aligned}$$
(17)

To achieve \(\varepsilon \le 10^{-1}\), \(10^{-2}\), \(10^{-3}\), and \(10^{-4}\), respectively, this suggests (rounding up to the next even numbers) that we will need degrees \(2n\) of approximately 4, 28, 282, and 2802. It turns out that the actual minimal degrees (as computed with the Chebfun minimax command [7, 8]) are exactly these four numbers. For accuracy \(10^{-6}\), for example, though this is beyond Chebfun, it seems clear that the required degree will be close to \(2n=280{,}170\).

Thus we see again that an approximation (4) requires degrees of order \(O(1/\varepsilon )\), but why are the coefficients so large? The explanation is that the monomials \(1,x^2, x^4, \dots , x^{2n}\) are an exponentially ill-behaved basis for the space of even degree 2n polynomials on \([-1,1]\). Numerical analysts quantify this observation by noting that the condition number of this set of functions is of the approximate order

$$\begin{aligned} \kappa _{2n} \approx (1+\sqrt{2} )^{2n} \approx 10^{ 0.766 n} \end{aligned}$$
(18)

[3, 9]. With \(2n=280{,}170\) for accuracy \(10^{-6}\), this suggests the expansion coefficients will need to be of order about \(10^{107{,}000}\). Our best empirical approximation based on calculations for n up to 300 is

$$\begin{aligned} \max _k^{} |c_k^{}| \approx 0.066\times {(1+\sqrt{2})^{2n}\over n^{1.5}}. \end{aligned}$$
(19)
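The arithmetic behind these estimates can be reproduced with a few lines of Python. This is a sketch under the stated asymptotic formulas only: it evaluates (16) to (19) in logarithmic form, the function names are illustrative assumptions, and nothing here is rigorous.

```python
import math

# Varga-Carpenter estimate (17), truncated to double precision.
BETA = 0.28016949902386913
# log10(1 + sqrt(2)), the growth rate per degree in (18); about 0.3828.
LOG10_GROWTH = math.log10(1.0 + math.sqrt(2.0))

def estimated_degree(eps):
    """Asymptotic degree 2n from (16), eps ~ beta/(2n); not rounded."""
    return BETA / eps

def log10_condition_estimate(two_n):
    """Base-10 logarithm of the condition number estimate (18)."""
    return two_n * LOG10_GROWTH

def log10_max_coeff_estimate(two_n):
    """Base-10 logarithm of the empirical fit (19) for max|c_k|."""
    n = two_n / 2.0
    return math.log10(0.066) + two_n * LOG10_GROWTH - 1.5 * math.log10(n)

if __name__ == "__main__":
    two_n = estimated_degree(1e-6)
    print(f"2n ~ {two_n:.0f}")
    print(f"log10(kappa)      ~ {log10_condition_estimate(two_n):.0f}")
    print(f"log10(max |c_k|)  ~ {log10_max_coeff_estimate(two_n):.0f}")
```

For \(\varepsilon = 10^{-6}\) both logarithms come out a little above 107,000, consistent with the order of magnitude quoted above. The algebraic factor \(n^{1.5}\) in (19) shifts the exponent by fewer than ten digits.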

Table 1 summarizes our computations and estimates for accuracies \(\varepsilon = 10^{-1}, \dots , 10^{-8}.\)

Table 1 Numerical data for approximation of x by even powers on \([0,1]\), or equivalently, approximation of |x| on \([-1,1]\). The numbers up to 2802 are based on numerical computations, and the remaining ones are estimates

Figure 1 illustrates graphically where the big coefficients come from. For \(2n = 28, 56, \dots , 140\), it plots the coefficients \(|c_k^{}|\) for \(k = 0, 1, \dots , n\) in a monomial expansion of the best approximations.

Fig. 1 Monomial coefficients \(|c_k^{}|\) of best approximations of \(x\) by even powers on \([0,1]\), or equivalently, of \(|x|\) on \([-1,1]\), for \(2n = 28, 56, \dots , 140\). These values correspond to approximation errors approximately \(0.01, 0.01/2, \dots , 0.01/5\). The end of the curve is of order \(2^{2n}\), and the peak in the middle is of order \((1+\sqrt{2})^{2n}\)
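The growth of the monomial coefficients can be observed without a minimax solver. The sketch below is not the computation behind Fig. 1: instead of best approximations it uses the truncated Chebyshev series of \(|x|\), a swapped-in near-best approximation, and converts it to the monomial basis with NumPy's `cheb2poly`. The function names are illustrative assumptions, and the coefficient values differ from those of the best approximation, although one would expect the same kind of exponential growth.

```python
import math
import numpy as np
from numpy.polynomial import chebyshev as C

def abs_cheb_coeffs(n):
    """Chebyshev coefficients of |x| on [-1, 1], truncated at degree 2n.

    Uses the classical expansion
        |x| = 2/pi + (4/pi) * sum_{k>=1} (-1)**(k+1) * T_{2k}(x) / (4k^2 - 1).
    """
    a = np.zeros(2 * n + 1)
    a[0] = 2.0 / math.pi
    for k in range(1, n + 1):
        a[2 * k] = (4.0 / math.pi) * (-1) ** (k + 1) / (4 * k * k - 1)
    return a

def max_monomial_coeff(n):
    """Largest |c_k| after rewriting the truncated series in powers of x."""
    mono = C.cheb2poly(abs_cheb_coeffs(n))   # coefficients of 1, x, x^2, ...
    return float(np.max(np.abs(mono)))

if __name__ == "__main__":
    for n in (14, 28, 42):
        print(f"2n = {2 * n:3d}   max |c_k| = {max_monomial_coeff(n):.3e}")
```

The Chebyshev coefficients themselves decay like \(1/k^2\), so the large numbers arise purely from the change to the monomial basis.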

4 A Remark About Mathematics

Theorem 2 is startling and interesting. From the usual mathematical point of view, however, it is not much more than that. After all, Müntz’s theorem remains valid and beautiful. From the usual mathematical perspective, Müntz’s theorem expresses a fundamental truth, and Theorem 2, however interesting, is an engineering footnote.

As I have discussed in the context of other problems [13, 15], I believe this usual perspective is too comfortable. Theorem 2 implies that typical sets of powers deemed useful by Müntz’s theorem would in fact be useless in any actual application. If it is not the business of mathematicians to notice and analyze such an effect, then whose business is it?