Optimal Polynomial Prediction Measures and Extremal Polynomial Growth

We show that the problem of finding the measure, supported on a compact subset K of the complex plane, that minimizes the variance of the least squares predictor by polynomials of degree at most n at a point exterior to K is equivalent to the problem of finding the polynomial of degree at most n, bounded by 1 on K, with extremal growth at this exterior point. We use this to find the polynomials of extremal growth for the interval [−1, 1] at a purely imaginary point. The related problem on the extremal growth of real polynomials was studied by Erdős in 1947.


Introduction
In this work we consider two classical extremum problems for polynomials. The first is very easy to state. Indeed, let us denote the complex polynomials of degree at most n in d complex variables by C_n[z], z ∈ C^d. Then for K ⊂ C^d compact and z_0 ∈ C^d\K an external point, we say that P_n(z) ∈ C_n[z] has extremal growth relative to K at z_0 if the sup-norm ‖P_n‖_K ≤ 1 and
$$|P_n(z_0)| = \max\{|p(z_0)| : p \in \mathbb{C}_n[z],\ \|p\|_K \le 1\}.$$
We note that for this to be well-defined we require that K be polynomial determining, i.e., if p ∈ C[z] is such that p(x) = 0 for all x ∈ K, then p = 0. We refer the interested reader to the survey [2] for more about what is known about this problem.
The second problem is from the field of Optimal Design for Polynomial Regression. To describe it we reduce to the real case K ⊂ R^d, and note that we may write any p ∈ R_n[z] in the form
$$p(x) = \sum_{j=1}^{N} \theta_j p_j(x),$$
where {p_1, …, p_N} is a basis for R_n[x]. Suppose now that we observe the values of a particular p ∈ R_n[z] at a set of m ≥ N points X := {x_j : 1 ≤ j ≤ m} ⊂ K with some random errors, i.e., we observe
$$y_j = p(x_j) + \epsilon_j, \quad 1 \le j \le m,$$
where we assume that the errors ε_j ∼ N(0, σ²) are independent. In matrix form this becomes y = V_n θ + ε, where θ ∈ R^N, y, ε ∈ R^m and
$$(V_n)_{ij} := p_j(x_i), \quad 1 \le i \le m,\ 1 \le j \le N,$$
is the associated Vandermonde matrix.
Our assumption on the error vector means that cov(ε) = σ² I_m ∈ R^{m×m}. Now, the least squares estimate of θ is
$$\hat{\theta} := (V_n^t V_n)^{-1} V_n^t y.$$
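In the simplest univariate, degree-1 case the normal equations above can be written out explicitly; the following is a minimal sketch (pure Python, all names illustrative and not from the paper):

```python
# Sketch: least-squares estimate theta_hat = (V^t V)^{-1} V^t y for a
# degree-1 fit p(x) = theta_0 + theta_1 x, solved as an explicit 2x2 system.

def least_squares_line(xs, ys):
    # Vandermonde columns are p_1(x) = 1 and p_2(x) = x, so V^t V and V^t y
    # reduce to sums of powers and moments.
    m = len(xs)
    s1, sx, sxx = m, sum(xs), sum(x * x for x in xs)
    sy, sxy = sum(ys), sum(x * y for x, y in zip(xs, ys))
    # Normal equations (V^t V) theta = V^t y, solved by Cramer's rule.
    det = s1 * sxx - sx * sx
    theta0 = (sy * sxx - sx * sxy) / det
    theta1 = (s1 * sxy - sx * sy) / det
    return theta0, theta1

# With noiseless observations the estimator recovers p exactly.
xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
ys = [2.0 + 3.0 * x for x in xs]
t0, t1 = least_squares_line(xs, ys)
```

With noisy observations the same formulas apply; only the variance of the estimate changes.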
Note that the entries of (1/m) V_n^t V_n are the discrete inner products of the p_i with respect to the measure
$$\mu_X := \frac{1}{m}\sum_{j=1}^{m} \delta_{x_j}. \tag{2}$$
More specifically,
$$G_n(\mu) := \left[\int_K p_i(x)\, p_j(x)\, d\mu(x)\right]_{1\le i,j\le N}$$
is the Moment, or Gram, matrix of the polynomials p_i with respect to the measure µ.
In general we may consider arbitrary probability measures on K, setting M(K) := {µ : µ is a probability measure on K}.

Now set
$$\mathbf{p}(z) := [p_1(z), \dots, p_N(z)]^t;$$
then the least squares estimate of the observed polynomial is $\hat{p}(z) = \mathbf{p}^t(z)\hat{\theta}$. We may compute its variance at any point z ∈ R^d to be
$$\mathrm{var}(\hat{p}(z)) = \frac{\sigma^2}{m}\, \mathbf{p}^t(z)\, (G_n(\mu_X))^{-1}\, \mathbf{p}(z), \tag{5}$$
where µ_X is again given by (2). Now, it is easy to verify that for any µ ∈ M(K),
$$\mathbf{p}^t(z)(G_n(\mu))^{-1}\mathbf{p}(z) = K_n^{\mu}(z, z)$$
where, for {q_1, …, q_N} ⊂ R_n[z] a µ-orthonormal basis for R_n[z],
$$K_n^{\mu}(z, w) := \sum_{i=1}^{N} q_i(z)\, q_i(w)$$
is the Bergman kernel for R_n[z]. The function K_n^µ(z, z) is also known as the reciprocal of the Christoffel function for R_n[z].
We may generalize easily to the complex case, K ⊂ C^d, where now the p_j form a basis for C_n[z] and
$$K_n^{\mu}(z, w) := \sum_{i=1}^{N} q_i(z)\, \overline{q_i(w)}.$$
For an external point z_0 ∈ C^d\K, a measure µ_0 ∈ M(K) is said to be an optimal prediction (or extrapolation) measure for z_0 relative to K if it minimizes the complex analogue of the variance (5) of the polynomial predictor at z_0, i.e., if
$$K_n^{\mu_0}(z_0, z_0) = \min_{\mu \in \mathcal{M}(K)} K_n^{\mu}(z_0, z_0). \tag{7}$$
In [5] Hoel and Levine show that in the univariate case, for K = [−1, 1], and any z_0 ∈ R\K, a real external point, the optimal prediction measure is a discrete measure supported at the n + 1 extreme points x_k = cos(kπ/n), 0 ≤ k ≤ n, of T_n(x), the classical Chebyshev polynomial of the first kind (cf. Lemma 3.1 below). In this case it turns out that
$$\min_{\mu \in \mathcal{M}(K)} K_n^{\mu}(z_0, z_0) = T_n^2(z_0). \tag{8}$$
Notably, as is well known, T_n(x) is the polynomial of extremal growth for any point z_0 ∈ R\[−1, 1] relative to K = [−1, 1]. Also, Erdős (1947) [3] has shown that the Chebyshev polynomial is also extremal relative to [−1, 1] for real polynomials at points z_0 ∈ C with |z_0| ≥ 1, i.e.,
$$\max\{|p(z_0)| : p \in \mathbb{R}_n[z],\ \|p\|_{[-1,1]} \le 1\} = |T_n(z_0)|, \qquad |z_0| \ge 1.$$
The problem for real polynomials and |z_0| ≤ 1, or for complex polynomials p ∈ C_n[z], has remained unsolved up to now.
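The Hoel–Levine identity (8) is easy to confirm numerically: at the Chebyshev extreme points, the Lebesgue function evaluated at a real exterior point equals |T_n(z_0)|. A minimal sketch (pure Python, names illustrative):

```python
import math

def cheb_T(n, x):
    # Chebyshev polynomial of the first kind via its three-term recursion.
    t0, t1 = 1.0, x
    for _ in range(n - 1):
        t0, t1 = t1, 2 * x * t1 - t0
    return t1 if n >= 1 else t0

def lebesgue_at(z0, nodes):
    # Sum of |l_k(z0)| over the fundamental Lagrange polynomials for `nodes`.
    total = 0.0
    for k, xk in enumerate(nodes):
        lk = 1.0
        for j, xj in enumerate(nodes):
            if j != k:
                lk *= (z0 - xj) / (xk - xj)
        total += abs(lk)
    return total

n, z0 = 5, 2.0
nodes = [math.cos(k * math.pi / n) for k in range(n + 1)]
# At these nodes the Lebesgue function at the real exterior point z0
# equals |T_n(z0)|, so the minimal variance is T_n(z0)^2.
```

Here `lebesgue_at(2.0, nodes)` agrees with |T_5(2)| = 362 to rounding error.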
We show in Section 2 that (8) is not an accident, and that there is a general equivalence of our two extremum problems. In Section 3 we will use this to compute the polynomials of extremal growth and the optimal prediction measures for a purely imaginary complex point z 0 ∈ C\[−1, 1].

A Kiefer-Wolfowitz Type Equivalence Theorem
Kiefer and Wolfowitz [6] have given a remarkable equivalence between what are called D-optimal and G-optimal designs, i.e., probability measures that maximize the determinant of the design matrix G_n(µ) and those which minimize the maximum over x ∈ K of the prediction variance, i.e., minimize max_{x∈K} K_n^µ(x, x). Here we give an analogous equivalence, for a single exterior point z_0 ∈ C^d\K, with the problem of extremal polynomial growth.
Of importance will be the well-known variational form of the Christoffel function:
$$\frac{1}{K_n^{\mu}(z_0, z_0)} = \min\left\{\int_K |p(z)|^2\, d\mu : p \in \mathbb{C}_n[z],\ p(z_0) = 1\right\}. \tag{9}$$
Indeed, from this variational form, the problem of minimal variance (7) may be expressed as
$$\min_{\mu \in \mathcal{M}(K)} \max_{p \in \mathbb{C}_n[z]} \frac{|p(z_0)|^2}{\int_K |p(z)|^2\, d\mu},$$
which, as it turns out, we will be able to analyze using the classical Minimax Theorem (see e.g. Gamelin [4, Thm. 7.1, Ch. II]). To see this, note that we may first of all simplify to
$$\left(\max_{\mu \in \mathcal{M}(K)} \min_{p \in \mathbb{C}_n[z]} f(\mu, p)\right)^{-1}, \qquad f(\mu, p) := \frac{\int_K |p(z)|^2\, d\mu}{|p(z_0)|^2}.$$
It is easy to confirm that f is quasiconcave in µ and quasiconvex in p and hence by the Minimax Theorem
$$\max_{\mu \in \mathcal{M}(K)} \min_{p \in \mathbb{C}_n[z]} f(\mu, p) = \min_{p \in \mathbb{C}_n[z]} \max_{\mu \in \mathcal{M}(K)} f(\mu, p).$$
However, as µ = δ_x ∈ M(K) for every x ∈ K, it follows that
$$\max_{\mu \in \mathcal{M}(K)} \int_K |p(z)|^2\, d\mu = \max_{x \in K} |p(x)|^2 = \|p\|_K^2.$$
Consequently, the minimum variance is given by
$$\min_{\mu \in \mathcal{M}(K)} K_n^{\mu}(z_0, z_0) = \max\{|p(z_0)|^2 : p \in \mathbb{C}_n[z],\ \|p\|_K \le 1\},$$
i.e., the value squared of the polynomial of extremal growth at z_0. The Minimax theorem in a similar context has been used before to get pointwise estimates of solutions to the ∂̄-equation by Berndtsson in [1, p. 206].
It is also possible to give a more precise relation between the extremal polynomials for the two problems (of minimum variance and extremal growth), using completely elementary means, along the lines of the proof of the original Kiefer-Wolfowitz theorem.
We begin with a simple technical lemma.
Lemma 2.1 Suppose that µ ∈ M(K) and z_0 ∈ C^d\K, and set
$$P_n^{\mu, z_0}(z) := \frac{\mathbf{p}^*(z_0)\,(G_n(\mu))^{-1}\,\mathbf{p}(z)}{\sqrt{\mathbf{p}^*(z_0)\,(G_n(\mu))^{-1}\,\mathbf{p}(z_0)}} = \frac{K_n^{\mu}(z, z_0)}{\sqrt{K_n^{\mu}(z_0, z_0)}}. \tag{10}$$
Then if ‖P_n^{µ,z_0}‖_K ≤ 1 it is a polynomial of degree at most n of extremal growth at z_0 relative to K. Here p* denotes the conjugate transpose of the vector p.
Proof. Suppose that p ∈ C_n[z] is such that ‖p‖_K ≤ 1. Then, from the variational form (9),
$$|p(z_0)|^2 \le K_n^{\mu}(z_0, z_0) \int_K |p(z)|^2\, d\mu \le K_n^{\mu}(z_0, z_0) = |P_n^{\mu, z_0}(z_0)|^2,$$
so that no polynomial bounded by 1 on K has larger modulus at z_0 than P_n^{µ,z_0}.

Theorem 2.2 Suppose that µ_0 ∈ M(K) and define P_n^{µ_0,z_0} as in (10). Then µ_0 is an optimal prediction measure for z_0 relative to K if and only if P_n^{µ_0,z_0}(z) ∈ C_n[z] is a polynomial of extremal growth at z_0 relative to K (i.e., ‖P_n^{µ_0,z_0}‖_K ≤ 1, by Lemma 2.1).
Proof. Suppose first that ‖P_n^{µ_0,z_0}‖_K ≤ 1. We will show that then µ_0 is an optimal prediction measure. Indeed, consider any other measure µ ∈ M(K). Then, applying the variational form (9) to P_n^{µ_0,z_0},
$$K_n^{\mu}(z_0, z_0) \ge \frac{|P_n^{\mu_0, z_0}(z_0)|^2}{\int_K |P_n^{\mu_0, z_0}(z)|^2\, d\mu} \ge |P_n^{\mu_0, z_0}(z_0)|^2 = K_n^{\mu_0}(z_0, z_0),$$
since ∫_K |P_n^{µ_0,z_0}|² dµ ≤ ‖P_n^{µ_0,z_0}‖_K² ≤ 1. Conversely, suppose that µ_0 is an optimal prediction measure. We must show that then ‖P_n^{µ_0,z_0}‖_K ≤ 1. To see this, fix µ_1 ∈ M(K) and consider the family of measures dµ_t := (1 − t) dµ_0 + t dµ_1, 0 ≤ t ≤ 1. Now, for a fixed a ∈ K, take µ_1 = δ_a, the Dirac delta measure supported at a. In this case
$$G_n(\mu_t) = (1 - t)\, G_n(\mu_0) + t\, \mathbf{p}(a)\mathbf{p}^*(a).$$
Hence, using (10),
$$\frac{d}{dt} K_n^{\mu_t}(z_0, z_0)\Big|_{t=0^+} = K_n^{\mu_0}(z_0, z_0) - |K_n^{\mu_0}(a, z_0)|^2.$$
If µ_0 is to minimize K_n^{µ_t}(z_0, z_0) then each of the above derivatives must be greater than or equal to zero, i.e.,
$$|P_n^{\mu_0, z_0}(a)|^2 = \frac{|K_n^{\mu_0}(a, z_0)|^2}{K_n^{\mu_0}(z_0, z_0)} \le 1 \quad \text{for all } a \in K.$$
Remark. It is easily confirmed that ∫_K |P_n^{µ,z_0}(z)|² dµ(z) = 1. Hence, for an optimal prediction measure µ_0, it must be the case that |P_n^{µ_0,z_0}(z)| ≡ 1 on the support of µ_0. Consequently optimal prediction measures are always supported on a real algebraic subset of K of degree 2n.

A Complex Point External to [−1, 1]
We now consider K = [−1, 1] ⊂ C and z_0 ∈ C\K. First note that by the above remark any optimal prediction measure must be supported on discrete points, x_0 := −1, x_n := +1 and n − 1 internal points −1 < x_1 < ⋯ < x_{n−1} < 1, i.e., is of the form
$$\mu = \sum_{i=0}^{n} w_i\, \delta_{x_i}, \qquad w_i > 0,\quad \sum_{i=0}^{n} w_i = 1.$$
Given the support points x_i there is a simple recipe for the optimal weights, given already in [5].
Lemma 3.1 (Hoel-Levine) Suppose that −1 = x_0 < x_1 < ⋯ < x_n = +1 are given. Then among all discrete probability measures supported at these points, the measure with
$$w_i := \frac{|\ell_i(z_0)|}{\sum_{j=0}^{n} |\ell_j(z_0)|}, \tag{11}$$
with ℓ_i(z) the ith fundamental Lagrange interpolating polynomial for these points, minimizes K_n^µ(z_0, z_0).
Proof. We first note that for such a discrete measure, the polynomials ℓ_i(z)/√w_i, 0 ≤ i ≤ n, form an orthonormal basis. Hence
$$K_n^{\mu}(z_0, z_0) = \sum_{i=0}^{n} \frac{|\ell_i(z_0)|^2}{w_i}. \tag{12}$$
In the case of the weights chosen according to (11) we obtain
$$K_n^{\mu}(z_0, z_0) = \left(\sum_{i=0}^{n} |\ell_i(z_0)|\right)^2. \tag{13}$$
We claim that for any choice of weights, K_n given by (12) is at least as large as that given by (13). To see this, just note that by the Cauchy-Schwarz inequality,
$$\left(\sum_{i=0}^{n} |\ell_i(z_0)|\right)^2 = \left(\sum_{i=0}^{n} \frac{|\ell_i(z_0)|}{\sqrt{w_i}}\, \sqrt{w_i}\right)^2 \le \left(\sum_{i=0}^{n} \frac{|\ell_i(z_0)|^2}{w_i}\right)\left(\sum_{i=0}^{n} w_i\right) = \sum_{i=0}^{n} \frac{|\ell_i(z_0)|^2}{w_i}.$$
Remark. We note that the optimal K_n(z_0, z_0) given by (13) is the Lebesgue function squared. Hence the problem of finding the support of the optimal prediction measure amounts to finding the n + 1 interpolation points −1 = x_0 < x_1 < ⋯ < x_n = +1 for which the Lebesgue function evaluated at the external point z_0 is as small as possible.
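The optimality of the Hoel–Levine weights for a fixed support can be seen concretely in a small example; the following sketch (pure Python, names illustrative) compares (12) at the weights (11) against uniform weights:

```python
def K_at(z0, nodes, weights):
    # K_n^mu(z0, z0) = sum_i l_i(z0)^2 / w_i for a discrete measure, cf. (12),
    # with real exterior z0 so that l_i(z0)^2 = |l_i(z0)|^2.
    total = 0.0
    for k, xk in enumerate(nodes):
        lk = 1.0
        for j, xj in enumerate(nodes):
            if j != k:
                lk *= (z0 - xj) / (xk - xj)
        total += lk * lk / weights[k]
    return total

nodes, z0 = [-1.0, 0.0, 1.0], 2.0
# Lagrange values at z0 = 2 for these nodes: l_0 = 1, l_1 = -3, l_2 = 3.
ls = [1.0, -3.0, 3.0]
L = sum(abs(l) for l in ls)                 # Lebesgue function at z0
w_hl = [abs(l) / L for l in ls]             # Hoel-Levine weights (11)
uniform = [1 / 3] * 3
# K at the HL weights equals L^2 = 49; any other weights give at least this.
```

Here 49 = T_2(2)², in agreement with (8) for the Chebyshev support {−1, 0, 1}.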
In this case the extremal polynomial P_n^{µ,z_0}(z) also simplifies.

Lemma 3.2 For the measure µ of Lemma 3.1 we have
$$P_n^{\mu, z_0}(z) = \sum_{i=0}^{n} \overline{\mathrm{sgn}(\ell_i(z_0))}\; \ell_i(z),$$
where, for 0 ≠ w ∈ C, sgn(w) := w/|w|.

Proof. Using again the fact that {ℓ_i(z)/√w_i}_{0≤i≤n} form a set of orthonormal polynomials, we have
$$P_n^{\mu, z_0}(z) = \frac{K_n^{\mu}(z, z_0)}{\sqrt{K_n^{\mu}(z_0, z_0)}} = \sum_{i=0}^{n} \frac{\overline{\ell_i(z_0)}}{|\ell_i(z_0)|}\, \ell_i(z).$$
Remark. By the equivalence Theorem 2.2 the support of the optimal prediction measure and the polynomial of extremal growth will be given by those points −1 = x_0 < x_1 < ⋯ < x_n = +1 for which
$$\Big\|\sum_{i=0}^{n} \overline{\mathrm{sgn}(\ell_i(z_0))}\, \ell_i\Big\|_{[-1,1]} \le 1.$$
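Two properties of this simplified predictor are easy to check numerically for any admissible support: |P| = 1 at every support point, and P(z_0) = Σ_i |ℓ_i(z_0)|. A minimal sketch (pure Python; the explicit sum Σ_i conj(sgn ℓ_i(z_0)) ℓ_i(z) is the reconstructed form used here):

```python
def lagrange_basis_at(z, nodes):
    # Values of all fundamental Lagrange polynomials for `nodes` at z.
    vals = []
    for k, xk in enumerate(nodes):
        lk = 1 + 0j
        for j, xj in enumerate(nodes):
            if j != k:
                lk *= (z - xj) / (xk - xj)
        vals.append(lk)
    return vals

def predictor_poly(z, z0, nodes):
    # P(z) = sum_i conj(sgn l_i(z0)) l_i(z), with sgn(w) = w / |w|.
    li_z0 = lagrange_basis_at(z0, nodes)
    li_z = lagrange_basis_at(z, nodes)
    return sum((s.conjugate() / abs(s)) * l for s, l in zip(li_z0, li_z))

nodes = [-1.0, -0.4, 0.4, 1.0]   # an illustrative (not optimal) support
z0 = 1.5j
# |P| = 1 at every support point; P(z0) = sum_i |l_i(z0)| is real and > 0.
```

Since ℓ_i(x_k) = δ_ik, the value P(x_k) = conj(sgn ℓ_k(z_0)) automatically has modulus 1, whatever the support.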

A Purely Imaginary Point External to [−1, 1]
In the case of z_0 = ai, 0 ≠ a ∈ R, a purely imaginary point, it turns out that there are remarkable formulas for the polynomial of extremal growth as well as for the support of the optimal prediction measure. Both of these will depend on the point z_0 (as opposed to the real case z_0 ∈ R\[−1, 1], where Hoel and Levine [5] showed that the support is always the set of extreme points of the Chebyshev polynomial T_n(x)).
To begin we will first analyze the degrees n = 1 and n = 2 cases.

Degree n = 1
Here the support of the extremal measure is necessarily x_0 = −1 and x_1 = +1. We will compute P_1^{µ_0,z_0}(z) using the formula given in Lemma 3.2. Indeed in this case, ℓ_0(z) = (1 − z)/2 and ℓ_1(z) = (1 + z)/2, so that
$$\mathrm{sgn}(\ell_0(ai)) = \frac{1 - ai}{\sqrt{1 + a^2}}, \qquad \mathrm{sgn}(\ell_1(ai)) = \frac{1 + ai}{\sqrt{1 + a^2}}.$$
Hence,
$$P_1^{\mu_0, z_0}(z) = \frac{1 - iaz}{\sqrt{1 + a^2}} \qquad \text{and} \qquad |P_1^{\mu_0, z_0}(ai)| = \sqrt{1 + a^2}.$$
Since ±1 is necessarily the support of the optimal prediction measure it is immediate that ‖P_1^{µ_0,z_0}‖_{[−1,1]} = 1, as is also easily verified by a simple direct calculation.
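The degree-1 formula (as reconstructed above) is easy to verify directly: its sup-norm on [−1, 1] is 1, attained at the endpoints, and its modulus at ai is √(1+a²). A minimal sketch (pure Python, a = 0.7 chosen arbitrarily):

```python
import math

a = 0.7  # any a > 0 (illustrative)
s = math.sqrt(1 + a * a)

def P1(z):
    # Degree-1 extremal polynomial at z0 = ai (formula as reconstructed).
    return (1 - 1j * a * z) / s

# |P1(x)|^2 = (1 + a^2 x^2) / (1 + a^2) on the real line, so the sup over
# [-1, 1] is attained at x = +/-1 and equals 1.
sup = max(abs(P1(-1 + 2 * k / 400)) for k in range(401))
growth = abs(P1(a * 1j))   # equals sqrt(1 + a^2)
```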

Degree n = 2
We claim that the support of the optimal prediction measure is x_0 = −1, x_1 = 0 and x_2 = +1. However, this is not automatic and we will have to verify that the norm of P_2^{µ_0,z_0} is indeed 1. Now, it is easy to see, for this support, that sgn(ℓ_1(ia)) = sgn(1 + a²) = +1, and, after a simple calculation (here a > 0),
$$\mathrm{sgn}(\ell_0(ai)) = -\frac{a + i}{\sqrt{a^2 + 1}}, \qquad \mathrm{sgn}(\ell_2(ai)) = -\frac{a - i}{\sqrt{a^2 + 1}}.$$
From this we may easily conclude that
$$P_2^{\mu_0, z_0}(z) = 1 - z^2 - \frac{z(az + i)}{\sqrt{a^2 + 1}}.$$
The fact that ‖P_2^{µ_0,z_0}‖_{[−1,1]} = 1 is an immediate consequence of the following lemma.
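As in the degree-1 case, the reconstructed degree-2 formula can be checked directly: its modulus on [−1, 1] never exceeds 1, with equality at the three support points, and its modulus at ai is √(a²+1)(a + √(a²+1)). A minimal sketch:

```python
import math

a = 0.7  # any a > 0 (illustrative)
s = math.sqrt(a * a + 1)

def P2(z):
    # Degree-2 extremal polynomial at z0 = ai for support {-1, 0, 1}
    # (formula as reconstructed above).
    return 1 - z * z - z * (a * z + 1j) / s

# The grid below contains the support points -1, 0, 1, where |P2| = 1.
sup = max(abs(P2(-1 + 2 * k / 400)) for k in range(401))
growth = abs(P2(a * 1j))   # equals sqrt(a^2+1) * (a + sqrt(a^2+1))
```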
Lemma 4.1 For a > 0 and x ∈ [−1, 1] we have
$$|P_2^{\mu_0, z_0}(x)|^2 = 1 - (1 - x^2)\, R_1^2(x), \qquad R_1(x) := \frac{a + \sqrt{a^2 + 1}}{\sqrt{a^2 + 1}}\, x,$$
so that, in particular, ‖P_2^{µ_0,z_0}‖_{[−1,1]} = 1.

Proof. This follows from elementary calculations starting with the formula for P_2^{µ_0,z_0}(x) given above.
We now define a sequence of polynomials, Q_n(z), based on the above degree n = 1 and n = 2 cases, for which we will show that Q_n(z) = c_n P_n^{µ_0,z_0}(z) for certain c_n ∈ C with modulus |c_n| = 1. We will also define a sequence of polynomials R_n(x) which will play the role of R_1(x) in the Lemma for general degree n. Now, as the formula for P_2^{µ_0,z_0} depends on the sign of a, in order to simplify the formulas we will assume that a > 0. For a < 0, one may use the relation $P_2^{\mu_0, ia}(z) = P_2^{\mu_0, -ia}(-z)$.

Definition 4.2
For a > 0 we define the sequences of polynomials Q_n(z) and R_n(z) by
$$Q_1(z) := -\frac{az + i}{\sqrt{a^2+1}}, \qquad Q_2(z) := 1 - z^2 - \frac{z(az+i)}{\sqrt{a^2+1}}, \qquad Q_{n+1}(z) := 2zQ_n(z) - Q_{n-1}(z),\ n \ge 2,$$
and
$$R_0(z) := \frac{a}{\sqrt{a^2+1}}, \qquad R_1(z) := \frac{(a + \sqrt{a^2+1})\,z}{\sqrt{a^2+1}}, \qquad R_{n+1}(z) := 2zR_n(z) - R_{n-1}(z),\ n \ge 1.$$
Since the recursions are both those of the classical Chebyshev polynomials it is not surprising that there are formulas for Q_n(z) and R_n(z) in terms of these.

Lemma 4.3 We have
$$Q_n(z) = \left[z^2 - 1 - \frac{z(az+i)}{\sqrt{a^2+1}}\right]T_n(z) + (1 - z^2)\left[z - \frac{az+i}{\sqrt{a^2+1}}\right]U_{n-1}(z),$$
where T_n(z) is the Chebyshev polynomial of the first kind and U_n(z) := (1/(n+1)) T'_{n+1}(z) that of the second kind.
Proof. Let q_n(z) denote the right side of the proposed identity. We proceed by induction. For n = 1, a direct computation using T_1(z) = z and U_0(z) = 1 shows that q_1(z) = Q_1(z); similarly, for n = 2, using T_2(z) = 2z² − 1 and U_1(z) = 2z, one checks that q_2(z) = Q_2(z). The result now follows easily from the fact that both kinds of Chebyshev polynomials satisfy the same recursion as used in the definition of Q_n(z).

Lemma 4.4 We have
$$R_n(z) = \frac{a}{\sqrt{a^2+1}}\, T_n(z) + z\, U_{n-1}(z),$$
with the convention U_{−1} := 0.

Proof. Let r_n(z) denote the right side of the proposed identity. We again proceed by induction. For n = 0 we have r_0(z) = a/√(a²+1) = R_0(z); similarly, for n = 1 we have r_1(z) = (a/√(a²+1))z + z = R_1(z). The result now follows easily from the fact that both kinds of Chebyshev polynomials satisfy the same recursion as used in the definition of R_n(z). Now, just as for the Chebyshev polynomials T_n(z) and U_{n−1}(z) there is the Pell identity
$$T_n^2(z) - (z^2 - 1)\, U_{n-1}^2(z) = 1, \tag{14}$$
we will show that for real z ∈ R, the polynomials Q_n(z) and R_{n−1}(z) satisfy a similar Pell identity.
Proposition 4.5 For x ∈ R we have
$$Q_n(x)\,\overline{Q_n(x)} + (1 - x^2)\, R_{n-1}^2(x) = 1.$$

Proof. By Lemma 4.3, for z = x ∈ R, we may write Q_n(x) in terms of T_n(x) and U_{n−1}(x). Hence, using the Chebyshev Pell identity (14), the cross terms cancel and the identity follows. From the Pell identity we immediately have
$$|Q_n(x)| \le 1, \qquad x \in [-1, 1].$$
Indeed, we claim that the endpoints together with the zeros of R_{n−1}(x) form the support of the optimal prediction measure. To this end we first prove that R_{n−1}(x) has n − 1 zeros in (−1, 1).
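Both recursions and the Pell-type identity can be checked numerically. The sketch below (pure Python) uses the seed polynomials as reconstructed above, which are an assumption up to unimodular constants; with those seeds the identity and the growth value at ai both check out:

```python
import math

def Q(a, n, z):
    # Q_n via the Chebyshev-type recursion Q_{n+1} = 2z Q_n - Q_{n-1},
    # seeded with the (reconstructed) degree 1 and 2 polynomials.
    s = math.sqrt(a * a + 1)
    q1 = -(a * z + 1j) / s
    q2 = 1 - z * z - z * (a * z + 1j) / s
    if n == 1:
        return q1
    for _ in range(n - 2):
        q1, q2 = q2, 2 * z * q2 - q1
    return q2

def R(a, n, x):
    # R_n via the same recursion, seeded with R_0 and R_1.
    s = math.sqrt(a * a + 1)
    r0, r1 = a / s, (a + s) * x / s
    if n == 0:
        return r0
    for _ in range(n - 1):
        r0, r1 = r1, 2 * x * r1 - r0
    return r1

a, n = 0.7, 6
s = math.sqrt(a * a + 1)
# Pell-type identity |Q_n(x)|^2 + (1 - x^2) R_{n-1}(x)^2 = 1 for real x,
# and the growth value |Q_n(ai)| = sqrt(a^2+1) (a + sqrt(a^2+1))^{n-1}.
pell = [abs(Q(a, n, x)) ** 2 + (1 - x * x) * R(a, n - 1, x) ** 2
        for x in (-0.9, -0.3, 0.2, 0.8)]
growth = abs(Q(a, n, a * 1j))
```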
The result follows.
Our calculations will make use of the elementary facts that
$$T_n(ai) = \frac{i^n}{2}\left[(a + \sqrt{a^2+1})^n + (a - \sqrt{a^2+1})^n\right], \qquad U_{n-1}(ai) = \frac{i^{n-1}}{2\sqrt{a^2+1}}\left[(a + \sqrt{a^2+1})^n - (a - \sqrt{a^2+1})^n\right].$$
The endpoints are the easiest case and so we will begin with those. Specifically, for k = 0, x_0 = −1, we have 1 − x_0² = 0, so that by Lemma 4.3,
$$Q_n(-1) = (-1)^n\, \frac{i - a}{\sqrt{a^2+1}}.$$
On the other hand, this is easily verified to be the required value at x_0 = −1.
The other endpoint x n = +1 is very similar and so we suppress the details.
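The closed forms for T_n(ai) and U_{n−1}(ai) can be checked against the three-term recursion; a small sketch (pure Python, names illustrative):

```python
import math

def T_U_at(n, z):
    # T_n(z) and U_{n-1}(z) via the shared recursion p_{k+1} = 2z p_k - p_{k-1}.
    t0, t1 = 1 + 0j, z
    u0, u1 = 1 + 0j, 2 * z
    for _ in range(n - 1):
        t0, t1 = t1, 2 * z * t1 - t0
        u0, u1 = u1, 2 * z * u1 - u0
    return t1, u0   # T_n(z), U_{n-1}(z)

a, n = 0.7, 5
s = math.sqrt(a * a + 1)
t_closed = (1j ** n / 2) * ((a + s) ** n + (a - s) ** n)
u_closed = (1j ** (n - 1) / (2 * s)) * ((a + s) ** n - (a - s) ** n)
t_direct, u_direct = T_U_at(n, a * 1j)
```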
On the other hand, from the formula for R_{n−1}(x) given in Lemma 4.4, we see that R_{n−1}(x_k) = 0 implies that
$$T_{n-1}(x_k) = -\frac{\sqrt{a^2+1}}{a}\, x_k\, U_{n-2}(x_k).$$
Substituting this into the formula for Q_n given in Lemma 4.3 we obtain
$$Q_n(x_k) = \frac{a + ix_k}{a}\, U_{n-2}(x_k).$$
But by the Pell identity of Proposition 4.5, |Q_n(x_k)| = 1 and so we must have
$$Q_n(x_k) = \frac{a + ix_k}{\sqrt{a^2 + x_k^2}}\, \mathrm{sgn}\big(U_{n-2}(x_k)\big).$$
But, as the zeros of R_{n−1} interlace the extreme points of T_{n−1}, i.e., the zeros of U_{n−2}, it is easy to check that sgn(U_{n−2}(x_k)) = (−1)^{n−1−k}. In other words,
$$Q_n(x_k) = (-1)^{n-1-k}\, \frac{a + ix_k}{\sqrt{a^2 + x_k^2}},$$
which is easily verified to equal −(i)^n sgn(ℓ_k(ai)), as claimed.

Hence we have, for z_0 = ai, 0 ≠ a ∈ R,
$$\max\{|p(ai)| : p \in \mathbb{C}_n[z],\ \|p\|_{[-1,1]} \le 1\} = \sqrt{a^2+1}\,\big(|a| + \sqrt{a^2+1}\big)^{n-1}.$$
It is worth noting that the extremal polynomial and optimal measure, unlike in the real case, depend on the exterior point z_0. Moreover, this extreme value is rather larger than |T_n(ai)|. Indeed it is easy to show that
$$\sqrt{a^2+1}\,\big(|a| + \sqrt{a^2+1}\big)^{n-1} - |T_n(ai)| = \big(\sqrt{a^2+1} - |a|\big)\,|T_{n-1}(ai)|.$$
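The gap identity just stated can be confirmed numerically from the closed form for T_n(ai); a minimal sketch (pure Python, a and n arbitrary):

```python
import math

def abs_T_at_ai(n, a):
    # |T_n(ai)| from the closed form |T_n(ai)| = |(a+s)^n + (a-s)^n| / 2.
    s = math.sqrt(a * a + 1)
    return abs(((a + s) ** n + (a - s) ** n) / 2)

a, n = 0.7, 6
s = math.sqrt(a * a + 1)
lhs = s * (a + s) ** (n - 1) - abs_T_at_ai(n, a)
rhs = (s - a) * abs_T_at_ai(n - 1, a)
# lhs and rhs agree, quantifying how much the extremal growth at ai
# exceeds the Chebyshev value |T_n(ai)|.
```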
One may of course wonder if there are similar formulas for general points z 0 ∈ C\[−1, 1] (not just z 0 = ai). However numerical experiments seem to indicate that in general there is no three-term recurrence for the extremal polynomials.