1 Introduction

An uncertainty principle expresses a fundamental tradeoff between the properties of a function f and its Fourier transform \(\widehat{f}\). The most common variants measure the dispersion, with the tradeoff being that f and \(\widehat{f}\) cannot both be highly concentrated near the origin. Motivated by applications to number theory, Bourgain, Clozel, and Kahane [2] proved an elegant uncertainty principle for the signs of f and \(\widehat{f}\): if these functions are nonpositive at the origin and not identically zero, then they cannot both be nonnegative outside an arbitrarily small neighborhood of the origin. We can state this principle more formally as follows.

We say that a function \(f:\mathbb {R}^d \rightarrow \mathbb {R}\) is eventually nonnegative (resp., nonpositive) if \(f(x)\ge 0\) (resp., \(f(x)\le 0\)) for all sufficiently large |x|. If that is the case, we let

$$\begin{aligned} r(f)=\inf {} \{R \ge 0: f(x)\text { has the same sign for }|x|\ge R\} \end{aligned}$$

be the radius of its last sign change. We normalize the Fourier transform \(\widehat{f}\) of f by

$$\begin{aligned} \widehat{f}(\xi ) = \int _{\mathbb {R}^d} f(x)e^{-2\pi i \langle x, \xi \rangle }\, dx. \end{aligned}$$

Let \(\mathcal {A}_+(d)\) denote the set of functions \(f:\mathbb {R}^d\rightarrow \mathbb {R}\) such that

(1) \(f\in L^1(\mathbb {R}^d)\), \(\widehat{f}\in L^1(\mathbb {R}^d)\), and \(\widehat{f}\) is real-valued (i.e., f is even),

(2) f is eventually nonnegative while \(\widehat{f}(0)\le 0\), and

(3) \(\widehat{f}\) is eventually nonnegative while \(f(0)\le 0\).

(Note the tension in (2) between the eventual nonnegativity of f and the inequality \(\int _{\mathbb {R}^d} f = \widehat{f}(0) \le 0\), and the analogous tension in (3).)

The uncertainty principle of Bourgain, Clozel, and Kahane from [2, Théorème 3.1] says that

$$\begin{aligned} \mathrm {A}_+(d) := \inf _{f\in \mathcal {A}_+(d){\setminus }\{0\}} \sqrt{r(f)r(\widehat{f}\,)} >0. \end{aligned}$$

Taking the geometric mean of r(f) and \(r(\widehat{f}\,)\) is a natural way to eliminate scale dependence, because rescaling the input of f preserves this quantity. Thus, the uncertainty principle amounts to saying that r(f) and \(r(\widehat{f}\,)\) cannot both be made arbitrarily small if \(f \in \mathcal {A}_+(d){\setminus }\{0\}\).
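Explicitly, if \(f_\lambda (x) = f(\lambda x)\) for \(\lambda > 0\), then \(\widehat{f_\lambda }(\xi ) = \lambda ^{-d} \widehat{f}(\xi /\lambda )\), so that \(r(f_\lambda ) = r(f)/\lambda \) and \(r(\widehat{f_\lambda }\,) = \lambda \, r(\widehat{f}\,)\), and hence \(r(f_\lambda )\,r(\widehat{f_\lambda }\,) = r(f)\,r(\widehat{f}\,)\).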

In [12, Theorem 3], Gonçalves, Oliveira e Silva, and Steinerberger proved that for each dimension d there exists a radial function \(f\in \mathcal {A}_+(d) {\setminus }\{0\}\) such that \(f = \widehat{f}\) and

$$\begin{aligned} r(f) = r(\widehat{f}\,) = \mathrm {A}_+(d); \end{aligned}$$

furthermore, \(\mathrm {A}_+(d)\) is exactly the minimal value of r(g) in the following optimization problem:

Problem 1.1

(\(+\,1\) eigenfunction uncertainty principle). Minimize r(g) over all \(g :\mathbb {R}^d \rightarrow \mathbb {R}\) such that

(1) \(g\in L^1(\mathbb {R}^d){\setminus }\{0\}\) and \(\widehat{g} = g\), and

(2) \(g(0)=0\) and g is eventually nonnegative.

The name “\(+\,1\) eigenfunction” refers to the fact that g is an eigenfunction of the Fourier transform with eigenvalue \(+\,1\).

Upper and lower bounds for \(\mathrm {A}_+(d)\) are known [2, 12], but the exact value has not previously been determined, or even conjectured, in any dimension. Our main result is a solution of this problem in twelve dimensions:

Theorem 1.2

We have \(\mathrm {A}_+(12)=\sqrt{2}\). In particular, there exists a radial Schwartz function \(f:\mathbb {R}^{12} \rightarrow \mathbb {R}\) that is eventually nonnegative and satisfies \(\widehat{f}=f\), \(f(0)=0\), and

$$\begin{aligned} r(f) = r(\widehat{f}\,) = \sqrt{2}. \end{aligned}$$

Moreover, as a radial function, f has a double root at \(|x|=0\), a single root at \(|x|=\sqrt{2}\), and double roots at \(|x|=\sqrt{2j}\) for integers \(j \ge 2\).

See Fig. 1 for plots. The appealing simplicity of this answer seems to be unique to twelve dimensions, and we have been unable to conjecture a closed form for \(\mathrm {A}_+(d)\) in any other dimension d. See Sect. 4 for an account of the numerical evidence, which displays noteworthy patterns and regularity despite the lack of any exact conjectures.

We find the exceptional role of twelve dimensions surprising: why should a seemingly arbitrary dimension admit an exact solution with mysterious arithmetic structure not shared by other dimensions? As far as we are aware, Theorem 1.2 is the first time such behavior has arisen in an uncertainty principle.

Fig. 1

Two plots of the function f from Theorem 1.2. The upper image is a cross section of the graph of \(x \mapsto f(x)\) for \(|x|^2 \le 8\); note that this function decays rapidly enough that the double roots are nearly invisible. The function in the lower image is instead proportional to \(x \mapsto |x|^{11} f(x)\). This transformation distorts the picture but clarifies the behavior, because \(|x|^{11}\) is proportional to the surface area of a sphere of radius |x| in \(\mathbb {R}^{12}\); thus, one-dimensional integrals of the plotted function are proportional to integrals of f in \(\mathbb {R}^{12}\).

The proof of Theorem 1.2 makes use of modular forms. The lower bound \(\mathrm {A}_+(12) \ge \sqrt{2}\) follows from the existence of the Eisenstein series \(E_6\), while the upper bound \(\mathrm {A}_+(12) \le \sqrt{2}\) is based on Viazovska’s methods, which were developed to solve the sphere packing problem in eight dimensions [18] and twenty-four dimensions [8] (see also [4] for an exposition). We prove both bounds for \(\mathrm {A}_+(12)\) in Sect. 2.

The close relationship of this uncertainty principle with sphere packing may seem surprising, given that Problem 1.1 makes no reference to any discrete structures. The connection is through the Euclidean linear programming bound of Cohn and Elkies [5], which converts a suitable auxiliary function f into an upper bound for the sphere packing density \(\Delta _d\) in \(\mathbb {R}^d\). Suppose \(f :\mathbb {R}^d \rightarrow \mathbb {R}\) is an integrable function such that \(\widehat{f}\) is also integrable and real-valued (i.e., f is even), \(f(0) = \widehat{f}(0) = 1\), \(\widehat{f} \ge 0\) everywhere, and f is eventually nonpositive. Then the linear programming bound obtained from f is the upper bound

$$\begin{aligned} \Delta _d \le {{\,\mathrm{\mathrm {vol}}\,}}\mathopen {}\big (B^d_{r(f)/2}\big )\mathclose {}, \end{aligned}$$
(1.1)

where \(B^d_R\) is the closed ball of radius R about the origin in \(\mathbb {R}^d\). (Strictly speaking, the proof in [5] requires additional decay hypotheses on f and \(\widehat{f}\); see [11, Theorem 3.3] for a proof in the generality of our statement here.) Optimizing this bound amounts to minimizing r(f).
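For example, when \(d = 8\) the function constructed in [18] satisfies these hypotheses with \(r(f) = \sqrt{2}\), and (1.1) yields \(\Delta _8 \le {{\,\mathrm{\mathrm {vol}}\,}}\mathopen {}\big (B^8_{\sqrt{2}/2}\big )\mathclose {} = \pi ^4/384 = 0.25366\ldots \), which is exactly the density of the \(E_8\) root lattice packing.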

Based on numerical evidence and analogies with other problems in coding theory, Cohn and Elkies conjectured the existence of functions f achieving equality in (1.1) when \(d \in \{2,8,24\}\), and they proved it when \(d=1\). The case \(d=2\) remains an open problem today, despite the existence of elementary solutions of the two-dimensional sphere packing problem by other means (see, for example, [13]). However, the case \(d=8\) was proved fourteen years later in a breakthrough by Viazovska [18], and the case \(d=24\) was proved shortly thereafter based on her approach [8]. These papers solved the sphere packing problem in dimensions 8 and 24.

The problem of optimizing the linear programming bound for \(\Delta _d\) already appears somewhat similar to Problem 1.1, but there is a deeper analogy based on a problem studied by Cohn and Elkies in [5, Section 7]. Given an auxiliary function f for the sphere packing bound, let \(g = \widehat{f} - f\). Note that g is not identically zero, because otherwise f and \(\widehat{f}\) would both have compact support (thanks to their opposite signs outside radius r(f)), which would imply that \(f = \widehat{f} = 0\). Then g satisfies the conditions of the following problem, with \(r(g) \le r(f)\):

Problem 1.3

(\(-1\) eigenfunction uncertainty principle). Minimize r(g) over all \(g :\mathbb {R}^d \rightarrow \mathbb {R}\) such that

(1) \(g\in L^1(\mathbb {R}^d){\setminus }\{0\}\) and \(\widehat{g} = -g\), and

(2) \(g(0)=0\) and g is eventually nonnegative.

This problem has been solved for \(d \in \{1, 8, 24\}\), as a consequence of the sphere packing bounds mentioned above; the answers are 1, \(\sqrt{2}\), and 2, respectively. When \(d=2\), it is conjectured that the optimal value of r(g) is \((4/3)^{1/4}\), but no proof is known. No other closed forms have been identified.

Cohn and Elkies conjectured [5, Conjecture 7.2] that the minimal value of r(g) in Problem 1.3 is exactly the same as that of r(f) in the linear programming bound, and that in fact an auxiliary function f for the linear programming bound can always be reconstructed from an optimal g via \(g = \widehat{f} - f\). Nobody has proved that such an f always exists, but numerical evidence strongly supports this conjecture.

We can extend Problem 1.3 to a broader uncertainty principle as follows. Let \(\mathcal {A}_-(d)\) denote the set of functions \(f:\mathbb {R}^d\rightarrow \mathbb {R}\) such that

(1) \(f\in L^1(\mathbb {R}^d)\), \(\widehat{f}\in L^1(\mathbb {R}^d)\), and \(\widehat{f}\) is real-valued (i.e., f is even),

(2) f is eventually nonnegative while \(\widehat{f}(0)\le 0\), and

(3) \(\widehat{f}\) is eventually nonpositive while \(f(0)\ge 0\).

Let

$$\begin{aligned} \mathrm {A}_-(d) = \inf _{f\in \mathcal {A}_-(d){\setminus }\{0\}} \sqrt{r(f)r(\widehat{f}\,)}, \end{aligned}$$

and note that every function g in Problem 1.3 satisfies \(r(g) \ge \mathrm {A}_-(d)\).

For completeness, we state our next theorem for both \(\pm 1\) cases, although its assertions were already proved in the \(+\,1\) case by Gonçalves, Oliveira e Silva, and Steinerberger in [12]. Note that we regard \(\mathrm {A}_{+1}\) and \(\mathrm {A}_{-1}\) as synonymous with \(\mathrm {A}_+\) and \(\mathrm {A}_-\), respectively.

Theorem 1.4

Let \(s\in \{\pm 1\}\). Then there exist positive constants c and C such that

$$\begin{aligned} c \le \frac{\mathrm {A}_s(d)}{\sqrt{d}} \le C \end{aligned}$$

for all d. Moreover, for each d there exists a radial function \(f\in \mathcal {A}_s(d){\setminus }\{0\}\) with \(\widehat{f} = s f\), \(f(0)=0\), and

$$\begin{aligned} r(f) = \mathrm {A}_s(d). \end{aligned}$$

Furthermore, any such function must vanish at infinitely many radii greater than \(\mathrm {A}_s(d)\).

In particular, \(\mathrm {A}_-(d) > 0\). Thus, we obtain a natural counterpart to the uncertainty principle of Bourgain, Clozel, and Kahane, but with f and \(\widehat{f}\) having opposite signs, and with the optimal function coming from Problem 1.3. We can take \(c = 1/\sqrt{2\pi e}\) and \(C=1\).

This uncertainty principle places the linear programming bound in a broader analytic context and gives a deeper significance to the auxiliary functions that optimize this bound. Outside of a few exceptional dimensions, they do not seem to come close to solving the sphere packing problem, but they conjecturally achieve an optimal tradeoff between sign conditions in the uncertainty principle.

Except for extremal functions for \(\mathrm {A}_+(1)\), our proof in Sect. 3.3 and the proof in [12] actually show that any extremal function cannot be eventually positive; that is, it must vanish on spheres with arbitrarily large radii, not just at infinitely many radii greater than \(\mathrm {A}_s(d)\). We strongly believe that this is the case for \(\mathrm {A}_+(1)\) as well.

Problems 1.1 and 1.3 are closely related and behave in complementary ways. We prove Theorem 1.4 by adapting the techniques of [12] to \(-\,1\) eigenfunctions. However, the analogy between these problems is not perfect. For example, the equality \(\mathrm {A}_+(12)=\mathrm {A}_-(8)=\sqrt{2}\) suggests that perhaps \(\mathrm {A}_+(28) = \mathrm {A}_-(24) = 2\), but that turns out to be false (see Sect. 4). Similarly, relatively simple explicit formulas show that \(\mathrm {A}_-(1)=1\), while \(\mathrm {A}_+(1)\) remains a mystery.

In addition to its values in specific dimensions, the asymptotic behavior of \(\mathrm {A}_s(d)\) as \(d \rightarrow \infty \) is of substantial interest. It was shown in [2] that

$$\begin{aligned} 0.2419\ldots = \frac{1}{\sqrt{2\pi e}} \le \liminf _{d\rightarrow \infty } \frac{\mathrm {A}_+(d)}{\sqrt{d}} \le \limsup _{d\rightarrow \infty } \frac{\mathrm {A}_+(d)}{\sqrt{d}} \le \frac{1}{\sqrt{2\pi }} = 0.3989\ldots . \end{aligned}$$

In Sect. 3, we obtain the same lower bound for the case of \(\mathrm {A}_-(d)\), and an improved upper bound of \(0.3194\ldots \) for that case based on [11] (the exact value is complicated).

Conjecture 1.5

The limits

$$\begin{aligned} \lim _{d \rightarrow \infty } \frac{\mathrm {A}_+(d)}{\sqrt{d}} \quad \text {and}\quad \lim _{d \rightarrow \infty } \frac{\mathrm {A}_-(d)}{\sqrt{d}} \end{aligned}$$

exist and are equal.

See Sect. 4 for the numerical evidence supporting this conjecture. We expect that the common value of these limits is strictly between the bounds \(0.2419\ldots \) and \(0.3194\ldots \), and perhaps not so far from the latter.

In the remainder of the paper, we prove Theorem 1.2 in Sect. 2 and Theorem 1.4 in Sect. 3. In Sect. 4 we present numerical computations and conjectures, and we conclude in Sect. 5 with a construction of summation formulas that validate our numerics and lend support to our general conjectures about \(\mathrm {A}_s(d)\).

2 The \(+\,1\) eigenfunction uncertainty principle in dimension 12

In this section, we prove Theorem 1.2.

2.1 Optimality

We begin by establishing that \(\mathrm {A}_+(12) \ge \sqrt{2}\). For this inequality, we use a special Poisson-type summation formula for radial Schwartz functions \(f:\mathbb {R}^{12}\rightarrow \mathbb {C}\) based on the modular form \(E_6\). Converting a modular form into such a formula is a standard technique; for completeness, we will give a direct proof.

Consider the normalized Eisenstein series \(E_6 :\mathcal {H}\rightarrow \mathbb {C}\), where \(\mathcal {H}\) denotes the upper half-plane in \(\mathbb {C}\) (see, for example, [19, §2]). This function has the Fourier expansion

$$\begin{aligned} E_6(z) = 1-\sum _{j\ge 1} c_j e^{2\pi i j z}, \end{aligned}$$
(2.1)

where \(c_j = 504\sigma _{5}(j)\) and \(\sigma _5(j)\) is the sum of the fifth powers of the divisors of j. In particular, \(c_j >0\) for \(j\ge 1\), and we have the trivial bound \(c_j\le 504j^6\). Because \(E_6\) is a modular form of weight 6 for \(\mathrm {SL}_2(\mathbb {Z})\), it satisfies the identity

$$\begin{aligned} E_6(z) = z^{-6}E_6(-1/z). \end{aligned}$$
(2.2)

This identity turns into a summation formula for a Gaussian \(f :\mathbb {R}^{12} \rightarrow \mathbb {R}\) defined by \(f(x) = e^{-\pi \alpha |x|^2}\) with \(\alpha >0\), or more generally \(\mathop {\mathrm {Re}}(\alpha ) > 0\). Specifically, if we set \(z = i \alpha \), then \(f(x) = e^{\pi i z |x|^2}\) and \(\widehat{f}(\xi ) = -z^{-6} e^{\pi i(-1/z)|\xi |^2}\), from which it follows that \(f(\sqrt{2j}) = e^{2\pi i j z}\) and \(\widehat{f}(\sqrt{2j}) = -z^{-6} e^{2\pi i j (-1/z)}\), where we use \(f(\sqrt{2j})\) to denote the common value f(x) with \(|x|=\sqrt{2j}\). Hence combining (2.1) and (2.2) yields

$$\begin{aligned} f(0) - \sum _{j\ge 1} c_j f(\sqrt{2j}) = - \widehat{f}(0) + \sum _{j\ge 1} c_j \widehat{f}(\sqrt{2j}). \end{aligned}$$
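This Gaussian identity is easy to test numerically; the following minimal Python sketch (our own illustration, using nothing beyond the coefficients \(c_j = 504\sigma _5(j)\)) confirms the agreement:

```python
# Numerical check of the Gaussian summation formula above (a sketch).
import math

def sigma5(j):
    return sum(div**5 for div in range(1, j + 1) if j % div == 0)

alpha = 1.37  # any alpha > 0; f(x) = exp(-pi*alpha*|x|^2) on R^12
c = [504 * sigma5(j) for j in range(1, 60)]
lhs = 1.0 - sum(cj * math.exp(-2 * math.pi * j * alpha)
                for j, cj in enumerate(c, start=1))
# fhat(xi) = alpha**(-6) * exp(-pi*|xi|^2/alpha), so fhat(sqrt(2j)) is as below.
rhs = -alpha**(-6) + sum(cj * alpha**(-6) * math.exp(-2 * math.pi * j / alpha)
                         for j, cj in enumerate(c, start=1))
print(lhs, rhs)  # the two sides agree to machine precision
```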

The key to proving that \(\mathrm {A}_+(12) \ge \sqrt{2}\) is the following lemma, which extends this summation formula to arbitrary radial Schwartz functions.

Lemma 2.1

For all radial Schwartz functions \(f:\mathbb {R}^{12}\rightarrow \mathbb {C}\),

$$\begin{aligned} f(0) - \sum _{j\ge 1} c_j f(\sqrt{2j}) = - \widehat{f}(0) + \sum _{j\ge 1} c_j \widehat{f}(\sqrt{2j}). \end{aligned}$$

We follow the approach used to prove Theorem 1 in [16, Section 6].

Proof

Let \(\Lambda :\mathcal {S}_{\text {rad}}(\mathbb {R}^{12})\rightarrow \mathbb {C}\) be the functional

$$\begin{aligned} \Lambda (f) = f(0) - \sum _{j\ge 1} c_j f(\sqrt{2j}) + \widehat{f}(0) - \sum _{j\ge 1} c_j \widehat{f}(\sqrt{2j}) \end{aligned}$$

on the radial Schwartz space \(\mathcal {S}_{\text {rad}}(\mathbb {R}^{12})\). As noted above, \(\Lambda (f)=0\) whenever \(f(x) = e^{-\pi \alpha |x|^2}\) with \(\mathop {\mathrm {Re}}(\alpha ) > 0\). Moreover, the bound \(c_j =O(j^6)\) shows that \(\Lambda \) is a continuous linear functional in the topology of the Schwartz space. Thus, we need only prove our desired identity for compactly supported, radial \(C^\infty \) functions, which are dense in \(\mathcal {S}_{\text {rad}}(\mathbb {R}^{12})\).

Write \(f(x)=F(|x|^2)e^{-\pi |x|^2}\), where \(F:\mathbb {R}\rightarrow \mathbb {R}\) is a smooth and compactly supported function. Let \(\widehat{F}\) be the one-dimensional Fourier transform of F, and note that \(\widehat{F}\) is also rapidly decreasing. By Fourier inversion,

$$\begin{aligned} f(x) = \int _\mathbb {R}\widehat{F}(t)e^{-\pi (1-2i t) |x|^2}\,dt = \lim _{T\rightarrow \infty } \int _{-T}^T \widehat{F}(t)e^{-\pi (1-2i t) |x|^2}\,dt. \end{aligned}$$

The functions \(x\mapsto \int _{-T}^T \widehat{F}(t)e^{-\pi (1-2i t) |x|^2}\,dt\) belong to \(\mathcal {S}_{\text {rad}}(\mathbb {R}^{12})\) for each \(T>0\) and converge to f in the Schwartz topology. Moreover,

$$\begin{aligned} \Lambda \mathopen {}\left( x \mapsto \int _{-T}^T \widehat{F}(t)e^{-\pi (1-2i t) |x|^2}\,dt\right) \mathclose {} = \int _{-T}^T \widehat{F}(t)\,\Lambda \mathopen {}\left( x \mapsto e^{-\pi (1-2i t) |x|^2}\right) \mathclose {}\,dt = 0, \end{aligned}$$

where the interchange of \(\Lambda \) and the integral is justified because the Riemann sums of the integral converge to the integral in the topology of \(\mathcal {S}_{\text {rad}}(\mathbb {R}^{12})\). This finishes the proof of the lemma. \(\square \)

Noam Elkies has provided the following alternative proof of Lemma 2.1 using Poisson summation. Explicit calculation shows that one can write the modular form \(E_6\) in terms of theta series of lattices and their duals as

$$\begin{aligned} E_6 = -\frac{11}{10} \Theta _{D_{12}} + \frac{11}{20} \Theta _{D^*_{12}} - \frac{1}{20} \Theta _{L} + \frac{8}{5} \Theta _{L^*}, \end{aligned}$$

where L is the \(D_{12}\) root lattice rescaled by a factor of \(1/\sqrt{2}\). Then the summation formula from Lemma 2.1 becomes a linear combination of the Poisson summation formulas for the lattices \(D_{12}\), \(D_{12}^*\), L, and \(L^*\), which implies that it holds for all radial Schwartz functions. This argument shows that Lemma 2.1 is closely related to Poisson summation, while the proof we gave above applies directly to other modular forms as well as \(E_6\).
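This decomposition is easy to test numerically. With the convention \(\Theta _\Lambda (z) = \sum _{x \in \Lambda } e^{\pi i |x|^2 z}\), one can use the classical facts \(\Theta _{D_n} = (\theta _3^n + \theta _4^n)/2\) and \(\Theta _{D_n^*} = \theta _3^n + \theta _2^n\) for the one-dimensional theta functions \(\theta _2, \theta _3, \theta _4\); the following sketch (our own illustration) compares both sides at a sample point \(z = it\):

```python
# Numerical check of Elkies' identity at a sample point z = it (a sketch).
import math

def theta2(t): return sum(math.exp(-math.pi * (n + 0.5)**2 * t) for n in range(-50, 50))
def theta3(t): return 1 + 2 * sum(math.exp(-math.pi * n * n * t) for n in range(1, 50))
def theta4(t): return 1 + 2 * sum((-1)**n * math.exp(-math.pi * n * n * t) for n in range(1, 50))

def sigma5(j): return sum(div**5 for div in range(1, j + 1) if j % div == 0)
def E6(t): return 1 - 504 * sum(sigma5(j) * math.exp(-2 * math.pi * j * t) for j in range(1, 60))

def D12(t):  return (theta3(t)**12 + theta4(t)**12) / 2  # Theta_{D_12}(it)
def D12s(t): return theta3(t)**12 + theta2(t)**12        # Theta_{D_12^*}(it)

# Rescaling a lattice by 1/sqrt(2) halves all squared norms, so
# Theta_L(it) = Theta_{D_12}(it/2) and Theta_{L^*}(it) = Theta_{D_12^*}(2it).
t = 0.8
print(E6(t), -1.1 * D12(t) + 0.55 * D12s(t) - 0.05 * D12(t / 2) + 1.6 * D12s(2 * t))
# the two values agree to high precision
```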

Lemma 2.2

Let \(f \in \mathcal {A}_+(12)\). If both r(f) and \(r(\widehat{f}\,)\) are at most \(\sqrt{2}\), then \(f(x) = \widehat{f}(x) = 0\) whenever \(|x| = \sqrt{2j}\) with j a nonnegative integer.

Proof

Without loss of generality, we can assume f is a radial function; otherwise, we simply average its rotations about the origin. (If the averaged function vanishes at radius \(\sqrt{2j}\), then so does f because \(r(f) \le \sqrt{2}\), and the same holds for \(\widehat{f}\).)

If f is a radial Schwartz function, then Lemma 2.1 implies that

$$\begin{aligned} f(0) + \widehat{f}(0) = \sum _{j\ge 1} c_j f(\sqrt{2j}) + \sum _{j\ge 1} c_j \widehat{f}(\sqrt{2j}), \end{aligned}$$

and the conclusion follows from the inequalities \(f(0) \le 0\), \(\widehat{f}(0) \le 0\), \(f(\sqrt{2j}) \ge 0\), \(\widehat{f}(\sqrt{2j}) \ge 0\), and \(c_j > 0\) for \(j \ge 1\).

For general f, we can apply a standard mollification argument. Let \(\varphi :\mathbb {R}^d \rightarrow \mathbb {R}\) be a nonnegative, radial \(C^\infty \) function supported in the unit ball \(B^{d}_1\) with \(\widehat{\varphi }\ge 0\) and \(\widehat{\varphi }(0)=1\), so that the functions \(\varphi _\varepsilon \) defined for \(\varepsilon >0\) by \(\varphi _\varepsilon (x) = \varepsilon ^{-d}\varphi (x/\varepsilon )\) form an approximate identity.

Now let \(f_\varepsilon = (f * \varphi _\varepsilon ) \widehat{\varphi }_\varepsilon \). Because f and \(\widehat{f}\) are continuous functions that vanish at infinity, \(f_\varepsilon \rightarrow f\) and \(\widehat{f}_\varepsilon \rightarrow \widehat{f}\) uniformly on \(\mathbb {R}^d\) as \(\varepsilon \rightarrow 0\). Since \({{\,\mathrm{\mathrm {supp}}\,}}(\varphi _\varepsilon ) \subseteq B^{d}_\varepsilon \), we obtain the inequality \(f_\varepsilon (x) \ge 0\) whenever \(|x|\ge r(f) + \varepsilon \). Similarly \(\widehat{f_\varepsilon } = (\widehat{f} \ \widehat{\varphi }_\varepsilon )*\varphi _\varepsilon \), which implies that \(\widehat{f}_\varepsilon (x)\ge 0\) whenever \(|x|\ge r(\widehat{f}\,)+\varepsilon \). Furthermore, \(f_\varepsilon \) is a Schwartz function. To see why, note that \(\widehat{\varphi }_\varepsilon \) is a Schwartz function, while \(f * \varphi _\varepsilon \) is smooth and all its derivatives are bounded.

Now that we have Schwartz functions approximating f, we again apply Lemma 2.1 to obtain

$$\begin{aligned} f_\varepsilon (0) + \widehat{f}_\varepsilon (0) = \sum _{j\ge 1} c_j f_\varepsilon (\sqrt{2j}) + \sum _{j\ge 1} c_j \widehat{f}_\varepsilon (\sqrt{2j}). \end{aligned}$$

To derive information from this identity, we combine the limits \(f_\varepsilon (\sqrt{2j}) \rightarrow f(\sqrt{2j})\) and \(\widehat{f}_\varepsilon (\sqrt{2j}) \rightarrow \widehat{f}(\sqrt{2j})\) for \(j \ge 0\), the inequalities \(f(0) \le 0\), \(\widehat{f}(0) \le 0\), \(f(\sqrt{2}) \ge 0\), and \(\widehat{f}(\sqrt{2}) \ge 0\), and the inequalities \(f_\varepsilon (\sqrt{2j}) \ge 0\) and \(\widehat{f}_\varepsilon (\sqrt{2j}) \ge 0\) for \(j \ge 2\) (when \(\varepsilon < 2-\sqrt{2}\)). We conclude that \(f(\sqrt{2j}) = \widehat{f}(\sqrt{2j}) = 0\) for \(j \ge 0\), as desired. \(\square \)

We will now apply this lemma to prove the lower bound \(\mathrm {A}_+(12) \ge \sqrt{2}\).

Lemma 2.3

Suppose \(f \in \mathcal {A}_+(12)\). If \(r(f)r(\widehat{f}\,) < 2\), then f vanishes identically.

Proof

By rescaling the input to f, we can assume without loss of generality that r(f) and \(r(\widehat{f}\,)\) are both less than \(\sqrt{2}\). Now we apply Lemma 2.2 to a rescaled version of f. Choose \(\lambda >0\) and let \(g(x) = f(\lambda x)\). Then \(\widehat{g}(\xi ) = \lambda ^{-12} \widehat{f}(\xi /\lambda )\), and it follows that \(g \in \mathcal {A}_+(12)\). Moreover, if \(\lambda \) is close enough to 1, then r(g) and \(r(\widehat{g})\) are both less than \(\sqrt{2}\).

By Lemma 2.2, if \(\lambda \) is sufficiently close to 1, then \(g(x) = 0\) whenever \(|x|=\sqrt{2j}\) with \(j \ge 1\). Thus there exists some \(\lambda _0>1\) such that \(f(x)=0\) whenever \(|x|\in (\sqrt{2j}/\lambda _0,\sqrt{2j}\lambda _0)\) and \(j\ge 1\), and the same holds for \(\widehat{f}\). The union of these intervals covers the entire half-line \([R,\infty )\) for some \(R>0\), because

$$\begin{aligned} \lim _{j \rightarrow \infty } \frac{\sqrt{2j+2}}{\sqrt{2j}} = 1. \end{aligned}$$

In other words, f and \(\widehat{f}\) both have compact support, which implies that \(f=0\). \(\square \)

Exactly the same technique applies to any dimension and sign:

Proposition 2.4

Let \(s \in \{\pm 1\}\), \(0< \rho _0< \rho _1 < \cdots \) with

$$\begin{aligned} \lim _{j \rightarrow \infty } \frac{\rho _{j+1}}{\rho _j} = 1, \end{aligned}$$

and \(c_j > 0\) for \(j \ge 0\). If every radial Schwartz function \(f :\mathbb {R}^d \rightarrow \mathbb {R}\) satisfies the summation formula

$$\begin{aligned} f(0) + s\widehat{f}(0) = s\sum _{j \ge 0} c_j f(\rho _j) + \sum _{j \ge 0} c_j \widehat{f}(\rho _j), \end{aligned}$$
(2.3)

then \(\mathrm {A}_s(d) \ge \rho _0\).

For example, for \(k\ge 2\), the summation formula coming from the Eisenstein series \(E_{2k}\) proves that \(\mathrm {A}_{(-1)^{k-1}}(4k) \ge \sqrt{2}\). This lower bound is sharp for \(k=2\) and \(k=3\), but it is not even true for \(k=1\), because \(E_2\) is merely a quasimodular form.

The summation formula (2.3) automatically holds when \(\widehat{f} = -\, s f\). Thus, it is equivalent to the assertion that

$$\begin{aligned} f(0) = s \sum _{j \ge 0} c_j f(\rho _j) \end{aligned}$$
(2.4)

holds whenever \(\widehat{f} = s f\).
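To see the equivalence, decompose a radial f as \(f = f_1 + f_2\) with \(f_1 = (f + s\widehat{f}\,)/2\) and \(f_2 = (f - s\widehat{f}\,)/2\). Since \(\widehat{\widehat{f}} = f\) for even functions, \(\widehat{f_1} = s f_1\) and \(\widehat{f_2} = -s f_2\); both sides of (2.3) vanish identically on \(f_2\), while on \(f_1\) the formula reduces to (2.4).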

Conjecture 2.5

For each \(s = \pm 1\) and \(d \ge 1\) except perhaps \((s,d)=(1,1)\), there is a summation formula that proves a sharp lower bound for \(\mathrm {A}_s(d)\) via Proposition 2.4.

In the case \(s=-\,1\), this conjecture is analogous to [3, Conjecture 4.2]. It holds in every case in which \(\mathrm {A}_s(d)\) is known exactly: the summation formulas that establish sharp lower bounds for \(\mathrm {A}_{-}(1)\), \(\mathrm {A}_{-}(8)\), and \(\mathrm {A}_{-}(24)\) are Poisson summation over the \(\mathbb {Z}\), \(E_8\), and Leech lattices, respectively, while the \(\mathrm {A}_{+}(12)\) case is Lemma 2.1. The conjectured value of \(\mathrm {A}_{-}(2)\) corresponds to Poisson summation over the isodual scaling of the \(A_2\) lattice. Conjecture 2.5 is not known to hold in any other case, nor can we guess what the summation formula should be, but the numerical and theoretical evidence in favor of this conjecture is compelling (see Sects. 4 and 5). In particular, in most cases we can compute the constants \(c_j\) and \(\rho _j\) in these conjectural summation formulas to high precision.

The coefficients \(c_j\) are integers in the five exact cases listed above, but integral coefficients seem to be rare, and it is plausible that no more such cases exist. One interesting example is the (conjectural) summation formula that yields \(\mathrm {A}_{+}(28)\). It is natural to guess that \(\mathrm {A}_{+}(28)=2\), in accordance with \(\mathrm {A}_{+}(12)=\mathrm {A}_{-}(8)\) and \(\mathrm {A}_{-}(24)=2\), but in fact \(\mathrm {A}_{+}(28) < 1.98540693489105\), and we conjecture that \(\mathrm {A}_{+}(28) = 1.985406934891049\ldots .\) (See Sect. 4 for a discussion of our numerical methods.) In Table 1, we approximate a conjectural summation formula that would establish this equality, which we computed using the techniques of Sect. 5. We are unable to describe the numbers \(\rho _j\) and \(c_j\) in the summation formula exactly, but we believe that \(\rho _j = \sqrt{2j+4+o(1)}\) as \(j \rightarrow \infty \) (see Conjecture 4.2) and \(c_j = (24+o(1)) \sigma _{13}(j+2)\). The latter equation says that \(-c_j\) is asymptotic to the coefficient of \(e^{(2j+4)\pi i z}\) in the Fourier expansion

$$\begin{aligned} E_{14}(z) = 1&- 24e^{2\pi i z} - 196632e^{4\pi i z} - 38263776e^{6\pi i z} - 1610809368e^{8\pi i z}\\&- 29296875024e^{10\pi i z} - 313495116768e^{12\pi i z}\\&- 2325336249792e^{14\pi i z} - 13195750342680e^{16\pi i z} - \cdots \end{aligned}$$

of the Eisenstein series \(E_{14}\), and indeed these coefficients are close to those in the table. Note that the difference between the role of \(E_{14}\) here and that of \(E_6\) when \(d=12\) is that the summation formula for \(d=28\) suppresses the \(-24 e^{2\pi i z}\) term in \(E_{14}(z)\) at the cost of perturbing all the remaining numbers.
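For reference, the coefficients displayed above are easily regenerated (a two-line SymPy sketch):

```python
# The coefficients 24*sigma_13(n) of E_14 quoted above (a sketch).
from sympy import divisor_sigma
print([24 * divisor_sigma(n, 13) for n in range(1, 9)])
# [24, 196632, 38263776, 1610809368, 29296875024, 313495116768,
#  2325336249792, 13195750342680]
```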

Table 1 Summation formula that would prove \(\mathrm {A}_{+}(28) \ge 1.985406934891049\ldots \)

2.2 Theta series and an extremal function in dimension 12

To prove the upper bound \(\mathrm {A}_+(12) \le \sqrt{2}\), we will construct an explicit function \(f \in \mathcal {A}_+(12)\) satisfying \(\widehat{f} = f\), \(f(0)=0\), and \(r(f) = \sqrt{2}\). To do so, we will use a remarkable integral transform discovered by Viazovska that turns modular forms into radial eigenfunctions of the Fourier transform. See [19] for background on modular forms, and [8, 9, 18] for other applications of this transform.

Viazovska’s method can be summarized by the following proposition, which is implicit in [18] but was stated there only for a specific modular form with \(d=8\) (and similarly for \(d=24\) in [8]). We omit the proof, because it closely follows the same approach as [18, Propositions 5 and 6] and [8, Lemma 3.1]. All that needs to be checked is the dependence on the dimension d.

Proposition 2.6

Let d be a positive multiple of 4, and let \(\psi \) be a weakly holomorphic modular form of weight \(2-d/2\) for \(\Gamma (2)\) such that

$$\begin{aligned} z^{d/2-2}\psi (-1/z) + \psi (z+1) = \psi (z) \end{aligned}$$

for all z in the upper half-plane, \(t^{d/2-2}\psi (i/t) \rightarrow 0\) as \(t \rightarrow \infty \), and \(|\psi (it)| = O\big (e^{K\pi t}\big )\) as \(t \rightarrow \infty \) for some constant K. Define a radial function \(f :\mathbb {R}^d \rightarrow \mathbb {R}\) by

$$\begin{aligned} f(x)&= \frac{i}{4}\int _{-1}^i \psi (z+1) e^{\pi i |x|^2 z} \, dz + \frac{i}{4}\int _{1}^i \psi (z-1) e^{\pi i |x|^2 z} \, dz\\&\quad - \frac{i}{2}\int _{0}^i \psi (z) e^{\pi i |x|^2 z} \, dz - \frac{i}{2} \int _i^{i\infty } z^{d/2-2}\psi (-1/z) e^{\pi i |x|^2 z} \, dz. \end{aligned}$$
(2.5)

Then f is a Schwartz function and an eigenfunction of the Fourier transform with eigenvalue \((-1)^{1+d/4}\). Furthermore,

$$\begin{aligned} f(x) = \sin \mathopen {}\big (\pi |x|^2/2\big )^2\mathclose {} \int _0^\infty \psi (it) e^{-\pi |x|^2 t}\, dt \end{aligned}$$

whenever \(|x|^2 > K\).

Viazovska in fact developed two such techniques, one for each eigenvalue, and both are used in the sphere packing papers [8, 18]. We will not need the other technique, which yields eigenvalue \((-1)^{d/4}\) instead of \((-1)^{1+d/4}\) and uses a weakly holomorphic quasimodular form of weight \(4-d/2\) and depth 2 for \(\mathrm {SL}_2(\mathbb {Z})\).

When applying Proposition 2.6, we will use the notation

$$\begin{aligned} \Theta _{00}(z) = \sum _{n\in \mathbb {Z}} e^{\pi i n^2 z}, \qquad \Theta _{01}(z) = \sum _{n\in \mathbb {Z}} (-1)^n e^{\pi i n^2 z}, \qquad \Theta _{10}(z) = \sum _{n\in \mathbb {Z}} e^{\pi i (n+1/2)^2 z} \end{aligned}$$

for theta functions from [8, 18]. Their fourth powers \(\Theta _{00}^4\), \(\Theta _{01}^4\), and \(\Theta _{10}^4\) are modular forms of weight 2 for \(\Gamma (2)\), which satisfy the Jacobi identity \(\Theta _{00}^4 = \Theta _{01}^4 + \Theta _{10}^4\) and the transformation laws

$$\begin{aligned} \Theta _{00}(z+1)^4&= \Theta _{01}(z)^4,&z^{-2}\Theta _{00}(-1/z)^4&= -\Theta _{00}(z)^4,\\ \Theta _{01}(z+1)^4&= \Theta _{00}(z)^4,&z^{-2}\Theta _{01}(-1/z)^4&= -\Theta _{10}(z)^4,\\ \Theta _{10}(z+1)^4&= -\Theta _{10}(z)^4,&z^{-2}\Theta _{10}(-1/z)^4&= -\Theta _{01}(z)^4 \end{aligned}$$

under the action of \(\mathrm {SL}_2(\mathbb {Z})\). We will also use the modular form \(\Delta \), defined by

$$\begin{aligned} \Delta (z) = e^{2\pi i z} \prod _{n=1}^\infty (1-e^{2\pi i n z})^{24}. \end{aligned}$$

It is a modular form of weight 12 for the group \(\mathrm {SL}_2(\mathbb {Z})\), which contains \(\Gamma (2)\); thus \(\Delta (z+1) = \Delta (z)\) and \(z^{-12} \Delta (-1/z) = \Delta (z)\).

Using these ingredients, we will now construct a suitable modular form for use in Proposition 2.6, to prove Theorem 1.2. Let

$$\begin{aligned} \psi = \frac{\big (\Theta _{00}^4 + \Theta _{10}^4\big )\Theta _{01}^{12}}{\Delta }. \end{aligned}$$
(2.6)

(We discuss the motivation for this definition at the end of this section.) Then \(\psi \) is a weakly holomorphic modular form of weight \(4\cdot 2 - 12 =-\,4\), and the identity

$$\begin{aligned} z^{4}\psi (-1/z) + \psi (z+1) = \psi (z) \end{aligned}$$

can be checked using the formulas listed above. (Note that \(\psi \) is weakly holomorphic because the product formula shows that \(\Delta \) does not vanish in the upper half-plane.)

Using the definitions for \(\Theta _{00}\), \(\Theta _{01}\), \(\Theta _{10}\), and \(\Delta \) given above, we can compute the Fourier series

$$\begin{aligned} \psi (z) = e^{-2\pi i z} - 264 + 4096 e^{\pi i z} - 36828 e^{2\pi i z} + 245760 e^{3\pi i z} + \cdots . \end{aligned}$$
(2.7)

This series is absolutely convergent in the upper half-plane, and thus \(|\psi (it)| = O\big (e^{2\pi t}\big )\) as \(t \rightarrow \infty \). Using the transformation laws again, we find that

$$\begin{aligned} z^4 \psi (-1/z)&= \frac{\big (\Theta _{00}(z)^4 + \Theta _{01}(z)^4\big )\Theta _{10}(z)^{12}}{\Delta (z)}\\&= 8192 e^{\pi i z} + 491520 e^{3\pi i z} + 12828672 e^{5 \pi i z} + \cdots . \end{aligned}$$

In particular, \(|t^4 \psi (i/t)| = O\big (e^{-\pi t}\big )\) as \(t \rightarrow \infty \).
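Both expansions can be confirmed numerically from the definitions. The following sketch (ours; the truncations are ad hoc) evaluates (2.6) at \(z = 3i/2\) and compares it with the displayed truncation of (2.7), where \(x = e^{\pi i z} = e^{-3\pi /2}\):

```python
# Evaluate psi(it) from (2.6) and compare with the expansion (2.7) (a sketch).
import math

def theta2(t): return sum(math.exp(-math.pi * (n + 0.5)**2 * t) for n in range(-50, 50))
def theta3(t): return 1 + 2 * sum(math.exp(-math.pi * n * n * t) for n in range(1, 50))
def theta4(t): return 1 + 2 * sum((-1)**n * math.exp(-math.pi * n * n * t) for n in range(1, 50))

def Delta(t):
    q = math.exp(-2 * math.pi * t)
    prod = 1.0
    for n in range(1, 300):
        prod *= (1 - q**n) ** 24
    return q * prod

def psi(t):  # psi(it) for the form psi defined in (2.6)
    return (theta3(t)**4 + theta2(t)**4) * theta4(t)**12 / Delta(t)

t = 1.5
x = math.exp(-math.pi * t)  # x plays the role of e^{pi i z} at z = it
series = x**-2 - 264 + 4096 * x - 36828 * x**2 + 245760 * x**3
print(psi(t), series)  # agree up to the omitted O(x^4) tail of (2.7)
```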

Thus, \(\psi \) satisfies the hypotheses of Proposition 2.6 with \(d=12\) and \(K=2\). Define \(f :\mathbb {R}^{12} \rightarrow \mathbb {R}\) by (2.5). Then f is a radial Schwartz function satisfying \(\widehat{f} = f\) and

$$\begin{aligned} f(x) = \sin \mathopen {}\big (\pi |x|^2/2\big )^2\mathclose {} \int _0^\infty \psi (it) e^{-\pi |x|^2 t}\, dt \end{aligned}$$
(2.8)

for \(|x| > \sqrt{2}\).

It follows from (2.6) that

$$\begin{aligned} \psi (it) > 0 \end{aligned}$$
(2.9)

for all \(t>0\), because \(\Theta _{00}(it)\), \(\Theta _{01}(it)\), and \(\Theta _{10}(it)\) are all real, while \(0< \Delta (it) < 1\). Thus, (2.8) implies that \(f(x) \ge 0\) for \(|x| > \sqrt{2}\), with double roots at \(|x| = \sqrt{2j}\) for integers \(j\ge 2\) and no other roots in this range.

For comparison, the quasimodular form inequalities that play the same role as (2.9) in [18] and [8] are obtained via computer-assisted proofs. The reason for this discrepancy is that those proofs combine \(+\,1\) and \(-\,1\) eigenfunctions, which introduces technical difficulties. If all one wishes to prove is that \(\mathrm {A}_-(8) = \sqrt{2}\) and \(\mathrm {A}_-(24)=2\), then one can avoid computer assistance. Specifically, the formula (3.1) in [8] is visibly positive in the same sense as our formula (2.6), and while that is not true for formula (46) in [18], it can be rewritten so as to be visibly positive (see, for example, the corresponding formula in [4]).

To analyze the behavior of f(x) with \(0 \le |x| \le \sqrt{2}\), we can simply cancel the growth of \(\psi (it)\). The series (2.7) shows that

$$\begin{aligned} \psi (it) = e^{2\pi t} - 264 + O\big (e^{-\pi t}\big ) \end{aligned}$$

as \(t \rightarrow \infty \). For \(|x| > \sqrt{2}\), we obtain the new formula

$$\begin{aligned} f(x) = \sin \mathopen {}\big (\pi |x|^2/2\big )^2\mathclose {} \left( \frac{528-263|x|^2}{\pi |x|^2 (|x|^2-2)} + \int _0^\infty \big (\psi (it)-e^{2\pi t} + 264\big ) e^{-\pi |x|^2 t}\, dt\right) \end{aligned}$$

from (2.8), and the integral in this formula now converges for all x. It follows from (2.5) that f(x) is a holomorphic function of |x|; thus, the new formula must agree with the old one for all x by analytic continuation.

The term

$$\begin{aligned} \sin \mathopen {}\big (\pi |x|^2/2\big )^2\mathclose {} \int _0^\infty \big (\psi (it)-e^{2\pi t} + 264\big ) e^{-\pi |x|^2 t}\, dt \end{aligned}$$

vanishes to second order at \(|x|=\sqrt{2j}\) for all \(j \ge 1\), and to fourth order at the origin. Thus, f(x) must agree with

$$\begin{aligned} \sin \mathopen {}\big (\pi |x|^2/2\big )^2\mathclose {} \left( \frac{528-263|x|^2}{\pi |x|^2 (|x|^2-2)} \right) \end{aligned}$$

to second order at \(|x|=\sqrt{2}\) and to fourth order at the origin, and so f(x) has a single root at \(|x|=\sqrt{2}\) and a double root at the origin. More specifically,

$$\begin{aligned} f(x) = \frac{\pi }{\sqrt{2}}(|x|-\sqrt{2}) + O\big ((|x|-\sqrt{2})^2\big ) \end{aligned}$$

as \(|x| \rightarrow \sqrt{2}\), and

$$\begin{aligned} f(x) = -\,66\pi |x|^2 + O\big (|x|^4\big ) \end{aligned}$$

as \(x \rightarrow 0\).

In particular, \(f(0)=0\). It follows that \(f \in \mathcal {A}_+(12)\), and therefore \(\mathrm {A}_+(12) \le \sqrt{2}\), as desired. We have now proved all of the assertions from Theorem 1.2.

As the quadratic term \(-\,66\pi |x|^2\) suggests, our construction of f is scaled so that its values are rather large. For example, its minimum value appears to be \(f(x) \approx -\,23.8088\), achieved when \(|x| \approx 0.557391\). In Fig. 1, we have plotted a more moderate scaling of this function.
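These values can be reproduced directly from the formulas above. The following sketch (our own illustration, not the code used for Fig. 1; the quadrature cutoffs are ad hoc) evaluates f and locates its minimum:

```python
# Sketch evaluating the extremal function f of Theorem 1.2 from the convergent
# formula above.
import math
from scipy.integrate import quad
from scipy.optimize import minimize_scalar

def theta2(t): return sum(math.exp(-math.pi * (n + 0.5)**2 * t) for n in range(-50, 50))
def theta3(t): return 1 + 2 * sum(math.exp(-math.pi * n * n * t) for n in range(1, 50))
def theta4(t): return 1 + 2 * sum((-1)**n * math.exp(-math.pi * n * n * t) for n in range(1, 50))

def Delta(t):
    q = math.exp(-2 * math.pi * t)
    prod = 1.0
    for n in range(1, 300):
        prod *= (1 - q**n) ** 24
    return q * prod

def psi(t):
    if t < 0.05:
        return 0.0  # psi(it) = O(t^4 e^{-pi/t}) is negligible here
    if t < 1:       # functional equation: psi(it) = t^4 * (z^4 psi(-1/z)) at z = i/t
        s = 1 / t
        return t**4 * (theta3(s)**4 + theta4(s)**4) * theta2(s)**12 / Delta(s)
    return (theta3(t)**4 + theta2(t)**4) * theta4(t)**12 / Delta(t)

# Leading terms of psi(it) - e^{2 pi t} + 264 from (2.7), used for the tail
# t > 3 to avoid catastrophic cancellation:
TAIL = [(4096, 1), (-36828, 2), (245760, 3)]

def f(r):
    a = math.pi * r * r
    main = (528 - 263 * r**2) / (math.pi * r**2 * (r**2 - 2))
    head, _ = quad(lambda t: (psi(t) - math.exp(2 * math.pi * t) + 264)
                   * math.exp(-a * t), 0, 3, limit=200)
    tail = sum(c * math.exp(-(math.pi * m + a) * 3) / (math.pi * m + a)
               for c, m in TAIL)
    return math.sin(math.pi * r**2 / 2)**2 * (main + head + tail)

res = minimize_scalar(f, bounds=(0.2, 1.2), method='bounded')
print(res.x, res.fun)  # approx. 0.557391 and -23.8088
print(f(1.9) > 0)      # True: f is nonnegative beyond |x| = sqrt(2)
```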

To arrive at the definition (2.6) of \(\psi \), we began with the Ansatz that \(\psi \Delta \) should be a holomorphic modular form of weight 8 for \(\Gamma (2)\). Equivalently, it should be a linear combination of \(\Theta _{00}^{16}\), \(\Theta _{00}^{12} \Theta _{01}^4\), \(\Theta _{00}^8 \Theta _{01}^8\), \(\Theta _{00}^4 \Theta _{01}^{12}\), and \(\Theta _{01}^{16}\). Imposing the constraint \(z^{4}\psi (-1/z) + \psi (z+1) = \psi (z)\) eliminates three degrees of freedom, which leaves just one degree of freedom, up to scaling. The remaining constraint is that the coefficient of \(e^{-\pi i z}\) in the Fourier expansion of \(\psi (z)\) must vanish, and then \(\psi \) is determined modulo scaling. Finally, we rewrote the formula for \(\psi \) to make it visibly positive.

3 The \(-\,1\) eigenfunction uncertainty principle

This section is devoted to the proof of Theorem 1.4. We deal only with the \(-1\) case, because all the assertions in this theorem were already proved in [12] for the \(+1\) case. First, we reduce determining \(\mathrm {A}_-(d)\) to solving Problem 1.3.

Lemma 3.1

For each \(f\in \mathcal {A}_-(d){\setminus }\{0\}\), there exists a radial function \(g\in \mathcal {A}_-(d){\setminus }\{0\}\) such that \(\widehat{g} =-g\), \(g(0)=0\), and \(r(g) \le \sqrt{r(f)r(\widehat{f}\,)}\).

Proof

If f is not radial, then we average its rotations about the origin to obtain a radial function without increasing r(f) or \(r(\widehat{f}\,)\). Thus, we can assume that f is radial. Note that this process cannot lead to the zero function: if it did, then f and \(\widehat{f}\) would both have compact support and hence vanish identically.

The quantity \(r(f)r(\widehat{f}\,)\) is unchanged if we replace f with \(x \mapsto f(\lambda x)\) for some \(\lambda >0\). Thus, we can assume that \(r(f)=r(\widehat{f}\,)\). Letting \(g=f-\widehat{f}\) we deduce that \(g\in \mathcal {A}_-(d)\), \(\widehat{g}=-g\), and \(r(g)\le r(f)\). Again, g cannot vanish identically, because f and \(-\widehat{f}\) are eventually nonnegative and would thus both have to have compact support.

It remains to force \(g(0)=0\), since a priori we can have \(g(0)>0\). For \(t>0\), consider the auxiliary function

$$\begin{aligned} \varphi _t(x) = \frac{e^{-t\pi |x|^2} - e^{-2t\pi |x|^2}}{t^{-d/2} - (2t)^{-d/2}}. \end{aligned}$$
(3.1)
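With our normalization, \(e^{-a\pi |x|^2}\) has Fourier transform \(\xi \mapsto a^{-d/2}e^{-\pi |\xi |^2/a}\), so

$$\begin{aligned} \widehat{\varphi }_t(\xi ) = \frac{t^{-d/2}e^{-\pi |\xi |^2/t} - (2t)^{-d/2}e^{-\pi |\xi |^2/(2t)}}{t^{-d/2} - (2t)^{-d/2}}. \end{aligned}$$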

Then \(\varphi _t\ge 0\), \(\varphi _t(0)=0\), and \(\widehat{\varphi }_t(0)=1\), while \(\widehat{\varphi }_t(x) <0\) precisely when \(e^{\pi |x|^2/(2t)} > 2^{d/2}\), that is, when \(|x|^2> td\log (2)/\pi \). Choosing \(t>0\) so that \(\sqrt{td\log (2)/\pi } = r(g)\), we deduce that the function \(h = g + g(0)(\varphi _t -\widehat{\varphi }_t)\) belongs to \(\mathcal {A}_-(d)\) and satisfies \(\widehat{h} = -h\), \(h(0)=0\), and \(r(h)\le r(g)\). Finally, if \(g(0)>0\), then \(h(x) > g(x)\) for all sufficiently large \(|x|\), and thus h is not the zero function. \(\square \)

3.1 Lower and upper bounds

To obtain a lower bound for \(\mathrm {A}_-(d)\), we follow [2, 12]. Let \(g\in \mathcal {A}_-(d){\setminus }\{0\}\) be a radial function satisfying \(\widehat{g}=-g\) and \(g(0)=0\), and assume without loss of generality that \(\Vert g\Vert _1 = 1\).

Let \(g^+ = \max \{g,0\}\) and \(g^- = \max \{-g,0\}\), so that \(g^+,g^- \ge 0\), these functions are never positive at the same point, and \(g=g^+-g^-\). Since \(\widehat{g}(0)=0\),

$$\begin{aligned} \int _{\mathbb {R}^d} g^+ = \int _{\mathbb {R}^d} g^-. \end{aligned}$$

Furthermore,

$$\begin{aligned} \int _{\mathbb {R}^d} g^- = \int _{B^d_{r(g)}} g^-, \end{aligned}$$

where \(B^d_{r(g)}\) is the closed d-dimensional ball of radius r(g) centered at the origin, because \(\{x\in \mathbb {R}^d:g(x)< 0\}\subseteq B^d_{r(g)}\). It follows that

$$\begin{aligned} \int _{B^d_{r(g)}} g^- = 1/2, \end{aligned}$$

because \(\Vert g\Vert _{1}=1\). Thus,

$$\begin{aligned} 1/2&\le {{\,\mathrm{\mathrm {vol}}\,}}\mathopen {}\big (B^d_1\big )\mathclose {}r(g)^d \Vert g\Vert _{\infty }\\&\le {{\,\mathrm{\mathrm {vol}}\,}}\mathopen {}\big (B^d_1\big )\mathclose {}r(g)^d \Vert \widehat{g}\Vert _{1}\\&= {{\,\mathrm{\mathrm {vol}}\,}}\mathopen {}\big (B^d_1\big )\mathclose {}r(g)^d \Vert g\Vert _{1} \\&= {{\,\mathrm{\mathrm {vol}}\,}}\mathopen {}\big (B^d_1\big )\mathclose {}r(g)^d, \end{aligned}$$

and we conclude that

$$\begin{aligned} \mathrm {A}_-(d) \ge \left( \frac{1}{2 {{\,\mathrm{\mathrm {vol}}\,}}\mathopen {}\big (B^d_1\big )\mathclose {}}\right) ^{1/d} = \frac{\Gamma (d/2+1)^{1/d}}{2^{1/d}\sqrt{\pi }} > \sqrt{\frac{d}{2\pi e}}. \end{aligned}$$
(3.2)

Next we prove an upper bound for \(\mathrm {A}_-(d)\). Let

$$\begin{aligned} L_n^\nu (z) = \sum _{j=0}^n \left( {\begin{array}{c}n+\nu \\ n-j\end{array}}\right) \frac{(-z)^j}{j!} \end{aligned}$$

be the generalized Laguerre polynomial of degree n with parameter \(\nu >-1\). When \(\nu = d/2-1\), the functions \(\psi _n^\nu :\mathbb {R}^d \rightarrow \mathbb {R}\) defined by

$$\begin{aligned} \psi _n^\nu (x) = L_n^\nu (2\pi |x|^2)e^{-\pi |x|^2} \end{aligned}$$
(3.3)

form an orthogonal basis for the space of radial functions in \(L^2(\mathbb {R}^d)\), and they are eigenfunctions for the Fourier transform:

$$\begin{aligned} \widehat{\psi }^\nu _n = (-1)^n\psi ^\nu _n. \end{aligned}$$

(See, for example, Lemma 10 in [12].)

Let

$$\begin{aligned} p(z)&= L^\nu _1(z)L^\nu _3(0) - L^\nu _3(z)L^\nu _1(0)\\&= \frac{(1+\nu )}{6}z\left( {2(3+\nu )(2+\nu )} - 3(3+\nu )z + {z^2}\right) . \end{aligned}$$

The roots of this polynomial are 0 and

$$\begin{aligned} \frac{3\nu +9\pm \sqrt{33+14\nu +\nu ^2}}{2}, \end{aligned}$$

and it is positive beyond the largest of these roots. If \(\nu =d/2-1\), then the largest root takes the form

$$\begin{aligned} \frac{3d/2+6 + \sqrt{20+6d+d^2/4}}{2}. \end{aligned}$$

Now the function \(g :\mathbb {R}^d \rightarrow \mathbb {R}\) defined by

$$\begin{aligned} g(x)&= \psi _1^\nu (x)\psi _3^\nu (0) - \psi _3^\nu (x)\psi _1^\nu (0)\\&= p(2\pi |x|^2) e^{-\pi |x|^2} \end{aligned}$$

is radial, belongs to \(\mathcal {A}_-(d)\), and satisfies \(\widehat{g} = - g\) and \(g(0)=0\). Hence

$$\begin{aligned} \mathrm {A}_-(d) \le \sqrt{\frac{3d/2+6 + \sqrt{20+6d+d^2/4}}{4\pi }} = \big (1+O(d^{-1/2})\big )\sqrt{\frac{d}{2\pi }}. \end{aligned}$$
(3.4)

Estimates (3.2) and (3.4) imply that \(\mathrm {A}_-(d)/\sqrt{d}\) is bounded above and below by positive constants, as desired. In particular, the lower bound is \(1/\sqrt{2\pi e}\), and the upper bound is at most 1 except for \(d=1\), in which case we can use \(\mathrm {A}_-(1)=1\) to obtain an upper bound of 1.
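The algebra above is routine but easy to get wrong; a short SymPy sketch (ours) confirms the factorization of p and evaluates the bounds (3.2) and (3.4), for example with \(d=12\):

```python
# SymPy check of the polynomial p and the bounds (3.2) and (3.4) (a sketch).
from sympy import Rational, assoc_laguerre, expand, gamma, pi, simplify, sqrt, symbols

z, nu = symbols('z nu')
p = (assoc_laguerre(1, nu, z) * assoc_laguerre(3, nu, 0)
     - assoc_laguerre(3, nu, z) * assoc_laguerre(1, nu, 0))
claimed = (1 + nu) / 6 * z * (2*(3 + nu)*(2 + nu) - 3*(3 + nu)*z + z**2)
print(simplify(expand(p - claimed)))  # 0

d = 12
lower = gamma(Rational(d, 2) + 1) ** Rational(1, d) / (2**Rational(1, d) * sqrt(pi))
upper = sqrt((Rational(3, 2) * d + 6 + sqrt(20 + 6 * d + Rational(d**2, 4))) / (4 * pi))
print(float(lower), float(upper))  # approx. 0.921 and 1.676
```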

We believe that the upper bound (3.4) cannot be improved if we replace p with any polynomial of bounded degree, in the following sense. For \(N \ge 3\) and \(s = \pm 1\), let \(\mathrm {A}_{s,N}(d)\) be the infimum of r(g) over all nonzero \(g :\mathbb {R}^d \rightarrow \mathbb {R}\) such that \(\widehat{g} = sg\), \(g(0)=0\), and g is of the form

$$\begin{aligned} g(x) = p(2\pi |x|^2) e^{-\pi |x|^2}, \end{aligned}$$

where p is a polynomial of degree at most N. (The restriction to \(N \ge 3\) ensures that such a function exists.)

Conjecture 3.2

For fixed \(N \ge 3\) and \(s = \pm 1\),

$$\begin{aligned} \lim _{d \rightarrow \infty } \frac{\mathrm {A}_{s,N}(d)}{\sqrt{d}} = \frac{1}{\sqrt{2\pi }}. \end{aligned}$$

However, the upper bound for \(\mathrm {A}_-(d)\) can be improved using other functions. In particular, we can make use of the auxiliary functions f constructed in [11] for the linear programming bound in high dimensions. If we set \(g = \widehat{f} - f\), then one can show that

$$\begin{aligned} r(g) \le (0.3194\ldots + o(1)) \sqrt{d} \end{aligned}$$

as \(d \rightarrow \infty \). The number \(0.3194\ldots \) is derived from the Kabatiansky–Levenshtein bound for sphere packing, and the construction in [11] shows how to obtain that bound via the linear programming bound. The precise number is rather complicated, but it can be characterized as follows. Let \(\theta = 1.0995\ldots \) be the unique root of

$$\begin{aligned} 2\log (\sec (\theta ) + \tan (\theta )) = \sin (\theta ) + \tan (\theta ) \end{aligned}$$

in the interval \((0,\pi /2)\), and let

$$\begin{aligned} c = \frac{\sin (\theta /2) \cot (\theta ) e^{\sec (\theta )/2}}{\sqrt{2\pi }} = 0.3194\ldots . \end{aligned}$$

Then

$$\begin{aligned} r(g) \le (c+o(1)) \sqrt{d} \end{aligned}$$

as \(d \rightarrow \infty \), and hence

$$\begin{aligned} \limsup _{d \rightarrow \infty } \frac{\mathrm {A}_-(d)}{\sqrt{d}}\le c. \end{aligned}$$

We do not know how to prove the corresponding bound for \(\mathrm {A}_+(d)\), although we believe it should be true, as it would follow from Conjecture 1.5.

3.2 Existence of extremizers

The existence proof for extremizers with \(s=-1\) is almost identical to the proof of the \(+1\) case in [12, Section 6]. We briefly outline the proof here for completeness. Let \(f_n \in \mathcal {A}_-(d){\setminus }\{0\}\) be an extremizing sequence; that is, \(\sqrt{r(f_n)r(\widehat{f}_n)} \searrow \mathrm {A}_-(d)\) as \(n\rightarrow \infty \). By Lemma 3.1 we can assume that \(\widehat{f}_n = -f_n\) and \(f_n(0)=0\), and hence \(r(f_n)\searrow \mathrm {A}_-(d)\). We can also assume that \(\Vert f_n\Vert _{1} = 1\) for all n. In particular, since \(\widehat{f}_n=-f_n\), we have

$$\begin{aligned} \Vert f_n\Vert _{2}^2 = \int _{\mathbb {R}^d} |f_n|^2 \le \Vert f_n\Vert _\infty \cdot \Vert f_n\Vert _1 \le \Vert \widehat{f}_n\Vert _1 \cdot \Vert f_n\Vert _{1} = 1. \end{aligned}$$

Because the unit ball in \(L^2(\mathbb {R}^d)\) is weakly compact, we can assume that \(f_n\) converges weakly to some function \(f\in L^2(\mathbb {R}^d)\). Because \({\mathcal {A}}_-(d)\) is convex, we can apply Mazur’s lemma to assume furthermore that \(f_n\) converges almost everywhere and in \(L^2(\mathbb {R}^d)\) to f. Thus, necessarily we have \(\widehat{f}=-f\) and \(r(f)\le \mathrm {A}_-(d)\). Since \(\Vert f_n\Vert _{\infty }\le \Vert \widehat{f}_n\Vert _{1} = \Vert f_n\Vert _{1}= 1\) and \(r(f_n)\) is decreasing, we can apply Fatou’s lemma for \(g_n=\mathbf {1}_{B^d_{r(f_1)}} + f_n\ge 0\) to deduce that \(f\in L^1(\mathbb {R}^d)\) and \(\widehat{f}(0)\le 0\). Hence, \(f(0)\ge 0\). We now use Jaming’s high-dimensional version [14] of Nazarov’s uncertainty principle [15] to deduce, exactly as in [12, Lemma 23], that there exists \(K<0\) such that for all n,

$$\begin{aligned} \int _{B^d_{r(f_n)}} f_n \le K. \end{aligned}$$

(Alternatively, we can use Proposition 2.6 from [1], which tells us less about the constant K but has a simpler proof.) Fatou’s lemma implies that f satisfies the same estimate, and hence is not identically zero. We conclude that \(f \in \mathcal {A}_-(d)\), \(\widehat{f} = -f\), and \(r(f)\le \mathrm {A}_-(d)\), and thus \(r(f) = \mathrm {A}_-(d)\). Finally, we must have \(f(0)=0\), since otherwise the proof of Lemma 3.1 would produce a better function.

3.3 Infinitely many roots

All that remains to prove is that the extremizers have infinitely many roots. The proof follows the ideas of [12, Section 6.2] for the \(+1\) case. If \(f\in \mathcal {A}_-(d)\) satisfies \(\widehat{f}=-f\) and \(f(0)=0\) and vanishes at only finitely many radii beyond r(f), then we find a perturbation function \(g\in \mathcal {A}_-(d)\) satisfying \(\widehat{g} = -g\) and \(g(0)=0\) such that \(r(f+\varepsilon g) < r(f)\) for small \(\varepsilon >0\); thus, f cannot be extremal. In [12], the construction of g varies between the cases \(d=1\) (using the Poincaré recurrence theorem) and \(d\ge 2\) (using a trick involving Laguerre polynomials). However, thanks to the Poisson summation formula, every extremal function \(f\in \mathcal {A}_-(1)\) with \(\widehat{f}=-f\) and \(f(0)=0\) must vanish at the integers. Thus, we only need to prove our assertion for \(d\ge 2\).

In fact, we will rule out the possibility that an extremizer f is eventually positive. Then applying this proof to the radialization of f will show that f must vanish on spheres of arbitrarily large radius. Thus, let \(f \in \mathcal {A}_-(d)\) be such that \(\widehat{f} = -f\), \(f(0)=0\), and \(f(x)>0\) for \(|x| \ge R\). We must show that \(r(f) > \mathrm {A}_-(d)\).

Let \(\varphi _t\) be the function defined in (3.1) with \(t \in (0,1)\) chosen so that

$$\begin{aligned} \sqrt{td\log (2)/\pi }<r(f), \end{aligned}$$

and let \(\psi = \varphi _t - \widehat{\varphi }_t\). Then \(\widehat{\psi }=-\, \psi \), \(\psi (0)=-1\), and \(\psi (x) > 0\) for \(|x|\ge r(f)\). This function almost works as a possible perturbation g, but it needs to be fixed at the origin without changing its eventual nonnegativity. To do so, let \(\nu =d/2-1\) and consider the function

$$\begin{aligned} g_n = \psi + \frac{\psi ^\nu _{2n+1}}{\psi ^\nu _{2n+1}(0)}, \end{aligned}$$

where \(\psi ^\nu _{2n+1}\) is the eigenfunction defined in (3.3). Now \(\widehat{g}_n = -g_n\), \(g_n(0)=0\), and \(g_n\) is eventually positive for each \(n\ge 0\), because \(t<1\) implies that \(\psi ^\nu _{2n+1}\) decays faster than \(\psi \).

As observed in [12], for \(d \ge 2\) the eigenfunctions \(\psi ^\nu _j/\psi ^\nu _j(0)\) converge to zero uniformly on all compact subsets of \(\mathbb {R}^d{\setminus }\{0\}\) as \(j \rightarrow \infty \); the proof amounts to Fejér’s asymptotic formula for Laguerre polynomials [17, Theorem 8.22.1]. Using this convergence, let n be large enough that \(g_n(x) > 0\) for \(|x| \in [r(f),R]\), and then choose \(R'\) so that \(g_n(x) > 0\) for \(|x| \ge R'\). Let \(m=\min \{|f(x)| : R\le |x| \le R'\}\), \(M=\max \{|g_{n}(x)|: x\in \mathbb {R}^d\}\), and \(0<\varepsilon <m/M\). Then the perturbation \(f_\varepsilon = f + \varepsilon g_n\) satisfies \(f_\varepsilon (x) > 0\) for \(|x| \ge r(f)\). Thus, \(r(f_\varepsilon ) < r(f)\), which means f cannot be extremal. This completes the proof of Theorem 1.4.

4 Numerical evidence

To explore how \(\mathrm {A}_+(d)\) behaves, we numerically optimized functions \(g :\mathbb {R}^d \rightarrow \mathbb {R}\) satisfying the conditions of Problem 1.1. Readers who wish to examine this data can obtain our numerical results from [6].

In our calculations we always choose g to be of the form \(g(x) = p(2\pi |x|^2) e^{-\pi |x|^2}\), where p is a polynomial in one variable of degree at most \(4k+2\), which means p has \(4k+2\) degrees of freedom modulo scaling. The constraint \(g(0)=0\) eliminates one degree of freedom, and one can check using the Laguerre eigenbasis that the constraint \(\widehat{g} = g\) eliminates \(2k+1\) degrees of freedom. To control the remaining 2k degrees of freedom, we specify k double roots at radii \(\rho _1< \dots < \rho _k\). We then attempt to choose the radii \(\rho _1,\dots ,\rho _k\) so as to minimize r(g). To do so, we iteratively optimize the choice of radii for successive values of k, by making an initial guess based on the previous value of k and then improving the guess using multivariate Newton’s method. Each choice of \(\rho _1,\dots ,\rho _k\) proves an upper bound for \(\mathrm {A}_+(d)\), and we hope to approximate \(\mathrm {A}_+(d)\) closely as k grows. (Note that if Conjecture 3.2 holds, then we cannot obtain improved bounds if k remains bounded for large d.) This method was first applied by Cohn and Elkies [5, Section 7] to \(\mathrm {A}_-(d)\), with a simpler optimization algorithm. Cohn and Kumar [7] replaced that algorithm with Newton’s method, and we made use of their implementation.
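Concretely, for fixed root locations the conditions on p are linear, and only the locations themselves require nonlinear optimization. The sketch below (a simplified reconstruction, not our production code; it substitutes a generic direct search for the Newton iteration, and its tolerances are ad hoc) illustrates the computation for \(d=12\):

```python
# Sketch of the numerical method: force k double roots, solve the linear
# conditions for p, and minimize the last sign change (illustrative only).
import numpy as np
from scipy.optimize import brentq, minimize
from scipy.special import genlaguerre

d, k = 12, 4
nu = d / 2 - 1
qs = [np.poly1d(genlaguerre(2 * j, nu).coeffs) for j in range(2 * k + 2)]  # q_j = L_{2j}^nu

def build_p(z_roots):
    # Linear constraints: p(0) = 0 and p(z) = p'(z) = 0 at each double root z.
    rows = [[q(0.0) for q in qs]]
    for z in z_roots:
        rows.append([q(z) for q in qs])
        rows.append([q.deriv()(z) for q in qs])
    A = np.array(rows)
    A /= np.linalg.norm(A, axis=1, keepdims=True)  # row scaling for conditioning
    _, _, vt = np.linalg.svd(A)                    # one-dimensional null space
    return sum(float(c) * q for c, q in zip(vt[-1], qs))

def bound(z_roots):
    z_roots = np.sort(z_roots)
    p = build_p(z_roots)
    zs = np.linspace(1e-6, z_roots[-1] * 1.5 + 20, 4000)
    vals = p(zs)
    sign_inf = np.sign(vals[-1])
    bad = np.where(np.sign(vals) == -sign_inf)[0]
    z_last = 0.0 if len(bad) == 0 else brentq(p, zs[bad[-1]], zs[bad[-1] + 1])
    return np.sqrt(z_last / (2 * np.pi))  # upper bound for A_+(d)

guess = 4 * np.pi * np.arange(2, k + 2)  # z = 2*pi*rho^2 near rho^2 = 4, 6, 8, 10
res = minimize(bound, guess, method='Nelder-Mead')
print(bound(res.x))  # should approach sqrt(2) = 1.41421... from above as k grows
```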

We have no guarantee that the numerical optimization will converge to even a local optimum for any given d and k, or that the resulting bounds will converge to \(\mathrm {A}_+(d)\) as \(k \rightarrow \infty \). Indeed, we quickly ran into problems when \(d \le 2\), and eventually for \(d=3\) and 4 as well, but for \(5 \le d \le 128\) we arrived at the global optimum for each \(k \le 64\). These calculations are what initially led us to believe that \(\mathrm {A}_+(12)=\sqrt{2}\).

Our numerical calculations are generally not rigorous: although we believe we have used more than sufficient precision, we cannot bound the error from the use of floating-point arithmetic. However, we have used exact rational arithmetic to prove all the numerical upper bounds for \(\mathrm {A}_s(d)\) we report in this paper. Thus, they are genuine theorems, while our numerical assertions about summation formulas have not been rigorously proved.

Table 2 Upper bounds for \(\mathrm {A}_+(d)\) and \(\mathrm {A}_-(d-4)\)

Table 2 shows our upper bounds for \(\mathrm {A}_+(d)\) for \(1 \le d \le 32\), together with \(\mathrm {A}_-(d-4)\) for comparison (taken from [4]). The shift by 4 approximately aligns the columns, with the best case being \(\mathrm {A}_+(12) = \mathrm {A}_-(8) = \sqrt{2}\). We have no conceptual explanation for this alignment, but it fits conveniently with the sign in Proposition 2.6, and it supports our conjecture that

$$\begin{aligned} \lim _{d \rightarrow \infty } \frac{\mathrm {A}_+(d)}{\sqrt{d}} = \lim _{d \rightarrow \infty } \frac{\mathrm {A}_-(d)}{\sqrt{d}}. \end{aligned}$$

The convergence to this limit is slow enough that it is difficult to estimate the limit accurately from numerical data.

For \(d \le 2\) our numerical methods perform poorly, for the reasons described below. For \(d=3\) the bound for \(\mathrm {A}_+(d)\) in Table 2 is obtained using \(k=27\), and for \(d \ge 4\) we use \(k=32\). In particular, we deliberately use a smaller value of k than the limits of our computations for \(d \ge 4\), so that we can use data from larger k to estimate the rate of convergence. These computations suggest the following conjecture.

Conjecture 4.1

For \(3 \le d \le 32\), the upper bounds for \(\mathrm {A}_+(d)\) and \(\mathrm {A}_{-}(d-4)\) in Table 2 are sharp, except for an error of at most 1 in the last decimal digit shown.

In each case with \(d \ge 3\), we can use a summation formula to check that we have found the optimal bound for the given values of d and k; we explain how this is done in Sect. 5. However, we do not know how quickly the bounds converge as \(k \rightarrow \infty \), or whether they indeed converge to \(\mathrm {A}_s(d)\) at all. Our confidence in Conjecture 4.1 comes from comparing the bounds for \(32 \le k \le 64\) when \(d \ge 5\). They seem to have converged to this number of digits, but of course we cannot rule out convergence to the wrong limit.

The approximation \(\mathrm {A}_+(d) \approx \mathrm {A}_-(d-4)\) and equality \(\mathrm {A}_+(12) = \mathrm {A}_-(8) = \sqrt{2}\) raise the question of whether the other exact values \(\mathrm {A}_-(1)=1\), \(\mathrm {A}_-(2) = (4/3)^{1/4}\) (conjecturally), and \(\mathrm {A}_-(24)=2\) are also mirrored by \(\mathrm {A}_+\). That turns out not to be the case: Table 2 strongly suggests that \(\mathrm {A}_+(5) > 1\) and \(\mathrm {A}_+(6) > (4/3)^{1/4}\), and it proves that \(\mathrm {A}_+(28) < 2\). The case of \(\mathrm {A}_+(28)\) is particularly disappointing, because it might have stood in the same relationship to \(\mathrm {A}_+(12)\) as the Leech lattice does to the \(E_8\) root lattice. We have found no case other than \(d=12\) for which we can guess the exact value of \(\mathrm {A}_+(d)\).

Taking \(k=128\) shows that \(\mathrm {A}_+(28) < 1.98540693489105\), and again we believe that all these digits agree with \(\mathrm {A}_+(28)\) except the last. This upper bound for \(\mathrm {A}_+(28)\) seems discouragingly complicated, but the underlying root locations display remarkable behavior, shown in Table 3. The table leads us to the following conjecture:

Table 3 Approximations to \(r(g)^2, \rho _1^2, \rho _2^2, \dots , \rho _{31}^2\) when \(d=28\) and \(k=128\)

Conjecture 4.2

There exists a radial Schwartz function \(g \in \mathcal {A}_+(28) {\setminus } \{0\}\) with \(\widehat{g} = g\), \(g(0)=0\), and \(r(g) = \mathrm {A}_+(28)\), and whose nonzero roots are at radii \(\sqrt{2j + o(1)}\) as \(j \rightarrow \infty \), starting with \(j=2\).

This pattern is reminiscent of [10, Section 7], as well as the behavior of \(\mathrm {A}_\pm (d)\) in other cases, but it is a particularly striking example. We expect that Conjecture 4.2 is true, but a weaker conjecture consistent with the data is that there exists some \(\varepsilon <1\) such that the squared radii are within \(\varepsilon \) of successive even integers.

For comparison, [8] constructs a function achieving \(\mathrm {A}_-(24)\) whose nonzero roots are exactly at \(\sqrt{2j}\) with \(j \ge 2\). Our best guess is that the function achieving \(\mathrm {A}_+(28)\) is given by a primary term that has these exact roots, plus one or more secondary terms that perturb the roots but do not substantially change them. If that is the case, then perhaps one can describe this function explicitly and thereby characterize \(\mathrm {A}_+(28)\) exactly. However, we have not been able to guess or derive such a formula.

Another mystery is the behavior of \(\mathrm {A}_+(d)\) for \(d \le 2\). In these dimensions we quickly run into cases in which the last sign change r(g) is not a continuous function of \(\rho _1,\dots ,\rho _k\) at the optimum, and this lack of continuity ruins our numerical algorithms. (Instead, we resort to linear programming, which is much slower.) Of course it is no surprise that the last sign change is discontinuous at some points, because a small perturbation of a polynomial can convert a double root to two single roots, or even create a new root if the degree increases. However, we do not expect this behavior to occur generically. In particular, it cannot occur if \(\deg (p)=4k+2\) and g has no double roots beyond the k double roots we have forced to occur.

When \(d=2\), even the case \(k=1\) is problematic. Specifically, one can check that the optimal value \(r(g) = \sqrt{2/\pi }\) is achieved by setting \(\rho _1 = \sqrt{3/\pi }\). As \(\rho _1\) approaches \(\sqrt{3/\pi }\) from the left, r(g) decreases towards \(\sqrt{2/\pi }\), but it increases towards infinity as \(\rho _1\) approaches \(\sqrt{3/\pi }\) from the right. This discontinuity occurs because the leading coefficient of the polynomial p vanishes when \(\rho _1 = \sqrt{3/\pi }\). The leading coefficient also vanishes at the best choices of \(\rho _1,\dots ,\rho _k\) we have found for \(2 \le k \le 4\), while the case \(k=5\) suffers from a different problem: the resulting polynomial has six double roots, rather than just five, and again the location of the last sign change is discontinuous.
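This computation can be verified symbolically. Writing \(g(t) = p(2\pi |t|^2)e^{-\pi |t|^2}\) with p a linear combination of the Laguerre polynomials \(L_{2j}^0\) (as in Sect. 5 below), one can check that the optimum corresponds to \(p(x) = x(x-4)(x-6)^2/24\) up to scaling, with its double root at \(6 = 2\pi \rho _1^2\) and last sign change at \(4 = 2\pi \, r(g)^2\). The following SymPy sketch (illustrative only, not our actual code) confirms the factorization:

```python
# Check of the d = 2, k = 1 example in the variable x = 2*pi*|t|^2.
from sympy import symbols, laguerre, expand, factor, sqrt, pi

x = symbols('x')

# For s = +1 and nu = d/2 - 1 = 0, the basis is q_j = L_{2j}^0.  At this
# optimum the leading coefficient (of q_3 = L_6) vanishes, so p is the
# combination of L_0, L_2, L_4 with p(0) = 0 and a double root at 6.
p = expand(-2*laguerre(0, x) + laguerre(2, x) + laguerre(4, x))
print(factor(p))                 # x*(x - 4)*(x - 6)**2/24

# Last sign change r(p) = 4, giving r(g) = sqrt(4/(2*pi)) = sqrt(2/pi),
# and the double root 6 = 2*pi*rho_1^2 gives rho_1 = sqrt(3/pi).
print(float(sqrt(2/pi)))         # 0.7978845608028654
```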

When \(d=1\), there are no problems for \(k \le 2\), and the leading coefficient vanishes for \(k=3\). For \(k=4\), we find an extra double root, but there is no discontinuity when \(k=5\).

In Table 2 we have reported the bound using \(k=5\) for \(d\le 2\). We believe that we have approximated the true optima for \(k=5\), but the bounds almost certainly do not agree with \(\mathrm {A}_+(d)\) to the full six digits shown, in contrast with the convergence we observe in the cases covered by Conjecture 4.1.

We have not observed a discontinuity near the optimum in any other dimension. However, when \(d=3\) we cannot find a local optimum with \(k=28\), because the largest root tends to infinity in our calculations. Computations carried out by David de Laat indicate that the optimum occurs at a singularity and the resulting discontinuity is interfering with our algorithms. When \(d=4\) we run into a similar problem at \(k=36\). We do not know whether this phenomenon is limited to \(d \le 4\).

5 Summation formulas

We do not know how to obtain the hypothetical summation formulas described in Conjecture 2.5. Aside from \(\mathrm {A}_-(2)\) and the four cases that have been solved exactly (namely \(\mathrm {A}_-(1)\), \(\mathrm {A}_-(8)\), \(\mathrm {A}_+(12)\), and \(\mathrm {A}_-(24)\)), we have not found any summation formulas that come close to matching our upper bounds. However, in many cases we can compute optimal summation formulas for polynomials of a fixed degree. For \(d \ge 3\), these formulas show that we have found the optimal polynomials for each fixed k in our computations in Sect. 4, and we believe that when k is large they should approximate the ultimate summation formulas. For example, Table 1 is based on calculations with \(k=128\).

Recall that our numerical method uses the Laguerre eigenbasis. If we are bounding \(\mathrm {A}_s(d)\), we let \(\nu = d/2-1\) and

$$\begin{aligned} q_j = \begin{cases} L_{2j}^\nu & \text{if } s=1\text{, and}\\ L_{2j+1}^\nu & \text{if } s=-1. \end{cases} \end{aligned}$$

Then our method seeks a linear combination p of \(q_0,q_1,\dots ,q_{2k+1}\) that vanishes at 0 and minimizes r(p); using the function \(f(x) = p(2\pi |x|^2)e^{-\pi |x|^2}\), we conclude that \(\mathrm {A}_s(d) \le \sqrt{r(p)/(2\pi )}\), where

$$\begin{aligned} r(p) = \inf {} \{R \ge 0: p(x)\text { has the same sign for }x\ge R\}. \end{aligned}$$

(Unlike earlier, we require only \(x \ge R\) in the definition of r(p), rather than \(|x| \ge R\), because we care only about the right half-line.) To construct p, we impose double roots at locations \(\rho _1,\dots ,\rho _k\), and then choose these locations so as to minimize \(\rho _0 := r(p)\). Note that in our notation here, \(\rho _i\) denotes what would have been called \(2\pi \rho _i^2\) in Sect. 4.
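As an illustration of this definition, the last sign change of a polynomial given by its coefficients can be located numerically along the following lines (a minimal sketch with a hypothetical helper, not our actual implementation):

```python
# Locate r(p): the largest real root across which p changes sign (0 if none).
import numpy as np

def last_sign_change(coeffs, tol=1e-9, h=1e-6):
    real_roots = sorted(r.real for r in np.roots(coeffs) if abs(r.imag) < tol)
    for r in reversed(real_roots):
        # p changes sign at r iff it takes opposite signs on either side;
        # roots of even order (like our forced double roots) are skipped.
        if np.polyval(coeffs, r - h) * np.polyval(coeffs, r + h) < 0:
            return r
    return 0.0

# p(x) = x(x - 4)(x - 6)^2 / 24 from the d = 2, k = 1 example:
p = np.poly([0, 4, 6, 6]) / 24
print(last_sign_change(p))                           # ~4.0
print(np.sqrt(last_sign_change(p) / (2 * np.pi)))    # bound ~ sqrt(2/pi)
```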

To obtain a summation formula, we will need to impose some non-degeneracy conditions. We will assume that \(0< \rho _0< \rho _1< \dots < \rho _k\), and that p is uniquely determined among linear combinations of \(q_0,\dots ,q_{2k+1}\) by the following conditions:

  1. (1)

    \(p(0)=0\),

  2. (2)

    \(p(\rho _i) = p'(\rho _i)=0\) for \(1 \le i \le k\), and

  3. (3)

    the coefficient of \(q_{2k+1}\) is 1.

We assume furthermore that p has roots of order exactly 1 at \(\rho _0\) and exactly 2 at \(\rho _1,\dots ,\rho _k\), and no other real roots greater than \(\rho _0\). Finally, we assume that we have found a strict local minimum for r(p); in other words, r(p) increases if we perturb \(\rho _1,\dots ,\rho _k\).

These assumptions cannot always be satisfied. For example, when \((s,d,k)=(1,2,1)\) the coefficient of \(q_{2k+1}\) vanishes. However, for \(d>2\) they are satisfied in every case in which we have found a local minimum. See Table 4 for a list.

Table 4 Values of k for which we have numerically computed a local minimum and the corresponding summation formula to one hundred decimal places

Proposition 5.1

Under the hypotheses listed above, up to scaling there are unique coefficients \(c_0,\dots ,c_{k+1}\), not all zero, such that

$$\begin{aligned} \sum _{i=0}^k c_i g(\rho _i) + c_{k+1} g(0) = 0 \end{aligned}$$

for every linear combination g of \(q_0,\dots ,q_{2k+1}\). Furthermore, \(c_0,\dots ,c_k\) are nonzero and have the same sign. If \(s=1\), then \(c_{k+1}\) is nonzero and has the opposite sign.

We prove this proposition below. It is a polynomial analogue of the summation formula (2.4) (with the Gaussian factors from the Laguerre eigenbasis implicitly incorporated into the coefficients \(c_i\)), and it is reminiscent of Gauss-Jacobi quadrature in that it holds on a \((2k+2)\)-dimensional space despite using only \(k+2\) coefficients.

Corollary 5.2

Any linear combination g of \(q_0,\dots ,q_{2k+1}\) with \(g(0)=0\) and \(r(g) < \rho _0\) must vanish identically, and p is the unique linear combination achieving \(r(p) = \rho _0\), up to scaling.

In other words, although we have assumed only a strict local minimum for the last sign change among polynomials with k double roots, we have found the global minimum among polynomials with no such restriction. For example, when \(s=1\) and \(k=64\), we find that p is the best possible polynomial of degree at most \(4k+2=258\). This phenomenon not only certifies our numerics by establishing matching lower bounds, but also helps explain why our algorithms perform well: degeneracy is the only way to get stuck in a local optimum.

Proof of Corollary 5.2

Suppose g is a linear combination of \(q_0,\dots ,q_{2k+1}\) with \(r(g) \le \rho _0\), \(g(0) = 0\), and \(g(z) \ge 0\) for large z. By Proposition 5.1,

$$\begin{aligned} \sum _{i=0}^k c_i g(\rho _i) = -\, c_{k+1} g(0) = 0. \end{aligned}$$

Because \(\rho _0 \ge r(g)\), all of \(g(\rho _0),\dots ,g(\rho _k)\) must be nonnegative. It follows that g must vanish at \(\rho _0,\dots ,\rho _k\), since \(c_0,\dots ,c_k\) are nonzero and have the same sign. Furthermore, \(\rho _1,\dots ,\rho _k\) must be roots of even order, since otherwise g would change sign beyond r(g). However, we have assumed that the equations \(g(0)=0\), \(g(\rho _i)=0\), and \(g'(\rho _i)=0\) for \(1 \le i \le k\) determine g up to scaling. Thus g must be proportional to p, and the only way to achieve \(r(g) < r(p)\) is if g vanishes identically. \(\square \)

It will prove convenient to distinguish between \(\rho _1,\dots ,\rho _k\) and perturbations of these points. For that purpose, we fix \(\rho _1,\dots ,\rho _k\) as the values described above, while \(\widetilde{\rho }_1,\dots ,\widetilde{\rho }_k\) are variables taking values in some neighborhood of \(\rho _1,\dots ,\rho _k\).

The proof of Proposition 5.1 involves carefully studying how different quantities behave as functions of \(\widetilde{\rho }_1,\dots ,\widetilde{\rho }_k\). We can set up simultaneous linear equations to determine the coefficients of \(q_0,\dots ,q_{2k+1}\) as follows. Write \(\alpha = (\alpha _j)_{0 \le j \le 2k+1}\) for the column vector of coefficients (all vectors will be column vectors unless otherwise specified, sometimes indexed starting with 0 and sometimes with 1), and define the entries of the matrix \(M = (M_{i,j})_{0 \le i,j \le 2k+1}\) as follows:

$$\begin{aligned} M_{i,j} = \begin{cases} q_j(0) & \text{for } i=0,\\ q_j(\widetilde{\rho }_i) & \text{for } 1 \le i \le k,\\ q_j'(\widetilde{\rho }_{i-k}) & \text{for } k+1 \le i \le 2k,\text{ and}\\ \delta _{j,2k+1} & \text{for } i=2k+1. \end{cases} \end{aligned}$$

Let \(v = (\delta _{i,2k+1})_{0 \le i \le 2k+1}\). Then the equation \(M \alpha = v\) expresses the constraints that \(\sum _{j=0}^{2k+1} \alpha _j q_j\) vanishes at 0, vanishes to second order at \(\widetilde{\rho }_1,\dots ,\widetilde{\rho }_k\), and has \(\alpha _{2k+1}=1\).
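In code, assembling and solving this system might look as follows (a sketch for \(s=1\), so that \(q_j = L_{2j}^{\nu }\); the helper names are ours, not from an actual implementation):

```python
# Build the matrix M of the system M alpha = v for s = +1.
import numpy as np
from scipy.special import genlaguerre

def build_M(rho, nu):
    k = len(rho)
    n = 2 * k + 2                                   # basis q_0, ..., q_{2k+1}
    q = [genlaguerre(2 * j, nu) for j in range(n)]  # q_j = L_{2j}^nu
    dq = [qj.deriv() for qj in q]
    M = np.zeros((n, n))
    M[0] = [qj(0.0) for qj in q]                    # row 0:        p(0) = 0
    for i, r in enumerate(rho, start=1):
        M[i] = [qj(r) for qj in q]                  # rows 1..k:    p(rho_i) = 0
        M[k + i] = [dqj(r) for dqj in dq]           # rows k+1..2k: p'(rho_i) = 0
    M[n - 1, n - 1] = 1.0                           # row 2k+1:     alpha_{2k+1} = 1
    return M

def solve_for_alpha(rho, nu):
    v = np.zeros(2 * len(rho) + 2)
    v[-1] = 1.0
    return np.linalg.solve(build_M(rho, nu), v)     # coefficients of p

# Consistent with the degenerate case (s, d, k) = (1, 2, 1) noted above,
# M is numerically singular at the optimal location rho_1 = 6:
print(np.linalg.cond(build_M([6.0], nu=0.0)))       # enormous
```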

We write \(\widetilde{\rho }= (\widetilde{\rho }_1,\dots ,\widetilde{\rho }_k)\) and \(\rho = (\rho _1,\dots ,\rho _k)\). When necessary to avoid confusion, we write \(M(\widetilde{\rho })\) for the matrix depending on \(\widetilde{\rho }\), \(\alpha (\widetilde{\rho })\) for the solution of \(M(\widetilde{\rho }) \alpha = v\) if \(M(\widetilde{\rho })\) is invertible, and \(p_{\widetilde{\rho }}\) for the corresponding linear combination \(\sum _{j=0}^{2k+1} \alpha _j q_j\) of \(q_0,\dots ,q_{2k+1}\). Thus, the polynomial p discussed above amounts to \(p_{\rho }\).

We have assumed that \(M(\rho )\) is invertible, which means that \(\alpha (\widetilde{\rho })\) and \(p_{\widetilde{\rho }}\) are smooth functions of \(\widetilde{\rho }\) defined on some neighborhood of \(\rho \). Because \(p_{\rho }\) has a simple root at \(\rho _0\), the implicit function theorem implies that \(p_{\widetilde{\rho }}\) has a simple root at \(\widetilde{\rho }_0\), where \(\widetilde{\rho }_0\) is a smooth function of \(\widetilde{\rho }_1,\dots ,\widetilde{\rho }_k\) satisfying \(\widetilde{\rho }_0(\rho ) = \rho _0\). We will always assume that \(\widetilde{\rho }\) is in a small enough neighborhood of \(\rho \) for this to be true. Furthermore, our assumptions so far imply that \(r(p_{\widetilde{\rho }}) = \widetilde{\rho }_0\) for \(\widetilde{\rho }\) in some neighborhood of \(\rho \), and again we restrict our attention to such a neighborhood.

Because of our assumption of local minimality, the function \(\widetilde{\rho }_0\) must have a stationary point at \(\rho \). In other words,

$$\begin{aligned} \frac{\partial \widetilde{\rho }_0}{\partial \widetilde{\rho }_i}(\rho ) = 0 \end{aligned}$$

for \(1 \le i \le k\). In addition, \(\widetilde{\rho }_0 > \rho _0\) for \(\widetilde{\rho }\ne \rho \) in some small neighborhood of \(\rho \) by strict local minimality. Once again we confine \(\widetilde{\rho }\) to such a neighborhood.

Lemma 5.3

The vectors \(\alpha (\rho )\) and \((\partial \alpha / \partial \widetilde{\rho }_i)(\rho )\) with \(1 \le i \le k\) are linearly independent.

Proof

The vector \(\alpha \) has \(\alpha _{2k+1}=1\), while all the partial derivatives \(\partial \alpha / \partial \widetilde{\rho }_i\) vanish in that coordinate. Thus, it will suffice to show that the partial derivatives are linearly independent at \(\rho \), and because M is invertible, we can examine \(M (\partial \alpha / \partial \widetilde{\rho }_i)\) instead of \(\partial \alpha / \partial \widetilde{\rho }_i\).

Differentiating \(M \alpha = v\) shows that

$$\begin{aligned} M \frac{\partial \alpha }{\partial \widetilde{\rho }_i} = - \frac{\partial M}{\partial \widetilde{\rho }_i} \alpha . \end{aligned}$$

The matrix \(\partial M / \partial \widetilde{\rho }_i\) vanishes except in rows i and \(k+i\), and the entries of \((\partial M / \partial \widetilde{\rho }_i) \alpha \) in those rows are \(p_{\widetilde{\rho }}'(\widetilde{\rho }_i)\) and \(p_{\widetilde{\rho }}''(\widetilde{\rho }_i)\), respectively. We have \(p_{\widetilde{\rho }}'(\widetilde{\rho }_i)=0\) by construction, but \(p_{\rho }''(\rho _i) \ne 0\). Thus, at \(\widetilde{\rho }= \rho \) the vector \((\partial M / \partial \widetilde{\rho }_i)\, \alpha \) is supported in row \(k+i\) alone, with nonzero entry \(p_{\rho }''(\rho _i)\), and so the vectors \((\partial M / \partial \widetilde{\rho }_i)(\rho )\, \alpha (\rho )\) with \(1 \le i \le k\) are linearly independent, as desired. \(\square \)

Lemma 5.4

There are real numbers \(c_0,\dots ,c_{k+1}\), not all zero, such that

$$\begin{aligned} \sum _{i=0}^k c_i g(\rho _i) + c_{k+1} g(0) = 0 \end{aligned}$$

for every linear combination g of \(q_0,\dots ,q_{2k+1}\).

This lemma differs from Proposition 5.1 in not asserting uniqueness or sign conditions for \(c_0,\dots ,c_{k+1}\).

Proof

Define the matrix

$$\begin{aligned} T = (T_{i,j})_{\begin{array}{c} 0 \le i \le k+1\\ 0 \le j \le 2k+1 \end{array}} \end{aligned}$$

by

$$\begin{aligned} T_{i,j} = \begin{cases} q_j(\rho _i) & \text{for } 0 \le i \le k\text{, and}\\ q_j(0) & \text{for } i = k+1. \end{cases} \end{aligned}$$

Then

$$\begin{aligned} (c_0,\dots ,c_{k+1})\, T = \left( \sum _{i=0}^k c_i q_j(\rho _i) + c_{k+1} q_j(0)\right) _{0 \le j \le 2k+1} \end{aligned}$$

for every row vector \((c_0,\dots ,c_{k+1})\). Thus, the desired summation formula amounts to a nonzero row vector in the kernel of right multiplication by T. To prove that such a vector exists, we will show that \({{\,\mathrm{\mathrm {rank}}\,}}(T) < k+2\).

It will suffice to find \(k+1\) linearly independent vectors in the kernel of left multiplication by T, because \((2k+2)-(k+1) < k+2\). Those vectors will be \(\alpha (\rho )\) and \((\partial \alpha /\partial \widetilde{\rho }_i)(\rho )\) for \(1 \le i \le k\), which are linearly independent by Lemma 5.3. All that remains is to prove that they are in the kernel of T.

We have \(T\alpha = (p_{\widetilde{\rho }}(\rho _0),\dots ,p_{\widetilde{\rho }}(\rho _k),p_{\widetilde{\rho }}(0))\), and thus \(T \alpha (\rho ) = 0\). For the partial derivatives, we must show that

$$\begin{aligned} \sum _{j=0}^{2k+1} \frac{\partial \alpha _j}{\partial \widetilde{\rho }_i}(\rho ) \, q_j(\rho _n) = 0 \end{aligned}$$
(5.1)

for \(0 \le n \le k\) and

$$\begin{aligned} \sum _{j=0}^{2k+1} \frac{\partial \alpha _j}{\partial \widetilde{\rho }_i}(\rho ) \, q_j(0) = 0. \end{aligned}$$

The latter equation follows from differentiating the identity

$$\begin{aligned} \sum _{j=0}^{2k+1} \alpha _j q_j(0) = 0. \end{aligned}$$

To prove (5.1), we start with the fact that

$$\begin{aligned} \sum _{j=0}^{2k+1} \alpha _j q_j(\widetilde{\rho }_n) = 0 \end{aligned}$$

for \(0 \le n \le k\). Differentiating with respect to \(\widetilde{\rho }_i\) shows that

$$\begin{aligned} \sum _{j=0}^{2k+1} \frac{\partial \alpha _j}{\partial \widetilde{\rho }_i} q_j(\widetilde{\rho }_n) + \sum _{j=0}^{2k+1} \alpha _j q_j'(\widetilde{\rho }_n) \frac{\partial \widetilde{\rho }_n}{\partial \widetilde{\rho }_i}= 0. \end{aligned}$$

It follows that

$$\begin{aligned} \sum _{j=0}^{2k+1} \frac{\partial \alpha _j}{\partial \widetilde{\rho }_i}(\rho ) \, q_j(\rho _n) = 0, \end{aligned}$$

because \(\partial \widetilde{\rho }_0/\partial \widetilde{\rho }_i\) vanishes at \(\rho \) while for \(1 \le n \le k\),

$$\begin{aligned} \sum _{j=0}^{2k+1} \alpha _j q_j'(\widetilde{\rho }_n) = 0. \end{aligned}$$

We have therefore found \(k+1\) linearly independent vectors in the kernel of left multiplication by T, as desired. \(\square \)

Proof of Proposition 5.1

By Lemma 5.4, a summation formula exists, and all that remains is to prove uniqueness and the sign conditions.

Because \(M(\rho )\) is nonsingular, the values g(0) and \(g(\rho _i)\) with \(1 \le i \le k\) can be chosen arbitrarily. Thus, if \(c_0\) vanished, or if there were two summation formulas that were not proportional, then we could obtain a nontrivial linear relation among \(g(0),g(\rho _1),\dots ,g(\rho _k)\) alone, which is impossible. Hence the summation formula is unique up to scaling, and the coefficient \(c_0\) of \(g(\rho _0)\) cannot vanish.

Now let \(1 \le i \le k\), and let \(\widetilde{\rho }\) equal \(\rho \) except in the i-th coordinate, where \(\widetilde{\rho }_i = \rho _i+\varepsilon \) with \(\varepsilon >0\) small. Then \(p_{\widetilde{\rho }}(\rho _i)\) and \(p_{\widetilde{\rho }}(\rho _0)\) have opposite signs because \(r(p_{\widetilde{\rho }}) > r(p_{\rho })\), while \(p_{\widetilde{\rho }}\) vanishes at the rest of \(\rho _1,\dots ,\rho _k\). It follows from taking \(g = p_{\widetilde{\rho }}\) that \(c_i\) must be nonzero, with the same sign as \(c_0\).

Finally, when \(s=1\) we can compute the sign of \(c_{k+1}\) by taking \(g=q_0=1\) to obtain

$$\begin{aligned} \sum _{i=0}^{k+1}c_i = 0. \end{aligned}$$

Since \(c_0,\dots ,c_k\) are nonzero and share a common sign, it follows that \(c_{k+1}\) is nonzero with the opposite sign. \(\square \)

When \(s=-\,1\), we conjecture that \(c_{k+1}\) always has the same sign as \(c_0,\dots ,c_k\). This conjecture holds for every case listed in Table 4.
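In practice, once the optimal radii \(\rho _0,\dots ,\rho _k\) have been computed numerically, the coefficients \(c_0,\dots ,c_{k+1}\) can be recovered as the (one-dimensional, up to numerical tolerance) left kernel of T. A sketch under that assumption (the helper name and input radii are hypothetical; this is not our actual implementation):

```python
# Recover the summation formula of Proposition 5.1 as the left kernel of T,
# given (hypothetical) optimal radii rho = [rho_0, ..., rho_k] from Sect. 4.
import numpy as np
from scipy.special import genlaguerre
from scipy.linalg import null_space

def summation_coefficients(rho, nu, s):
    k = len(rho) - 1
    n = 2 * k + 2
    # q_j = L_{2j}^nu for s = +1, and q_j = L_{2j+1}^nu for s = -1.
    q = [genlaguerre(2 * j + (0 if s == 1 else 1), nu) for j in range(n)]
    T = np.zeros((k + 2, n))
    for i, r in enumerate(rho):
        T[i] = [qj(r) for qj in q]        # rows 0..k: evaluation at rho_i
    T[k + 1] = [qj(0.0) for qj in q]      # row k+1:   evaluation at 0
    ker = null_space(T.T)                 # left kernel: c with c @ T = 0
    assert ker.shape[1] == 1              # one-dimensional by Proposition 5.1
    c = ker[:, 0]
    # With the normalization c_0 = 1, Proposition 5.1 predicts
    # c_1, ..., c_k > 0 and, for s = +1, c_{k+1} < 0; the conjecture
    # above predicts c_{k+1} > 0 for s = -1.
    return c / c[0]
```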