An optimal uncertainty principle in twelve dimensions via modular forms

We prove an optimal bound in twelve dimensions for the uncertainty principle of Bourgain, Clozel, and Kahane. Suppose $f \colon \mathbb{R}^{12} \to \mathbb{R}$ is an integrable function that is not identically zero. Normalize its Fourier transform $\widehat{f}$ by $\widehat{f}(\xi) = \int_{\mathbb{R}^d} f(x)e^{-2\pi i \langle x, \xi\rangle}\, dx$, and suppose $\widehat{f}$ is real-valued and integrable. We show that if $f(0) \le 0$, $\widehat{f}(0) \le 0$, $f(x) \ge 0$ for $|x| \ge r_1$, and $\widehat{f}(\xi) \ge 0$ for $|\xi| \ge r_2$, then $r_1r_2 \ge 2$, and this bound is sharp. The construction of a function attaining the bound is based on Viazovska's modular form techniques, and its optimality follows from the existence of the Eisenstein series $E_6$. No sharp bound is known, or even conjectured, in any other dimension. We also develop a connection with the linear programming bound of Cohn and Elkies, which lets us generalize the sign pattern of $f$ and $\widehat{f}$ to develop a complementary uncertainty principle. This generalization unites the uncertainty principle with the linear programming bound as aspects of a broader theory.


Introduction
An uncertainty principle expresses a fundamental tradeoff between the properties of a function $f$ and its Fourier transform $\widehat{f}$. The most common variants measure the dispersion, with the tradeoff being that $f$ and $\widehat{f}$ cannot both be highly concentrated near the origin. Motivated by applications to number theory, Bourgain, Clozel, and Kahane [2] proved an elegant uncertainty principle for the signs of $f$ and $\widehat{f}$: if these functions are nonpositive at the origin and not identically zero, then they cannot both be nonnegative outside an arbitrarily small neighborhood of the origin. We can state this principle more formally as follows.
We say that a function $f \colon \mathbb{R}^d \to \mathbb{R}$ is eventually nonnegative (resp., nonpositive) if $f(x) \ge 0$ (resp., $f(x) \le 0$) for all sufficiently large $|x|$. If that is the case, we let
$$r(f) = \inf\{R \ge 0 : f(x) \text{ has the same sign for } |x| \ge R\}$$
be the radius of its last sign change. We normalize the Fourier transform $\widehat{f}$ of $f$ by
$$\widehat{f}(\xi) = \int_{\mathbb{R}^d} f(x) e^{-2\pi i \langle x, \xi\rangle}\, dx.$$
Let $\mathcal{A}_+(d)$ denote the set of functions $f \colon \mathbb{R}^d \to \mathbb{R}$ such that
(1) $f \in L^1(\mathbb{R}^d)$, $\widehat{f} \in L^1(\mathbb{R}^d)$, and $\widehat{f}$ is real-valued (i.e., $f$ is even),
(2) $f$ is eventually nonnegative while $\widehat{f}(0) \le 0$, and
(3) $\widehat{f}$ is eventually nonnegative while $f(0) \le 0$.
Date: February 20, 2019. This work was begun during a visit by Gonçalves to Microsoft Research New England.

(Note the tension in (2) between the eventual nonnegativity of $f$ and the inequality $\int_{\mathbb{R}^d} f = \widehat{f}(0) \le 0$, and the analogous tension in (3).) Writing $\mathcal{A}_+(d)$ for the set of functions defined above, the uncertainty principle of Bourgain, Clozel, and Kahane [2, Théorème 3.1] says that
$$A_+(d) := \inf_{f \in \mathcal{A}_+(d) \setminus \{0\}} \sqrt{r(f)\, r(\widehat{f})} > 0.$$
Taking the geometric mean of $r(f)$ and $r(\widehat{f})$ is a natural way to eliminate scale dependence, because rescaling the input of $f$ preserves this quantity. Thus, the uncertainty principle amounts to saying that $r(f)$ and $r(\widehat{f})$ cannot both be made arbitrarily small if $f \in \mathcal{A}_+(d) \setminus \{0\}$.

Gonçalves, Oliveira e Silva, and Steinerberger [12, Theorem 3] proved that for each dimension $d$ there exists a radial function $f \in \mathcal{A}_+(d) \setminus \{0\}$ such that $\widehat{f} = f$ and $r(f) = r(\widehat{f}) = A_+(d)$; furthermore, $A_+(d)$ is exactly the minimal value of $r(g)$ in the following optimization problem:

Problem 1.1 (+1 eigenfunction uncertainty principle). Minimize $r(g)$ over all $g \colon \mathbb{R}^d \to \mathbb{R}$ such that
(1) $g \in L^1(\mathbb{R}^d) \setminus \{0\}$ and $\widehat{g} = g$, and
(2) $g(0) = 0$ and $g$ is eventually nonnegative.
The name "+1 eigenfunction" refers to the fact that $g$ is an eigenfunction of the Fourier transform with eigenvalue $+1$.
Upper and lower bounds for $A_+(d)$ are known [2,12], but the exact value has not previously been determined, or even conjectured, in any dimension. Our main result is a solution of this problem in twelve dimensions:

Theorem 1.2. We have $A_+(12) = \sqrt{2}$. In particular, there exists a radial Schwartz function $f \colon \mathbb{R}^{12} \to \mathbb{R}$ that is eventually nonnegative and satisfies $\widehat{f} = f$, $f(0) = 0$, and $r(f) = r(\widehat{f}) = \sqrt{2}$. Moreover, as a radial function $f$ has a double root at $|x| = 0$, a single root at $|x| = \sqrt{2}$, and double roots at $|x| = \sqrt{2j}$ for integers $j \ge 2$.
See Figure 1.1 for plots. The appealing simplicity of this answer seems to be unique to twelve dimensions, and we have been unable to conjecture a closed form for A + (d) in any other dimension d. See Section 4 for an account of the numerical evidence, which displays noteworthy patterns and regularity despite the lack of any exact conjectures.
We find the exceptional role of twelve dimensions surprising: why should a seemingly arbitrary dimension admit an exact solution with mysterious arithmetic structure not shared by other dimensions? As far as we are aware, Theorem 1.2 is the first time such behavior has arisen in an uncertainty principle.
The proof of Theorem 1.2 makes use of modular forms. The lower bound $A_+(12) \ge \sqrt{2}$ follows from the existence of the Eisenstein series $E_6$, while the upper bound $A_+(12) \le \sqrt{2}$ is based on Viazovska's methods, which were developed to solve the sphere packing problem in eight dimensions [18] and twenty-four dimensions [8] (see also [4] for an exposition). We prove both bounds for $A_+(12)$ in Section 2.

Figure 1.1. The upper image is a cross section of the graph of $x \mapsto f(x)$ for $|x|^2 \le 8$; note that this function decreases rapidly enough that the double roots are nearly invisible. The function in the lower image is instead proportional to $x \mapsto |x|^{11} f(x)$. This transformation distorts the picture but clarifies the behavior, because $|x|^{11}$ is proportional to the surface area of a sphere of radius $|x|$ in $\mathbb{R}^{12}$; thus, one-dimensional integrals of the plotted function are proportional to integrals of $f$ in $\mathbb{R}^{12}$.
The close relationship of this uncertainty principle with sphere packing may seem surprising, given that Problem 1.1 makes no reference to any discrete structures. The connection is through the Euclidean linear programming bound of Cohn and Elkies [5], which converts a suitable auxiliary function $f$ into an upper bound for the sphere packing density $\Delta_d$ in $\mathbb{R}^d$. Suppose $f \colon \mathbb{R}^d \to \mathbb{R}$ is an integrable function such that $\widehat{f}$ is also integrable and real-valued (i.e., $f$ is even), $f(0) = \widehat{f}(0) = 1$, $\widehat{f} \ge 0$ everywhere, and $f$ is eventually nonpositive. Then the linear programming bound obtained from $f$ is the upper bound
$$\Delta_d \le \mathrm{vol}\bigl(B^d_{r(f)/2}\bigr), \tag{1.1}$$
where $B^d_R$ is the closed ball of radius $R$ about the origin in $\mathbb{R}^d$. (Strictly speaking, the proof in [5] requires additional decay hypotheses on $f$ and $\widehat{f}$; see [11, Theorem 3.3] for a proof in the generality of our statement here.) Optimizing this bound amounts to minimizing $r(f)$.
Based on numerical evidence and analogies with other problems in coding theory, Cohn and Elkies conjectured the existence of functions f achieving equality in (1.1) when d ∈ {2, 8, 24}, and they proved it when d = 1. The case d = 2 remains an open problem today, despite the existence of elementary solutions of the two-dimensional sphere packing problem by other means (see, for example, [13]). However, the case d = 8 was proved fourteen years later in a breakthrough by Viazovska [18], and the case d = 24 was proved shortly thereafter based on her approach [8]. These papers solved the sphere packing problem in dimensions 8 and 24.
The problem of optimizing the linear programming bound for $\Delta_d$ already appears somewhat similar to Problem 1.1, but there is a deeper analogy based on a problem studied by Cohn and Elkies in [5, Section 7]. Given an auxiliary function $f$ for the sphere packing bound, let $g = \widehat{f} - f$. Note that $g$ is not identically zero, because otherwise $f$ and $\widehat{f}$ would both have compact support (thanks to their opposite signs outside radius $r(f)$), which would imply that $f = \widehat{f} = 0$. Then $g$ satisfies the conditions of the following problem, with $r(g) \le r(f)$:

Problem 1.3 (−1 eigenfunction uncertainty principle). Minimize $r(g)$ over all $g \colon \mathbb{R}^d \to \mathbb{R}$ such that
(1) $g \in L^1(\mathbb{R}^d) \setminus \{0\}$ and $\widehat{g} = -g$, and
(2) $g(0) = 0$ and $g$ is eventually nonnegative.
This problem has been solved for $d \in \{1, 8, 24\}$, as a consequence of the sphere packing bounds mentioned above; the answers are $1$, $\sqrt{2}$, and $2$, respectively. When $d = 2$, it is conjectured that the optimal value of $r(g)$ is $(4/3)^{1/4}$, but no proof is known. No other closed forms have been identified.
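To make the link between the values $1$, $\sqrt{2}$, $2$ and sphere packing concrete, the following sketch evaluates the volume of a ball of radius $r/2$ in these three cases. It assumes the linear programming bound has the Cohn-Elkies form $\Delta_d \le \mathrm{vol}(B^d_{r(f)/2})$; the function names are illustrative only. The comparison values $\pi^4/384$ and $\pi^{12}/12!$ are the known densities of the $E_8$ and Leech lattice packings.

```python
import math


def ball_volume(d: int, radius: float) -> float:
    """Volume of the closed ball of the given radius in R^d."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1) * radius ** d


def lp_density_bound(d: int, r: float) -> float:
    """Assumed form of the bound: vol(B^d_{r/2}) when the auxiliary function has r(f) = r."""
    return ball_volume(d, r / 2)


if __name__ == "__main__":
    # The three solved cases: d = 1, 8, 24 with r = 1, sqrt(2), 2.
    for d, r in [(1, 1.0), (8, math.sqrt(2)), (24, 2.0)]:
        print(d, lp_density_bound(d, r))
```

Under this assumption, the three radii reproduce the densities of the integer lattice, the $E_8$ lattice, and the Leech lattice, which is why these cases of the problem are settled by sphere packing results.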
Cohn and Elkies conjectured [5, Conjecture 7.2] that the minimal value of $r(g)$ in Problem 1.3 is exactly the same as that of $r(f)$ in the linear programming bound, and that in fact an auxiliary function $f$ for the linear programming bound can always be reconstructed from an optimal $g$ via $g = \widehat{f} - f$. Nobody has proved that such an $f$ always exists, but numerical evidence strongly supports this conjecture.
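We can extend Problem 1.3 to a broader uncertainty principle as follows. Let $\mathcal{A}_-(d)$ denote the set of functions $f \colon \mathbb{R}^d \to \mathbb{R}$ such that $f$ and $\widehat{f}$ are integrable, $\widehat{f}$ is real-valued, $f$ is eventually nonnegative while $\widehat{f}(0) \le 0$, and $\widehat{f}$ is eventually nonpositive while $f(0) \ge 0$. Let
$$A_-(d) = \inf_{f \in \mathcal{A}_-(d) \setminus \{0\}} \sqrt{r(f)\, r(\widehat{f})},$$
and note that every function $g$ in Problem 1.3 satisfies $r(g) \ge A_-(d)$.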
For completeness, we state our next theorem for both ±1 cases, although all the results in the following theorem were already proved for the +1 case by Gonçalves, Oliveira e Silva, and Steinerberger in [12]. Note that we regard A +1 and A −1 as synonymous with A + and A − , respectively.
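Theorem 1.4. Let $s \in \{\pm 1\}$. Then there exist positive constants $c$ and $C$ such that
$$c \le \frac{A_s(d)}{\sqrt{d}} \le C$$
for all $d$. Moreover, for each $d$ there exists a radial function $f \in \mathcal{A}_s(d) \setminus \{0\}$ with $\widehat{f} = sf$, $f(0) = 0$, and $r(f) = A_s(d)$. Furthermore, any such function must vanish at infinitely many radii greater than $A_s(d)$.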
In particular, $A_-(d) > 0$. Thus, we obtain a natural counterpart to the uncertainty principle of Bourgain, Clozel, and Kahane, but with $f$ and $\widehat{f}$ having opposite signs, and with the optimal function coming from Problem 1.3. In Theorem 1.4, we can take $c = 1/\sqrt{2\pi e}$ and $C = 1$.
This uncertainty principle places the linear programming bound in a broader analytic context and gives a deeper significance to the auxiliary functions that optimize this bound. Outside of a few exceptional dimensions, they do not seem to come close to solving the sphere packing problem, but they conjecturally achieve an optimal tradeoff between sign conditions in the uncertainty principle.
Except for extremal functions for A + (1), our proof in Section 3.3 and the proof in [12] actually show that any extremal function cannot be eventually positive; that is, it must vanish on spheres with arbitrarily large radii, not just at infinitely many radii greater than A s (d). We strongly believe that this is the case for A + (1) as well. Problems 1.1 and 1.3 are closely related and behave in complementary ways. We prove Theorem 1.4 by adapting the techniques of [12] to −1 eigenfunctions. However, the analogy between these problems is not perfect. For example, the equality A + (12) = A − (8) = √ 2 suggests that perhaps A + (28) = A − (24) = 2, but that turns out to be false (see Section 4). Similarly, relatively simple explicit formulas show that A − (1) = 1, while A + (1) remains a mystery.
In addition to its values in specific dimensions, the asymptotic behavior of $A_s(d)$ as $d \to \infty$ is of substantial interest. It was shown in [2] that $A_+(d)/\sqrt{d}$ is bounded below by $1/\sqrt{2\pi e} = 0.2419\ldots$ and bounded above by an explicit constant. In Section 3, we obtain the same lower bound for the case of $A_-(d)$, and an improved upper bound of $0.3194\ldots$ for that case based on [11] (the exact value is complicated). We conjecture that the limits of $A_+(d)/\sqrt{d}$ and $A_-(d)/\sqrt{d}$ as $d \to \infty$ exist and are equal (Conjecture 1.5). See Section 4 for the numerical evidence supporting this conjecture. We expect that the common value of these limits is strictly between the bounds $0.2419\ldots$ and $0.3194\ldots$, and perhaps not so far from the latter.
In the remainder of the paper, we prove Theorem 1.2 in Section 2 and Theorem 1.4 in Section 3. In Section 4 we present numerical computations and conjectures, and we conclude in Section 5 with a construction of summation formulas that validate our numerics and lend support to our general conjectures about A s (d).
2. The +1 eigenfunction uncertainty principle in dimension 12

In this section, we prove Theorem 1.2.
2.1. Optimality. We begin by establishing that A + (12) ≥ √ 2. For this inequality, we use a special Poisson-type summation formula for radial Schwartz functions f : R 12 → C based on the modular form E 6 . Converting a modular form into such a formula is a standard technique; for completeness, we will give a direct proof.
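Consider the normalized Eisenstein series $E_6 \colon \mathbb{H} \to \mathbb{C}$, where $\mathbb{H}$ denotes the upper half-plane in $\mathbb{C}$ (see, for example, [19, §2]). This function has the Fourier expansion
$$E_6(z) = 1 - \sum_{j \ge 1} c_j e^{2\pi i j z}, \tag{2.1}$$
where $c_j = 504\sigma_5(j)$ and $\sigma_5(j)$ is the sum of the fifth powers of the divisors of $j$. In particular, $c_j > 0$ for $j \ge 1$ and we have the trivial bound $c_j \le 504 j^6$. Because $E_6$ is a modular form of weight 6 for $SL_2(\mathbb{Z})$, it satisfies the identity
$$E_6(-1/z) = z^6 E_6(z). \tag{2.2}$$
This identity turns into a summation formula for a Gaussian $f$: if $f(x) = e^{\pi i z |x|^2}$ with $z \in \mathbb{H}$, then $\widehat{f}(\xi) = -z^{-6} e^{\pi i (-1/z)|\xi|^2}$, and (2.2) yields
$$f(0) - \sum_{j \ge 1} c_j f\bigl(\sqrt{2j}\bigr) = -\widehat{f}(0) + \sum_{j \ge 1} c_j \widehat{f}\bigl(\sqrt{2j}\bigr),$$
where for a radial function we write $f(r)$ for its common value on the sphere of radius $r$. The key to proving that $A_+(12) \ge \sqrt{2}$ is the following lemma, which extends this summation formula to arbitrary radial Schwartz functions.

Lemma 2.1. Every radial Schwartz function $f \colon \mathbb{R}^{12} \to \mathbb{C}$ satisfies
$$f(0) - \sum_{j \ge 1} c_j f\bigl(\sqrt{2j}\bigr) = -\widehat{f}(0) + \sum_{j \ge 1} c_j \widehat{f}\bigl(\sqrt{2j}\bigr).$$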
We follow the approach used to prove Theorem 1 in [16, Section 6].
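Proof. Let $\Lambda \colon \mathcal{S}_{\mathrm{rad}}(\mathbb{R}^{12}) \to \mathbb{C}$ be the functional
$$\Lambda(f) = f(0) + \widehat{f}(0) - \sum_{j \ge 1} c_j \Bigl(f\bigl(\sqrt{2j}\bigr) + \widehat{f}\bigl(\sqrt{2j}\bigr)\Bigr)$$
on the radial Schwartz space $\mathcal{S}_{\mathrm{rad}}(\mathbb{R}^{12})$. As noted above, $\Lambda(f) = 0$ whenever $f(x) = e^{-\pi\alpha|x|^2}$ with $\mathrm{Re}(\alpha) > 0$. Moreover, the bound $c_j = O(j^6)$ shows that $\Lambda$ is a continuous linear functional in the topology of the Schwartz space. Thus, we need only prove our desired identity $\Lambda(f) = 0$ for compactly supported, radial $C^\infty$ functions, which are dense in $\mathcal{S}_{\mathrm{rad}}(\mathbb{R}^{12})$.

Write $f(x) = F(|x|^2)e^{-\pi|x|^2}$, where $F \colon \mathbb{R} \to \mathbb{R}$ is a smooth and compactly supported function. Let $\widehat{F}$ be the one-dimensional Fourier transform of $F$, and note that $\widehat{F}$ is also rapidly decreasing. By Fourier inversion,
$$f(x) = \int_{-\infty}^{\infty} \widehat{F}(t)\, e^{-\pi(1-2it)|x|^2}\, dt.$$
The functions $x \mapsto \int_{-T}^{T} \widehat{F}(t)\, e^{-\pi(1-2it)|x|^2}\, dt$ belong to $\mathcal{S}_{\mathrm{rad}}(\mathbb{R}^{12})$ for each $T > 0$ and converge to $f$ in the Schwartz topology. Moreover,
$$\Lambda(f) = \lim_{T \to \infty} \Lambda\Bigl(\int_{-T}^{T} \widehat{F}(t)\, e^{-\pi(1-2it)|x|^2}\, dt\Bigr) = \lim_{T \to \infty} \int_{-T}^{T} \widehat{F}(t)\, \Lambda\bigl(e^{-\pi(1-2it)|x|^2}\bigr)\, dt = 0,$$
where the commutation is justified since the Riemann sums of the integral converge to the integral in the topology of $\mathcal{S}_{\mathrm{rad}}(\mathbb{R}^{12})$. This finishes the proof of the lemma.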
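As a numerical sanity check on the Gaussian case that drives this proof, the sketch below evaluates a truncated Fourier expansion of $E_6$ and tests the weight 6 transformation law, together with the resulting summation formula for $f(x) = e^{\pi i z|x|^2}$ on $\mathbb{R}^{12}$. It is an illustration rather than part of the argument: the function names and the truncation at 80 terms are arbitrary choices, and the defect is computed under the assumption that $\widehat{f}(\xi) = -z^{-6}e^{\pi i(-1/z)|\xi|^2}$.

```python
import cmath
import math


def sigma5(n: int) -> int:
    """Sum of the fifth powers of the positive divisors of n."""
    return sum(d ** 5 for d in range(1, n + 1) if n % d == 0)


def c(n: int) -> int:
    """Coefficient c_n = 504 * sigma_5(n)."""
    return 504 * sigma5(n)


# Truncated coefficient list; ample when Im z is not small.
COEFFS = [c(n) for n in range(1, 81)]


def e6(z: complex) -> complex:
    """Truncated expansion E_6(z) = 1 - sum_n c_n e^{2 pi i n z}."""
    q = cmath.exp(2j * math.pi * z)
    return 1 - sum(cn * q ** n for n, cn in enumerate(COEFFS, start=1))


def gaussian_defect(z: complex) -> complex:
    """Value of f(0) + fhat(0) - sum_n c_n (f(sqrt(2n)) + fhat(sqrt(2n)))
    for f(x) = e^{pi i z |x|^2} on R^12, which equals E_6(z) - z^{-6} E_6(-1/z)."""
    return e6(z) - z ** (-6) * e6(-1 / z)


if __name__ == "__main__":
    z = 0.2 + 1.1j
    print(abs(e6(-1 / z) - z ** 6 * e6(z)), abs(gaussian_defect(z)))
```

Both printed quantities should be at the level of floating-point rounding error for any $z$ whose imaginary part, and that of $-1/z$, is not too small.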
Noam Elkies has provided the following alternative proof of Lemma 2.1 using Poisson summation. Explicit calculation shows that one can write the modular form E 6 in terms of theta series of lattices and their duals as where L is the D 12 root lattice rescaled by a factor of 1/ √ 2. Then the summation formula from Lemma 2.1 becomes a linear combination of the Poisson summation formulas for the lattices D 12 , D * 12 , L, and L * , which implies that it holds for all radial Schwartz functions. This argument shows that Lemma 2.1 is closely related to Poisson summation, while the proof we gave above applies directly to other modular forms as well as E 6 .
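Proof. Without loss of generality, we can assume $f$ is a radial function; otherwise, we simply average its rotations about the origin. (If the averaged function vanishes at radius $\sqrt{2j}$, then so does $f$ because $r(f) \le \sqrt{2}$, and the same holds for $\widehat{f}$.) If $f$ is a radial Schwartz function, then Lemma 2.1 implies that
$$f(0) + \widehat{f}(0) = \sum_{j \ge 1} c_j \Bigl(f\bigl(\sqrt{2j}\bigr) + \widehat{f}\bigl(\sqrt{2j}\bigr)\Bigr).$$
The left side is nonpositive, while every term on the right side is nonnegative because $c_j > 0$ and $r(f), r(\widehat{f}) \le \sqrt{2}$; thus $f$ and $\widehat{f}$ vanish at radius $\sqrt{2j}$ for all $j \ge 1$. For general $f$, we can apply a standard mollification argument. Let $\varphi \colon \mathbb{R}^d \to \mathbb{R}$ be a nonnegative, radial $C^\infty$ function supported in the unit ball $B^d_1$ with $\widehat{\varphi} \ge 0$ and $\widehat{\varphi}(0) = 1$, so that the functions $\varphi_\varepsilon$ defined for $\varepsilon > 0$ by $\varphi_\varepsilon(x) = \varepsilon^{-d}\varphi(x/\varepsilon)$ form an approximate identity.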
To see why, note that ϕ ε is a Schwartz function, while f * ϕ ε is smooth and all its derivatives are bounded. Now that we have Schwartz functions approximating f , we again apply Lemma 2.1 to obtain To derive information from this identity, we combine the limits We will now apply this lemma to prove the lower bound A + (12) ≥ √ 2.
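Proof. Suppose $f \in \mathcal{A}_+(12) \setminus \{0\}$ satisfies $r(f)\, r(\widehat{f}) < 2$; we will derive a contradiction. By rescaling the input to $f$, we can assume without loss of generality that $r(f)$ and $r(\widehat{f})$ are both less than $\sqrt{2}$. Now we apply Lemma 2.2 to a rescaled version of $f$. Choose $\lambda > 0$ and let $g(x) = f(\lambda x)$. Then $\widehat{g}(\xi) = \lambda^{-12}\widehat{f}(\xi/\lambda)$, and it follows that $g \in \mathcal{A}_+(12)$. Moreover, if $\lambda$ is close enough to 1, then $r(g)$ and $r(\widehat{g})$ are both less than $\sqrt{2}$. By Lemma 2.2, if $\lambda$ is sufficiently close to 1, then $g(x) = 0$ whenever $|x| = \sqrt{2j}$ with $j \ge 1$. Thus there exists some $\lambda_0 > 1$ such that $f(x) = 0$ whenever $|x| \in \bigl(\sqrt{2j}/\lambda_0, \sqrt{2j}\,\lambda_0\bigr)$ and $j \ge 1$, and the same holds for $\widehat{f}$. The union of these intervals covers the entire half-line $[R, \infty)$ for some $R > 0$, because
$$\lim_{j \to \infty} \frac{\sqrt{2(j+1)}}{\sqrt{2j}} = 1 < \lambda_0^2.$$
In other words, $f$ and $\widehat{f}$ both have compact support, which implies that $f = 0$, a contradiction.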
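The covering step at the end of this proof can be made quantitative. The sketch below, with hypothetical helper names, computes the first index from which consecutive intervals $(\sqrt{2j}/\lambda, \sqrt{2j}\,\lambda)$ overlap, using the elementary observation that interval $j$ meets interval $j+1$ exactly when $\lambda^4 > (j+1)/j$.

```python
import math


def first_overlap_index(lam: float) -> int:
    """Smallest J such that intervals j and j+1 overlap for every j >= J.

    The intervals are I_j = (sqrt(2j)/lam, sqrt(2j)*lam); I_j meets I_{j+1}
    exactly when lam**4 > (j + 1)/j, that is, when j > 1/(lam**4 - 1).
    """
    if lam <= 1:
        raise ValueError("lam must exceed 1")
    return math.floor(1 / (lam ** 4 - 1)) + 1


def covered_from(lam: float) -> float:
    """A radius R such that every radius greater than R lies in some I_j."""
    return math.sqrt(2 * first_overlap_index(lam)) / lam


if __name__ == "__main__":
    for lam in (1.1, 1.01, 1.001):
        print(lam, first_overlap_index(lam), covered_from(lam))
```

As $\lambda$ approaches 1 the threshold radius grows without bound, but it is finite for every $\lambda > 1$, which is all the proof needs.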
Exactly the same technique applies to any dimension and sign. For example, for $k \ge 2$, the summation formula coming from the Eisenstein series $E_{2k}$ proves that $A_{(-1)^{k-1}}(4k) \ge \sqrt{2}$. This lower bound is sharp for $k = 2$ and $k = 3$, but it is not even true for $k = 1$, because $E_2$ is merely a quasimodular form.
The summation formula (2.3) automatically holds when f = −sf . Thus, it is equivalent to the assertion that In the case s = −1, this conjecture is analogous to [3,Conjecture 4.2]. It holds in every case in which A s (d) is known exactly: the summation formulas that establish sharp lower bounds for A − (1), A − (8), and A − (24) are Poisson summation over the Z, E 8 , and Leech lattices, respectively, while the A + (12) case is Lemma 2.1. The conjectured value of A − (2) corresponds to Poisson summation over the isodual scaling of the A 2 lattice. Conjecture 2.5 is not known to hold in any other case, nor can we guess what the summation formula should be, but the numerical and theoretical evidence in favor of this conjecture is compelling (see Sections 4 and 5). In particular, in most cases we can compute the constants c j and ρ j in these conjectural summation formulas to high precision.
The coefficients c j are integers in the five exact cases listed above, but integral coefficients seem to be rare, and it is plausible that no more such cases exist. One interesting example is the (conjectural) summation formula that yields A + (28). It is natural to guess that A + (28) = 2, in accordance with A + (12) = A − (8) and  Table 2.1, we approximate a conjectural summation formula that would establish this equality, which we computed using the techniques of Section 5.
We are unable to describe the numbers $\rho_j$ and $c_j$ in the summation formula exactly, but we believe that $\rho_j = 2j + 4 + o(1)$ as $j \to \infty$ (see Conjecture 4.2) and $c_j = (24 + o(1))\sigma_{13}(j + 2)$. The latter equation says that $-c_j$ is asymptotic to the coefficient of $e^{(2j+4)\pi i z}$ in the Fourier expansion of the Eisenstein series $E_{14}$, and indeed these coefficients are close to those in the table. Note that the difference between the role of $E_{14}$ here and that of $E_6$ when $d = 12$ is that the summation formula for $d = 28$ suppresses the $-24e^{2\pi i z}$ term in $E_{14}(z)$ at the cost of perturbing all the remaining numbers.
2.2. Theta series and an extremal function in dimension 12. To prove the upper bound $A_+(12) \le \sqrt{2}$, we will construct an explicit function $f \in \mathcal{A}_+(12)$ with $r(f) = r(\widehat{f}) = \sqrt{2}$. To do so, we will use a remarkable integral transform discovered by Viazovska that turns modular forms into radial eigenfunctions of the Fourier transform. See [19] for background on modular forms, and [18,8,9] for other applications of this transform.
Viazovska's method can be summarized by the following proposition, which is implicit in [18] but was stated there only for a specific modular form with d = 8 (and similarly for d = 24 in [8]). We omit the proof, because it closely follows the same approach as [18, Propositions 5 and 6] and [8, Lemma 3.1]. All that needs to be checked is the dependence on the dimension d.
Proposition 2.6. Let d be a positive multiple of 4, and let ψ be a weakly holomorphic modular form of weight 2 − d/2 for Γ(2) such that Then f is a Schwartz function and an eigenfunction of the Fourier transform with eigenvalue (−1) 1+d/4 . Furthermore, Viazovska in fact developed two such techniques, one for each eigenvalue, and both are used in the sphere packing papers [18,8]. We will not need the other technique, which yields eigenvalue (−1) d/4 instead of (−1) 1+d/4 and uses a weakly holomorphic quasimodular form of weight 4 − d/2 and depth 2 for SL 2 (Z).
When applying Proposition 2.6, we will use the notation for theta functions from [18,8]. Their fourth powers Θ 4 00 , Θ 4 01 , and Θ 4 10 are modular forms of weight 2 for Γ(2), which satisfy the Jacobi identity Θ 4 00 = Θ 4 01 + Θ 4 10 and the transformation laws under the action of SL 2 (Z). We will also use the modular form ∆, defined by It is a modular form of weight 12 for the group SL 2 (Z), which contains Γ(2); thus ∆(z + 1) = ∆(z) and z −12 ∆(−1/z) = ∆(z). Using these ingredients, we will now construct a suitable modular form for use in Proposition 2.6, to prove Theorem 1.2. Let (We discuss the motivation for this definition at the end of this section.) Then ψ is a weakly holomorphic modular form of weight 4 · 2 − 12 = −4, and the identity can be checked using the formulas listed above. (Note that ψ is weakly holomorphic because the product formula shows that ∆ does not vanish in the upper half-plane.) Using the definitions for Θ 00 , Θ 01 , Θ 10 , and ∆ given above, we can compute the Fourier series This series is absolutely convergent in the upper half-plane, and thus |ψ(it)| = O e 2πt as t → ∞. Using the transformation laws again, we find that In particular, |t 4 ψ(i/t)| = O e −πt as t → ∞. Thus, ψ satisfies the hypotheses of Proposition 2.6 with d = 12 and K = 2. Define f : R 12 → R by (2.5). Then f is a radial Schwartz function satisfying f = f and for all t > 0, because Θ 00 (it), Θ 01 (it), and Θ 10 (it) are all real, while 0 < ∆(it) < 1. Thus, (2.8) implies that f (x) ≥ 0 for |x| > √ 2, with double roots at |x| = √ 2j for integers j ≥ 2 and no other roots in this range.
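The identities for the theta functions and $\Delta$ quoted in this construction can be checked numerically. The sketch below assumes the standard conventions $\Theta_{00}(z) = \sum_n e^{\pi i n^2 z}$, $\Theta_{01}(z) = \sum_n (-1)^n e^{\pi i n^2 z}$, $\Theta_{10}(z) = \sum_n e^{\pi i (n+1/2)^2 z}$, and the product formula $\Delta(z) = q\prod_{n \ge 1}(1 - q^n)^{24}$ with $q = e^{2\pi i z}$; these conventions, the truncation levels, and the function names are assumptions of the example rather than statements taken from the text.

```python
import cmath
import math

N = 60  # truncation of the theta series; ample when Im z >= 0.2


def theta00(z: complex) -> complex:
    return sum(cmath.exp(1j * math.pi * n * n * z) for n in range(-N, N + 1))


def theta01(z: complex) -> complex:
    return sum((-1) ** abs(n) * cmath.exp(1j * math.pi * n * n * z)
               for n in range(-N, N + 1))


def theta10(z: complex) -> complex:
    return sum(cmath.exp(1j * math.pi * (n + 0.5) ** 2 * z)
               for n in range(-N, N + 1))


def delta(z: complex, terms: int = 400) -> complex:
    """Truncated product q * prod_{n>=1} (1 - q^n)^24 with q = e^{2 pi i z}."""
    q = cmath.exp(2j * math.pi * z)
    prod = 1
    for n in range(1, terms + 1):
        prod *= (1 - q ** n) ** 24
    return q * prod


if __name__ == "__main__":
    z = 0.3 + 0.8j
    # Jacobi identity: Theta_00^4 = Theta_01^4 + Theta_10^4.
    print(abs(theta00(z) ** 4 - theta01(z) ** 4 - theta10(z) ** 4))
    # Weight 12 transformation laws for Delta.
    print(abs(delta(z + 1) - delta(z)), abs(z ** (-12) * delta(-1 / z) - delta(z)))
    # Positivity and size of Delta on the imaginary axis.
    print([delta(1j * t).real for t in (0.2, 0.5, 1.0, 2.0, 5.0)])
```

The last line illustrates the inequality $0 < \Delta(it) < 1$ at a few sample points; it is a spot check, not a proof.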
For comparison, the quasimodular form inequalities that play the same role as (2.9) in [18] and [8] are obtained via computer-assisted proofs. The reason for this discrepancy is that those proofs combine +1 and −1 eigenfunctions, which introduces technical difficulties. If all one wishes to prove is that A − (8) = √ 2 and A − (24) = 2, then one can avoid computer assistance. Specifically, the formula (3.1) in [8] is visibly positive in the same sense as our formula (2.6), and while that is not true for formula (46) in [18], it can be rewritten so as to be visibly positive (see, for example, the corresponding formula in [4]).
To analyze the behavior of f (x) with 0 ≤ |x| ≤ √ 2, we can simply cancel the growth of ψ(it). The series (2.7) shows that to second order at |x| = √ 2 and to fourth order at the origin, and so f (x) has a single root at |x| = √ 2 and a double root at the origin. More specifically, In particular, f (0) = 0. It follows that f ∈ A + (12), and therefore A + (12) ≤ √ 2, as desired. We have now proved all of the assertions from Theorem 1.2.
As the quadratic term $-66\pi|x|^2$ suggests, our construction of $f$ is scaled so that its values are rather large. For example, its minimum value appears to be $f(x) \approx -23.8088$, achieved when $|x| \approx 0.557391$. In Figure 1.1, we have plotted a more moderate scaling of this function.
To arrive at the definition (2.6) of $\psi$, we began with the Ansatz that $\psi\Delta$ should be a holomorphic modular form of weight 8 for $\Gamma(2)$. Equivalently, it should be a linear combination of $\Theta_{00}^{16}$, $\Theta_{00}^{12}\Theta_{01}^{4}$, $\Theta_{00}^{8}\Theta_{01}^{8}$, $\Theta_{00}^{4}\Theta_{01}^{12}$, and $\Theta_{01}^{16}$. Imposing the constraint $z^4\psi(-1/z) + \psi(z + 1) = \psi(z)$ eliminates three degrees of freedom, which leaves just one degree of freedom, up to scaling. The remaining constraint is that the coefficient of $e^{-\pi i z}$ in the Fourier expansion of $\psi(z)$ must vanish, and then $\psi$ is determined modulo scaling. Finally, we rewrote the formula for $\psi$ to make it visibly positive.

The −1 eigenfunction uncertainty principle
This section is devoted to the proof of Theorem 1.4. We deal only with the −1 case, because all the assertions in this theorem were already proved in [12] for the +1 case. First, we reduce determining $A_-(d)$ to solving Problem 1.3.

Lemma 3.1. Let $f \in \mathcal{A}_-(d) \setminus \{0\}$. Then there exists a radial function $g \in \mathcal{A}_-(d) \setminus \{0\}$ such that $\widehat{g} = -g$, $g(0) = 0$, and $r(g) \le \sqrt{r(f)\, r(\widehat{f})}$.

Proof. If $f$ is not radial, then we average its rotations about the origin to obtain a radial function without increasing $r(f)$ or $r(\widehat{f})$. Thus, we can assume that $f$ is radial. Note that this process cannot lead to the zero function: if it did, then $f$ and $\widehat{f}$ would both have compact support and hence vanish identically.

The quantity $r(f)\, r(\widehat{f})$ is unchanged if we replace $f$ with $x \mapsto f(\lambda x)$ for some $\lambda > 0$. Thus, we can assume that $r(f) = r(\widehat{f})$. Letting $g = f - \widehat{f}$ we deduce that $g \in \mathcal{A}_-(d)$, $\widehat{g} = -g$, and $r(g) \le r(f)$. Again, $g$ cannot vanish identically, because $f$ and $-\widehat{f}$ are eventually nonnegative and would thus both have to have compact support.

3.1. Lower and upper bounds.
To obtain a lower bound for A − (d), we follow [2,12]. Let g ∈ A − (d) \ {0} be a radial function satisfying g = −g and g(0) = 0, and assume without loss of generality that g 1 = 1. Let g + = max{g, 0} and g − = max{−g, 0}, so that g + , g − ≥ 0, these functions are never positive at the same point, and g = g + − g − . Since g(0) = 0, is a d-dimensional ball of radius r(g) and centered at the origin, because and we conclude that .

Now the function
is radial, belongs to A − (d), and satisfies g = −g and g(0) = 0. Hence

Estimates (3.2) and (3.4) imply that $A_-(d)/\sqrt{d}$ is bounded above and below by positive constants, as desired. In particular, the lower bound is $1/\sqrt{2\pi e}$, and the upper bound is at most $1$ except for $d = 1$, in which case we can use $A_-(1) = 1$ to obtain an upper bound of $1$.
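To illustrate how an explicit −1 eigenfunction produces an upper bound of this kind, the sketch below works with one hypothetical candidate, which may or may not coincide with the function used in the text: $g(x) = p(2\pi|x|^2)e^{-\pi|x|^2}$ with $p = -L_3^{(a)} + \kappa L_1^{(a)}$, where $a = d/2 - 1$, the $L_n^{(a)}$ are generalized Laguerre polynomials, and $\kappa$ is chosen so that $p(0) = 0$. It relies on the standard fact that $L_n^{(a)}(2\pi|x|^2)e^{-\pi|x|^2}$ is a Fourier eigenfunction with eigenvalue $(-1)^n$. A short calculation gives $p(t) = t\,(t^2 - 3(a+3)t + 2(a+2)(a+3))/6$, so the last sign change occurs at the larger root of the quadratic factor.

```python
import math


def laguerre(n: int, a: float, t: float) -> float:
    """Generalized Laguerre polynomial L_n^{(a)}(t) via the three-term recurrence."""
    prev, cur = 1.0, 1.0 + a - t
    if n == 0:
        return prev
    for k in range(1, n):
        prev, cur = cur, ((2 * k + 1 + a - t) * cur - (k + a) * prev) / (k + 1)
    return cur


def cubic_profile(d: int, t: float) -> float:
    """p(t) = -L_3(t) + kappa * L_1(t), with kappa chosen so that p(0) = 0."""
    a = d / 2 - 1
    kappa = laguerre(3, a, 0.0) / laguerre(1, a, 0.0)
    return -laguerre(3, a, t) + kappa * laguerre(1, a, t)


def last_sign_change(d: int) -> float:
    """Radius of the last sign change of g(x) = p(2 pi |x|^2) e^{-pi |x|^2}."""
    a = d / 2 - 1
    t = (3 * (a + 3) + math.sqrt((a + 3) * (a + 11))) / 2  # largest root of p
    return math.sqrt(t / (2 * math.pi))


if __name__ == "__main__":
    for d in (1, 2, 8, 24, 100):
        print(d, last_sign_change(d), last_sign_change(d) / math.sqrt(d))
```

For this candidate the ratio $r(g)/\sqrt{d}$ is below 1 for every $d \ge 2$ but slightly above 1 when $d = 1$, which is consistent with the need to treat $d = 1$ separately.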
We believe that the upper bound (3.4) cannot be improved if we replace $p$ with any polynomial of bounded degree, in the following sense. For $N \ge 3$ and $s = \pm 1$, let $A_{s,N}(d)$ be the infimum of $r(g)$ over all nonzero $g \colon \mathbb{R}^d \to \mathbb{R}$ such that $\widehat{g} = sg$, $g(0) = 0$, and $g$ is of the form $g(x) = p(2\pi|x|^2)e^{-\pi|x|^2}$, where $p$ is a polynomial of degree at most $N$. (The restriction to $N \ge 3$ ensures that such a function exists.)

However, the upper bound for $A_-(d)$ can be improved using other functions. In particular, we can make use of the auxiliary functions $f$ constructed in [11] for the linear programming bound in high dimensions. If we set $g = \widehat{f} - f$, then one can show that $r(g) \le (0.3194\ldots + o(1))\sqrt{d}$ as $d \to \infty$. The number $0.3194\ldots$ is derived from the Kabatiansky-Levenshtein bound for sphere packing, and the construction in [11] shows how to obtain that bound via the linear programming bound. The precise number is rather complicated, but it can be characterized as follows. Let $\theta = 1.0995\ldots$ be the unique root of
$$\sec\theta + \tan\theta = e^{(\tan\theta + \sin\theta)/2}$$
in the interval $(0, \pi/2)$, and let
$$c = \frac{\sin(\theta/2)\cot(\theta)\, e^{\sec(\theta)/2}}{\sqrt{2\pi}} = 0.3194\ldots.$$
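The two asymptotic constants can be reproduced numerically. The sketch below rests on two assumptions that are not spelled out in the surrounding text: that $\theta$ is the root of $\sec\theta + \tan\theta = e^{(\tan\theta + \sin\theta)/2}$ in $(0, \pi/2)$, and that the constant is read as $c = \sin(\theta/2)\cot(\theta)\,e^{\sec(\theta)/2}/\sqrt{2\pi}$. The function names and the bisection bracket are illustrative choices.

```python
import math


def theta_equation(t: float) -> float:
    """Assumed defining equation: sec t + tan t - exp((tan t + sin t)/2)."""
    return 1 / math.cos(t) + math.tan(t) - math.exp((math.tan(t) + math.sin(t)) / 2)


def solve_theta(lo: float = 0.5, hi: float = 1.5) -> float:
    """Bisection on [lo, hi], where the expression changes sign from + to -."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if theta_equation(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def upper_constant(theta: float) -> float:
    """Assumed reading: sin(theta/2) cot(theta) e^{sec(theta)/2} / sqrt(2 pi)."""
    return (math.sin(theta / 2) / math.tan(theta)
            * math.exp(1 / (2 * math.cos(theta))) / math.sqrt(2 * math.pi))


def lower_constant() -> float:
    """The lower bound 1/sqrt(2 pi e)."""
    return 1 / math.sqrt(2 * math.pi * math.e)


if __name__ == "__main__":
    theta = solve_theta()
    print(theta, upper_constant(theta), lower_constant())
```

Under these assumptions the computed values agree with the quoted digits $1.0995\ldots$, $0.3194\ldots$, and $0.2419\ldots$.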
We do not know how to prove the corresponding bound for A + (d), although we believe it should be true, as it would follow from Conjecture 1.5.

3.2. Existence of extremizers. The existence proof for extremizers with $s = -1$ is almost identical to the proof of the +1 case in [12, Section 6]. We briefly outline the proof here for completeness. Let $f_n \in \mathcal{A}_-(d) \setminus \{0\}$ be an extremizing sequence; that is, $\sqrt{r(f_n)\, r(\widehat{f}_n)} \searrow A_-(d)$ as $n \to \infty$. By Lemma 3.1 we can assume that $\widehat{f}_n = -f_n$ and $f_n(0) = 0$, and hence $r(f_n) \searrow A_-(d)$. We can also assume that $\|f_n\|_1 = 1$ for all $n$. In particular, since $\widehat{f}_n = -f_n$, we have
$$\|f_n\|_2^2 \le \|f_n\|_\infty \|f_n\|_1 \le \|\widehat{f}_n\|_1 \|f_n\|_1 = 1.$$
Because the unit ball in $L^2(\mathbb{R}^d)$ is weakly compact, we can assume that $f_n$ converges weakly to some function $f \in L^2(\mathbb{R}^d)$. Because $\mathcal{A}_-(d)$ is convex, we can apply Mazur's lemma to assume furthermore that $f_n$ converges almost everywhere and in $L^2(\mathbb{R}^d)$ to $f$. Thus, necessarily we have $\widehat{f} = -f$ and $r(f) \le A_-(d)$. Since $\|f_n\|_\infty \le \|\widehat{f}_n\|_1 = \|f_n\|_1 = 1$ and $r(f_n)$ is decreasing, we can apply Fatou's lemma for $g_n = \mathbf{1}_{B^d_{r(f_1)}} + f_n \ge 0$ to deduce that $f \in L^1(\mathbb{R}^d)$ and $\widehat{f}(0) \le 0$. Hence, $f(0) = -\widehat{f}(0) \ge 0$.
We now use Jaming's high-dimensional version [14] of Nazarov's uncertainty principle [15] to deduce, exactly as in [12,Lemma 23], that there exists K < 0 such that for all n, (Alternatively, we can use Proposition 2.6 from [1], which tells us less about the constant K but has a simpler proof.) Fatou's lemma implies that f satisfies the same estimate, and hence is not identically zero. We conclude that f ∈ A − (d), f = −f , and r(f ) ≤ A − (d), and thus r(f ) = A − (d). Finally, we must have f (0) = 0, since otherwise the proof of Lemma 3.1 would produce a better function.
3.3. Infinitely many roots. All that remains to prove is that the extremizers have infinitely many roots. The proof follows the ideas of [12, Section 6.2] for the +1 case. If $f \in \mathcal{A}_-(d)$ satisfies $\widehat{f} = -f$ and $f(0) = 0$ and vanishes at only finitely many radii beyond $r(f)$, then we find a perturbation function $g \in \mathcal{A}_-(d)$ satisfying $\widehat{g} = -g$ and $g(0) = 0$ such that $r(f + \varepsilon g) < r(f)$ for small $\varepsilon > 0$; thus, $f$ cannot be extremal. In [12], the construction of $g$ varies between the cases $d = 1$ (using the Poincaré recurrence theorem) and $d \ge 2$ (using a trick involving Laguerre polynomials). However, thanks to the Poisson summation formula, every extremal function $f \in \mathcal{A}_-(1)$ with $\widehat{f} = -f$ and $f(0) = 0$ must vanish at the integers. Thus, we only need to prove our assertion for $d \ge 2$.

In fact, we will rule out the possibility that an extremizer $f$ is eventually positive. Then applying this proof to the radialization of $f$ will show that $f$ must vanish on spheres of arbitrarily large radius. Thus, let $f \in \mathcal{A}_-(d)$ be such that $\widehat{f} = -f$, $f(0) = 0$, and $f(x) > 0$ for $|x| \ge R$. We must show that $r(f) > A_-(d)$.

Numerical evidence
To explore how A + (d) behaves, we numerically optimized functions g : R d → R satisfying the conditions of Problem 1.1. Readers who wish to examine this data can obtain our numerical results from [6].
In our calculations we always choose $g$ to be of the form $g(x) = p(2\pi|x|^2)e^{-\pi|x|^2}$, where $p$ is a polynomial in one variable of degree at most $4k + 2$, which means $p$ has $4k + 2$ degrees of freedom modulo scaling. The constraint $g(0) = 0$ eliminates one degree of freedom, and one can check using the Laguerre eigenbasis that the constraint $\widehat{g} = g$ eliminates $2k + 1$ degrees of freedom. To control the remaining $2k$ degrees of freedom, we specify $k$ double roots at radii $\rho_1 < \cdots < \rho_k$. We then attempt to choose the radii $\rho_1, \ldots, \rho_k$ so as to minimize $r(g)$. To do so, we iteratively optimize the choice of radii for successive values of $k$, by making an initial guess based on the previous value of $k$ and then improving the guess using multivariate Newton's method. Each choice of $\rho_1, \ldots, \rho_k$ proves an upper bound for $A_+(d)$, and we hope to approximate $A_+(d)$ closely as $k$ grows. (Note that if Conjecture 3.2 holds, then we cannot obtain improved bounds if $k$ remains bounded for large $d$.) This method was first applied by Cohn and Elkies [5, Section 7] to $A_-(d)$, with a simpler optimization algorithm. Cohn and Kumar [7] replaced that algorithm with Newton's method, and we made use of their implementation.
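The degree-of-freedom count above, and the simplest instance of the method, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: it uses the standard fact that $L_n^{(a)}(2\pi|x|^2)e^{-\pi|x|^2}$ with $a = d/2 - 1$ is a Fourier eigenfunction with eigenvalue $(-1)^n$, and it treats only $k = 0$, where $p(t) = L_2^{(a)}(t) - L_2^{(a)}(0) = t^2/2 - (a+2)t$ has its last root at $t = d + 2$. All function names are hypothetical.

```python
import math


def remaining_degrees_of_freedom(k: int) -> int:
    """Count from the text: 4k + 2 coefficients modulo scaling, minus constraints."""
    total = 4 * k + 2
    total -= 1            # g(0) = 0
    total -= 2 * k + 1    # odd-index Laguerre components must vanish for ghat = g
    return total          # equals 2k, matched by k double roots


def laguerre2(a: float, t: float) -> float:
    """Generalized Laguerre polynomial L_2^{(a)}(t) in closed form."""
    return t * t / 2 - (a + 2) * t + (a + 1) * (a + 2) / 2


def k0_profile(d: int, t: float) -> float:
    """p(t) = L_2(t) - L_2(0): an even-index combination with p(0) = 0."""
    a = d / 2 - 1
    return laguerre2(a, t) - laguerre2(a, 0.0)


def k0_radius(d: int) -> float:
    """Last sign change of g(x) = p(2 pi |x|^2) e^{-pi |x|^2}: root t = d + 2."""
    return math.sqrt((d + 2) / (2 * math.pi))


if __name__ == "__main__":
    print([remaining_degrees_of_freedom(k) for k in range(5)])
    print(k0_radius(12), math.sqrt(2))
```

For $d = 12$ this crude $k = 0$ choice gives a radius larger than $\sqrt{2}$, which shows why higher-degree polynomials with optimized double roots are needed to approach the sharp value.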
We have no guarantee that the numerical optimization will converge to even a local optimum for any given $d$ and $k$, or that the resulting bounds will converge to $A_+(d)$ as $k \to \infty$. Indeed, we quickly ran into problems when $d \le 2$, and eventually for $d = 3$ and $4$ as well, but for $5 \le d \le 128$ we arrived at the global optimum for each $k \le 64$. These calculations are what initially led us to believe that $A_+(12) = \sqrt{2}$.

Our numerical calculations are generally not rigorous: although we believe we have used more than sufficient precision, we cannot bound the error from the use of floating-point arithmetic. However, we have used exact rational arithmetic to prove all the numerical upper bounds for $A_s(d)$ we report in this paper.¹ Thus, they are genuine theorems, while our numerical assertions about summation formulas have not been rigorously proved.

Table 4.1 shows our upper bounds for $A_+(d)$ for $1 \le d \le 32$, together with $A_-(d - 4)$ for comparison (taken from [4]). The shift by 4 approximately aligns the columns, with the best case being $A_+(12) = A_-(8) = \sqrt{2}$. We have no conceptual explanation for this alignment, but it fits conveniently with the sign in Proposition 2.6, and it supports our conjecture that
$$\lim_{d \to \infty} \frac{A_+(d)}{\sqrt{d}} = \lim_{d \to \infty} \frac{A_-(d)}{\sqrt{d}}.$$
The convergence to this limit is slow enough that it is difficult to estimate the limit accurately from numerical data. For $d \le 2$ our numerical methods perform poorly, for the reasons described below. For $d = 3$ the bound for $A_+(d)$ in Table 4.1 is obtained using $k = 27$, and for $d \ge 4$ we use $k = 32$. In particular, we deliberately use a smaller value of $k$ than the limits of our computations for $d \ge 4$, so that we can use data from larger $k$ to estimate the rate of convergence. These computations suggest the following conjecture.

Conjecture 4.1. The upper bounds in Table 4.1 are sharp, except for an error of at most 1 in the last decimal digit shown.

¹ The non-sharp cases from Table 4.1 are straightforward to check rigorously, while the inequality $A_+(28) < 1.98540693489105$ requires more work because it uses a higher-degree polynomial with more complicated coefficients. We have proved it using the techniques and code from Appendix A of [7].
In each case with $d \ge 3$, we can use a summation formula to check that we have found the optimal bound for the given values of $d$ and $k$; we explain how this is done in Section 5. However, we do not know how quickly the bounds converge as $k \to \infty$, or whether they indeed converge to $A_s(d)$ at all. Our confidence in Conjecture 4.1 comes from comparing the bounds for $32 \le k \le 64$ when $d \ge 5$. They seem to have converged to this number of digits, but of course we cannot rule out convergence to the wrong limit.
The approximation $A_+(d) \approx A_-(d - 4)$ and equality $A_+(12) = A_-(8) = \sqrt{2}$ raise the question of whether the other exact values $A_-(1) = 1$, $A_-(2) = (4/3)^{1/4}$ (conjecturally), and $A_-(24) = 2$ are also mirrored by $A_+$. That turns out not to be the case: Table 4.1 strongly suggests that $A_+(5) > 1$ and $A_+(6) > (4/3)^{1/4}$, and it proves that $A_+(28) < 2$. The case of $A_+(28)$ is particularly disappointing, because it might have stood in the same relationship to $A_+(12)$ as the Leech lattice does to the $E_8$ root lattice. We have found no case other than $d = 12$ for which we can guess the exact value of $A_+(d)$.
Taking $k = 128$ shows that $A_+(28) < 1.98540693489105$, and again we believe that all these digits agree with $A_+(28)$ except the last. This upper bound for $A_+(28)$ seems discouragingly complicated, but the underlying root locations display remarkable behavior, shown in Table 4.2. The table leads us to the following conjecture.

Conjecture 4.2. There exists a radial Schwartz function $g \in \mathcal{A}_+(28) \setminus \{0\}$ with $\widehat{g} = g$, $g(0) = 0$, and $r(g) = A_+(28)$, and whose nonzero roots are at radii $\sqrt{2j + o(1)}$ as $j \to \infty$, starting with $j = 2$.
This pattern is reminiscent of [10, Section 7], as well as the behavior of $A_\pm(d)$ in other cases, but it is a particularly striking example. We expect that Conjecture 4.2 is true, but a weaker conjecture consistent with the data is that there exists some $\varepsilon < 1$ such that the squared radii are within $\varepsilon$ of successive even integers.
For comparison, [8] constructs a function achieving $A_-(24)$ whose nonzero roots are exactly at $\sqrt{2j}$ with $j \ge 2$. Our best guess is that the function achieving $A_+(28)$ is given by a primary term that has these exact roots, plus one or more secondary terms that perturb the roots but do not substantially change them. If that is the case, then perhaps one can describe this function explicitly and thereby characterize $A_+(28)$ exactly. However, we have not been able to guess or derive such a formula.
Another mystery is the behavior of $A_+(d)$ for $d \le 2$. In these dimensions we quickly run into cases in which the last sign change $r(g)$ is not a continuous function of $\rho_1, \dots, \rho_k$ at the optimum, and this lack of continuity ruins our numerical algorithms. (Instead, we resort to linear programming, which is much slower.) Of course it is no surprise that the last sign change is discontinuous at some points, because a small perturbation of a polynomial can convert a double root to two single roots, or even create a new root if the degree increases. However, we do not expect this behavior to occur generically. In particular, it cannot occur if $\deg(p) = 4k + 2$ and $g$ has no double roots beyond the $k$ double roots we have forced to occur.
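The double-root mechanism is easy to see on a toy example. The sketch below is purely illustrative and is not one of the polynomials from our computations: it takes the hypothetical family $f_\varepsilon(t) = (t - 2)\bigl((t - 5)^2 - \varepsilon\bigr)$, which has a double root at $t = 5$ when $\varepsilon = 0$, and locates the last sign change on a grid. The helper names `last_sign_change` and `toy` are assumptions made for this illustration.

```python
def last_sign_change(f, lo, hi, steps):
    """Approximate the largest t in [lo, hi] at which f changes sign,
    by scanning a uniform grid and ignoring exact zeros."""
    last = None
    prev_sign = 0
    for i in range(steps + 1):
        t = lo + (hi - lo) * i / steps
        v = f(t)
        s = (v > 0) - (v < 0)
        if s != 0:
            if prev_sign != 0 and s != prev_sign:
                last = t
            prev_sign = s
    return last


def toy(eps):
    """Toy family with a double root at t = 5 when eps = 0."""
    return lambda t: (t - 2) * ((t - 5) ** 2 - eps)


for eps in (0.01, 0.0, -0.01):
    print(eps, last_sign_change(toy(eps), 0.0, 10.0, 10000))
```

For $\varepsilon \le 0$ the last sign change stays near $t = 2$, while for any $\varepsilon > 0$ it jumps to $5 + \sqrt{\varepsilon}$, so the last sign change is discontinuous at $\varepsilon = 0$ even though the coefficients vary continuously.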
When $d = 2$, even the case $k = 1$ is problematic. Specifically, one can check that the optimal value $r(g) = \sqrt{2/\pi}$ is achieved by setting $\rho_1 = \sqrt{3/\pi}$. As $\rho_1$ approaches $\sqrt{3/\pi}$ from the left, $r(g)$ decreases towards $\sqrt{2/\pi}$, but it increases towards infinity as $\rho_1$ approaches $\sqrt{3/\pi}$ from the right. This discontinuity occurs because the leading coefficient of the polynomial $p$ vanishes when $\rho_1 = \sqrt{3/\pi}$. The leading coefficient also vanishes at the best choices of $\rho_1, \dots, \rho_k$ we have found for $2 \le k \le 4$, while the case $k = 5$ suffers from a different problem: the resulting polynomial has six double roots, rather than just five, and again the location of the last sign change is discontinuous.
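The case $d = 2$, $k = 1$ can be checked by hand or by machine. In the variable $t = 2\pi|x|^2$, the relevant eigenbasis for $d = 2$ consists of the ordinary Laguerre polynomials $L_0, L_2, L_4, L_6$, and the polynomial $p(t) = t(t - 4)(t - 6)^2$ equals $-48L_0 + 24L_2 + 24L_4$, so its $L_6$ coefficient (and hence its degree-6 coefficient) vanishes. The following sketch verifies this identity exactly; the helper names are hypothetical, and the conversion from $t$ to a radius assumes $t = 2\pi\rho^2$.

```python
from fractions import Fraction
from math import factorial, pi, sqrt


def laguerre(n):
    """Coefficients (constant term first) of the ordinary Laguerre polynomial L_n."""
    return [Fraction((-1) ** j * factorial(n), factorial(j) ** 2 * factorial(n - j))
            for j in range(n + 1)]


def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def value(coeffs, t):
    """Evaluate a polynomial exactly by Horner's rule."""
    result = Fraction(0)
    for c in reversed(coeffs):
        result = result * t + c
    return result


# p(t) = t (t - 4) (t - 6)^2, expanded in the power basis.
p = [Fraction(0), Fraction(1)]
for factor in ([-4, 1], [-6, 1], [-6, 1]):
    p = poly_mul(p, factor)

# The same polynomial written as -48 L_0 + 24 L_2 + 24 L_4.
combo = [Fraction(0)] * 5
for weight, n in ((-48, 0), (24, 2), (24, 4)):
    for j, c in enumerate(laguerre(n)):
        combo[j] += weight * c

print(p == combo)  # True

# Roots at t = 4 (simple) and t = 6 (double) correspond to these radii.
r_g = sqrt(4 / (2 * pi))    # last sign change, equal to sqrt(2/pi)
rho_1 = sqrt(6 / (2 * pi))  # double root, equal to sqrt(3/pi)
print(r_g, rho_1)
```

Since $p$ is negative on $(0, 4)$ and nonnegative for $t \ge 4$, the function $g(x) = p(2\pi|x|^2)e^{-\pi|x|^2}$ satisfies $g(0) = 0$, $\widehat{g} = g$, and $r(g) = \sqrt{2/\pi}$.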
When d = 1, there are no problems for k ≤ 2, and the leading coefficient vanishes for k = 3. For k = 4, we find an extra double root, but there is no discontinuity when k = 5.
In Table 4.1 we have reported the bound using $k = 5$ for $d \le 2$. We believe that we have approximated the true optima for $k = 5$, but the bounds almost certainly do not agree with $A_+(d)$ to the full six digits shown, unlike Conjecture 4.1.
We have not observed a discontinuity near the optimum in any other dimension. However, when d = 3 we cannot find a local optimum with k = 28, because the largest root tends to infinity in our calculations. Computations carried out by David de Laat indicate that the optimum occurs at a singularity and the resulting discontinuity is interfering with our algorithms. When d = 4 we run into a similar problem at k = 36. We do not know whether this phenomenon is limited to d ≤ 4.

Summation formulas
We do not know how to obtain the hypothetical summation formulas described in Conjecture 2.5. Aside from $A_-(2)$ and the four cases that have been solved exactly (namely $A_-(1)$, $A_-(8)$, $A_+(12)$, and $A_-(24)$), we have not found any summation formulas that come close to matching our upper bounds. However, in many cases we can compute optimal summation formulas for polynomials of a fixed degree. For $d \ge 3$, these formulas show that we have found the optimal polynomials for each fixed $k$ in our computations in Section 4, and we believe that when $k$ is large they should approximate the ultimate summation formulas. For example, Table 2.1 is based on calculations with $k = 128$.
(Unlike earlier, we require only $x \ge R$ in the definition of $r(p)$, rather than $|x| \ge R$, because we care only about the right half-line.) To construct $p$, we impose double roots at locations $\rho_1, \dots, \rho_k$, and then choose these locations so as to minimize $\rho_0 := r(p)$. Note that in our notation here, $\rho_i$ denotes what would have been called $2\pi\rho_i^2$ in Section 4.
We assume furthermore that $p$ has roots of order exactly 1 at $\rho_0$ and exactly 2 at $\rho_1, \dots, \rho_k$, and no other real roots greater than $\rho_0$. Finally, we assume that we have found a strict local minimum for $r(p)$; in other words, $r(p)$ increases if we perturb $\rho_1, \dots, \rho_k$.
These assumptions cannot always be satisfied. For example, when $(s, d, k) = (1, 2, 1)$ the coefficient of $q_{2k+1}$ vanishes. However, for $d > 2$ they are satisfied in every case in which we have found a local minimum. See Table 5.1 for a list.

We prove this proposition below. It is a polynomial analogue of the summation formula (2.4) (with the Gaussian factors from the Laguerre eigenbasis implicitly incorporated into the coefficients $c_i$), and it is reminiscent of Gauss-Jacobi quadrature in that it holds on a $(2k + 2)$-dimensional space despite using only $k + 2$ coefficients.
In other words, although we have assumed only a strict local minimum for the last sign change among polynomials with k double roots, we have found the global minimum among polynomials with no such restriction. For example, when s = 1 and k = 64, we find that p is the best possible polynomial of degree at most 4k + 2 = 258. This phenomenon not only certifies our numerics by establishing matching lower bounds, but also helps explain why our algorithms perform well: degeneracy is the only way to get stuck in a local optimum.
The proof of Proposition 5.1 involves carefully studying how different quantities behave as functions of $\rho_1, \dots, \rho_k$. We can set up simultaneous linear equations to determine the coefficients of $q_0, \dots, q_{2k+1}$ as follows. Write $\alpha = (\alpha_j)_{0 \le j \le 2k+1}$ for the column vector of coefficients (all vectors will be column vectors unless otherwise specified, sometimes indexed starting with 0 and sometimes with 1), and define the entries of the matrix $M = (M_{i,j})_{0 \le i,j \le 2k+1}$ as follows: for $i = 2k + 1$.
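We have assumed that $M(\rho)$ is invertible, which means that $\alpha(\widetilde{\rho})$ and $p_{\widetilde{\rho}}$ are smooth functions of $\widetilde{\rho} = (\widetilde{\rho}_1, \dots, \widetilde{\rho}_k)$ defined on some neighborhood of $\rho$. Because $p_\rho$ has a single root at $\rho_0$, $p_{\widetilde{\rho}}$ has a single root at some smooth function $\widetilde{\rho}_0$ of $\widetilde{\rho}_1, \dots, \widetilde{\rho}_k$ with $\widetilde{\rho}_0(\rho) = \rho_0$, by the implicit function theorem. We will always assume that $\widetilde{\rho}$ is in a small enough neighborhood of $\rho$ for this to be true. Furthermore, our assumptions so far imply that $r(p_{\widetilde{\rho}}) = \widetilde{\rho}_0$ for $\widetilde{\rho}$ in some neighborhood of $\rho$, and again we restrict our attention to such a neighborhood.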
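Because of our assumption of local minimality, the function $\widetilde{\rho}_0$ must have a stationary point at $\rho$. In other words, $\frac{\partial \widetilde{\rho}_0}{\partial \widetilde{\rho}_i}(\rho) = 0$ for $1 \le i \le k$. In addition, $\widetilde{\rho}_0 > \rho_0$ for $\widetilde{\rho} \ne \rho$ in some small neighborhood of $\rho$ by strict local minimality. Once again we confine $\widetilde{\rho}$ to such a neighborhood.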
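The stationarity condition can be unpacked by implicit differentiation. The following is a sketch under the assumption, suggested by the setup above, that $p_{\widetilde{\rho}} = \sum_{j=0}^{2k+1} \alpha_j(\widetilde{\rho})\, q_j$, where $\widetilde{\rho}$ denotes the perturbed parameters and $\rho$ the optimum. Since the root at $\rho_0$ has order exactly 1, the derivative $p_\rho'(\rho_0)$ is nonzero, so the first identity determines $\partial\widetilde{\rho}_0/\partial\widetilde{\rho}_i$.

```latex
% Differentiate p_{\tilde\rho}(\tilde\rho_0(\tilde\rho)) = 0 with respect to \tilde\rho_i:
\[
\frac{\partial p_{\widetilde{\rho}}}{\partial \widetilde{\rho}_i}\bigl(\widetilde{\rho}_0\bigr)
+ p_{\widetilde{\rho}}'\bigl(\widetilde{\rho}_0\bigr)\,
\frac{\partial \widetilde{\rho}_0}{\partial \widetilde{\rho}_i} = 0 .
\]
% At \tilde\rho = \rho the second term vanishes by stationarity, leaving
\[
\sum_{j=0}^{2k+1} \frac{\partial \alpha_j}{\partial \widetilde{\rho}_i}(\rho)\, q_j(\rho_0) = 0
\qquad (1 \le i \le k).
\]
```

In words, each derivative vector $(\partial\alpha/\partial\widetilde{\rho}_i)(\rho)$ is annihilated by evaluation at $\rho_0$.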
Proof. The vector $\alpha$ has $\alpha_{2k+1} = 1$, while all the partial derivatives $\partial\alpha/\partial\widetilde{\rho}_i$ vanish in that coordinate. Thus, it will suffice to show that the partial derivatives are linearly independent at $\rho$, and because $M$ is invertible, we can examine $M(\partial\alpha/\partial\widetilde{\rho}_i)$ instead of $\partial\alpha/\partial\widetilde{\rho}_i$.
This lemma differs from Proposition 5.1 in not asserting uniqueness or sign conditions for $c_0, \dots, c_{k+1}$.
It will suffice to find $k + 1$ linearly independent vectors in the kernel of left multiplication by $T$, because $(2k + 2) - (k + 1) < k + 2$. Those vectors will be $\alpha(\rho)$ and $(\partial\alpha/\partial\widetilde{\rho}_i)(\rho)$ for $1 \le i \le k$, which are linearly independent by Lemma 5.3. All that remains is to prove that they are in the kernel of $T$.
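The dimension count can be spelled out with the rank-nullity theorem. This sketch assumes that $T$ has $k + 2$ rows (one for each evaluation point in the summation formula) and $2k + 2$ columns (one for each of $q_0, \dots, q_{2k+1}$); that shape is inferred from the inequality above rather than restated from the definition of $T$.

```latex
\[
\operatorname{rank} T = (2k+2) - \dim\ker T \le (2k+2) - (k+1) = k+1 < k+2 ,
\]
% so the k+2 rows of T are linearly dependent: there is a nonzero row vector
% c = (c_0, \dots, c_{k+1}) with
\[
c\,T = 0 .
\]
```

Under this assumption, the entries of $c$ are the coefficients of a summation formula valid on the span of $q_0, \dots, q_{2k+1}$.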
We have therefore found k + 1 linearly independent vectors in the kernel of left multiplication by T , as desired.
Proof of Proposition 5.1. By Lemma 5.4, a summation formula exists, and all that remains is to prove uniqueness and the sign conditions. Because $M(\rho)$ is nonsingular, the values $g(0)$ and $g(\rho_i)$ with $1 \le i \le k$ can be chosen arbitrarily. Thus, the summation formula must be unique up to scaling, and the coefficient $c_0$ of $\rho_0$ cannot vanish. Now let $1 \le i \le k$, and let $\widetilde{\rho}$ equal $\rho$ except in the $i$-th coordinate, where $\widetilde{\rho}_i = \rho_i + \varepsilon$ with $\varepsilon > 0$ small. Then $p_{\widetilde{\rho}}(\rho_i)$ and $p_{\widetilde{\rho}}(\rho_0)$ have opposite signs because $r(p_{\widetilde{\rho}}) > r(p_\rho)$, while $p_{\widetilde{\rho}}$ vanishes at the rest of $\rho_1, \dots, \rho_k$. It follows from taking $g = p_{\widetilde{\rho}}$ that $c_i$ must be nonzero, with the same sign as $c_0$.
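Finally, when $s = 1$ we can compute the sign of $c_{k+1}$ by taking $g = q_0 = 1$ to obtain $\sum_{i=0}^{k+1} c_i = 0$. Because $c_0, \dots, c_k$ are nonzero and share the same sign, $c_{k+1}$ must have the opposite sign.
When $s = -1$, we conjecture that $c_{k+1}$ always has the same sign as $c_0, \dots, c_k$. This conjecture holds for every case listed in Table 5.1.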