Extremal Effective Field Theories

Effective field theories (EFTs) parameterize the long-distance effects of short-distance dynamics whose details may or may not be known. It is known that EFT coefficients must obey certain positivity constraints if causality and unitarity are satisfied at all scales. We explore those constraints from the perspective of 2 to 2 scattering amplitudes of a light real scalar field, using semi-definite programming to carve out the space of allowed EFT coefficients for a given mass threshold M. We point out that all EFT parameters are bounded both below and above, effectively showing that dimensional analysis scaling is a consequence of causality. This includes the coefficients of four- and six-derivative interactions. We present simple extremal amplitudes which realize, or "rule in", kinks in coefficient space and whose convex hull spans a large fraction of the allowed space.


Introduction
The notion that degrees of freedom at different length scales decouple from each other is a cornerstone of modern physics. In this note, we consider situations where details of the short-distance physics are unknown, but one is interested in its long-distance effects as parameterized by effective field theory (EFT) coefficients. In relativistic quantum theories, it is known that "not anything goes": if the short-distance physics is compatible with causality and unitarity, the low-energy parameters will obey certain inequalities, discussed notably in [1]. In this paper we explore such inequalities in an effort to carve out the allowed space of local and unitary EFTs. We will consider asymptotically flat space-time, where the S-matrix encodes long-distance or low-energy observables. We will specifically study a subset of EFT parameters, denoted $g_k$, captured by 2 → 2 scattering. As will be reviewed below, causality and unitarity imply dispersive sum rules:
$$g_k = \sum_{J} \int_{M^2}^{\infty} ds\, (\cdots)_k\, \rho_J(s) \qquad (1.1)$$
where the spectral density $\rho_J(s)$ (proportional to the imaginary part of the amplitude) is related to the probability of a high-energy state with angular momentum J to scatter at energy $\sqrt{s} > M$, and the kernels $(\cdots)_k$ are given explicitly below and depend on the particular EFT coefficient $g_k$ of interest. The mass M separates "light" and "heavy" states and can be interpreted as the EFT cutoff in an appropriate scheme. We will be agnostic about the high-energy sector: our only input will be its compatibility with unitarity and crossing symmetry. Unitarity, the statement that probabilities should lie between 0 and 1, will simply mean:
$$0 \le \rho_J(s) \le 2 \qquad (1.2)$$
Not all spectral densities that satisfy this inequality are reasonable candidates for the imaginary part of a scattering amplitude, however. This is because Kramers-Kronig type dispersion relations can reconstruct amplitudes from $\rho_J$ alone, but there is no guarantee that the outcome satisfies the full crossing symmetry.
Crossing-symmetric ρ J (s)'s are orthogonal to an infinite set of "null constraints", which will be a key ingredient of this paper. For our purposes, classifying causal and unitary EFTs amounts to finding the image, under the map (1.1), of the set of unitary and crossing-symmetric ρ J (s).
Causality constraints in quantum field theory have been discussed since the inception of the subject. Many studies were motivated by the phenomenology of the strong force [2]. To give just a few examples, dispersion relations and sum rules were used in the analysis of low-energy pion scattering [3][4][5], and inequalities satisfied by EFT parameters were obtained using properties of forward amplitudes in [6,7]. This work aims to explore inequalities on EFT parameters systematically.
We focus on the simplest example: a single (non-gravitating) real scalar field. Since we view the EFT cutoff M as much larger than the mass of the light scattered particles, we take the latter to be massless. On grounds of dimensional analysis, one expects the coefficient of a (k + d)-dimensional operator in the low-energy effective Lagrangian to scale like $\sim 1/M^k$, possibly further suppressed by a small coupling, but never larger. This scaling is clearly realized when one integrates out a massive field. The main question to be addressed is: Can dimensional analysis scaling be justified by rigorous numerical bounds? Can "accidentally large" EFT coefficients be ruled out? We will find that the answer is positive, and we present a general framework to numerically obtain the optimal bounds. Furthermore, we will show that much of the shape of the allowed space, including two kinks, can be understood from simple analytic scattering amplitudes.
The relation between dimensional analysis scaling and causality resonates with many previous studies, for example [8][9][10][11][12]. Our new observation will be the seemingly universal existence of two-sided bounds.
This paper is organized as follows. In section 2 we review the general principles satisfied by scattering amplitudes, introducing a family of "B k " sum rules expressing EFT coefficients as averages over high-energy probabilities. In section 3, we provide a general numerical optimization strategy to rule-out candidate EFTs by making use of the averaging technology. In section 4, numerical results are presented along with remarks. Section 5 bridges the numerics with the analytic results. We conclude in section 6 with a discussion about the potential use cases of the numerical framework presented and the further implications of the numerical results.
Note added: When this manuscript was being completed, the works [13] and then [14] appeared with partial overlap in the results. The second paper in particular gave a two-sided bound on the stu interaction which agrees with our eq. (3.6). Further comparisons will be interesting.
2 Preliminaries: Scattering amplitudes and dispersion relations

2.1 Low energy: effective field theory

We consider 2 → 2 scattering of massless identical real scalars in a Poincaré invariant theory (fig. 1). Treating all momenta as incoming, the amplitude is a function of Mandelstam invariants:
$$s = -(p_1+p_2)^2, \qquad t = -(p_2+p_3)^2, \qquad u = -(p_1+p_3)^2 \qquad (2.1)$$
which satisfy s + t + u = 0. By crossing symmetry, it is invariant under all permutations (this holds with appropriate i0's in the discontinuity, as further discussed below):
$$\mathcal{M}(s,t) = \mathcal{M}(t,s) = \mathcal{M}(s,u) \qquad (2.2)$$
Our first step is to parameterize the amplitude at low energies in terms of a specific effective field theory. Generally, the form of the amplitude depends on the couplings of the theory. It becomes particularly simple if the theory is weakly coupled and we restrict ourselves to the tree approximation. We thus use the tree approximation here and until subsection 2.4. In this case, the amplitude has no low-energy branch-cuts, so the EFT expansion is simply a series in small s, t, u:
$$\begin{aligned}
\mathcal{M}_{\rm low}(s,t) ={}& -g^2\left[\frac{1}{s}+\frac{1}{t}+\frac{1}{u}\right] - \lambda \\
& + g_2 (s^2+t^2+u^2) + g_3 (stu) + g_4 (s^2+t^2+u^2)^2 + g_5 (s^2+t^2+u^2)(stu) \\
& + g_6 (s^2+t^2+u^2)^3 + g_6' (stu)^2 + g_7 (s^2+t^2+u^2)^2 (stu) + \cdots
\end{aligned} \qquad (2.3)$$
The first line accounts for $\phi^3$ and $\phi^4$ relevant interactions, while the remaining terms simply list the most general symmetric polynomials in s, t, u, to account for higher-dimension operators in the EFT. The subscript denotes the degree in Mandelstam invariants. Symmetric polynomials are easy to enumerate since their ring is freely generated by two elements: $s^2 + t^2 + u^2$ and $stu$ (given that s + t + u = 0).
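The enumeration of symmetric polynomials can be made concrete with a few lines of code. The sketch below lists, for each degree in Mandelstam invariants, the exponent pairs (a, b) of the monomials $(s^2+t^2+u^2)^a (stu)^b$; the function name `symmetric_basis` is purely illustrative.

```python
def symmetric_basis(degree):
    """Exponent pairs (a, b) with 2a + 3b = degree.

    Each pair labels the monomial (s^2 + t^2 + u^2)^a (stu)^b, i.e. one
    independent EFT coefficient at that degree in Mandelstam invariants.
    """
    return [(a, (degree - 2 * a) // 3)
            for a in range(degree // 2 + 1)
            if (degree - 2 * a) % 3 == 0]


if __name__ == "__main__":
    for n in range(2, 8):
        print(n, symmetric_basis(n))
```

Degree 6 is the first degree with two independent monomials, which is why two distinct degree-6 coefficients appear in the expansion, while degree 7 again has a single one.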
A short exercise shows that the preceding amplitude is obtained from the following effective Lagrangian in the tree approximation: As is well known, Lagrangian densities are not unique: they are defined modulo integration-by-parts and field redefinitions. One can cast any effective Lagrangian for a real scalar field into the form (2.4) by using field redefinitions to eliminate, order by order in the derivative expansion, corrections to the kinetic and cubic terms as well as appearances of $\partial^2\phi$. See for example [15] for a discussion in the Standard Model context. The amplitude (2.3) is a physical observable unaffected by such ambiguities, which is why we choose to parameterize the coefficients in terms of it. Our goal is to constrain the EFT parameters $g_k$ assuming the existence of a high-energy completion which is causal and unitary, but not necessarily weakly coupled. Low-energy interactions involving five or more powers of $\phi$ will not be constrained by our methods, since they are not detected by (tree-level) 2 → 2 scattering. When low-energy loop corrections are included, the detailed form of eq. (2.3) will be modified, but we do not expect the number of independent EFT parameters that we can constrain to increase. A precise definition of the $g_k$'s that remains valid in the presence of low-energy loop corrections is given in eq. (2.18) below.

High energy: partial wave decomposition
At high energies, we will be agnostic about the form of the amplitude except for the assumption that it is causal and unitary. We follow the general framework of S-matrix theory, as reviewed for example in [2]. Let us begin with unitarity of the S-matrix, which is formally that: S † S = 1, where S = 1 + iM. More precisely, one picks a physical region, say where s > 0 is interpreted as center-of-mass energy squared, and −s < t < 0 gives the momentum transfer (squared); the scattering angle is The scattering operator is a convolution with respect to angles, which is diagonalized by going to a basis of angular momentum partial waves. The unitarity condition is thus simplest to state in this basis (our conventions follow [16]): where d is the space-time dimensions and P J (x) are the d-dimensional version of Legendre polynomials (which appear in the d = 4 case): Details of the specific scattering process are encoded in the coefficients f J (s). In this normalization convention, unitarity of the elastic amplitude of identical real particles is [16]: The elastic amplitude S J can have absolute value less than unity due to inelastic processes. We will only need the imaginary part of the high-energy amplitude. Defining the spectral density ρ J (s) = s d−4 2 Im f J (s), it can thus be written as where the unitarity constraint is The normalization is such that ρ J = 1 for complete absorption (S J = 0), and ρ J = 2 for an elastic phase shift π. For the most part (except for subsection 3.5) we will only use the first inequality: 0 ≤ ρ J (s).

Dispersion relations
The other key ingredient from S-matrix theory is the connection between low and high energies, which stems from analyticity. More precisely, we will use the following two properties of the amplitude:
1. For fixed t < 0 and |s| sufficiently large, M(s, t) is analytic in s away from the real axis.
2. For fixed t < 0, $\mathcal{M}(s,t)/s^2 \to 0$ as $|s| \to \infty$.
Physically, these conditions combine causality and unitarity. For an elementary explanation of their respective significance, we refer to the signal propagation model in appendix D of [8], where it is explained that propagation of a signal through a black box is causal if and only if the corresponding transfer function S(ω) is analytic in the upper-half frequency plane (with sub-exponential growth), and that |S(ω)| 2 ≤ 1 throughout the upper-half-plane if the box furthermore preserves the squared-norm of signals. These are general facts about Fourier transforms. The original physical derivation of crossing symmetry [17] applies these facts to the expectation value of a retarded commutator in one-particle state. Schematically, one considers which vanishes outside the forward light-cone rendering its Fourier transform analytic in the upper-half s-plane (intuitively, one uses that s is linear in right-moving light-cone momentum), at least for large enough |s|. Its boundary values on the real axis unite the s-channel amplitude and the complex conjugate of the u-channel amplitude, see fig. 2. This is the traditional understanding of crossing symmetry within the axiomatic theory [18]. The boundedness property, in particular including the extra factor of 1/s 2 compared with the signal model, will be critical for us. We believe it can be justified physically by directly analyzing the transverse Fourier transform [19]. As far as we understand, properties 1-2 are theorems in axiomatic quantum field theory, for example in the context of pion scattering [20,21]; we take them as axioms embodying causality and unitarity.
The two properties assumed above amount to the existence of twice-subtracted dispersion relations. Let us derive such dispersion relations explicitly. The starting point is that an integral over a large circle vanishes:
$$\oint_{\infty} \frac{ds'}{2\pi i}\, \frac{1}{s'-s}\, \frac{\mathcal{M}(s',t)}{(s'-s_1)(s'-s_2)} = 0 \qquad (2.12)$$
where $s_1$ and $s_2$ are arbitrary subtraction points. For large enough $s'$, the integrand behaves like $\sim \mathcal{M}(s',t)/s'^3$, and so the integral vanishes thanks to property 2. Typically, one would formally treat all of $s$, $s_1$ and $s_2$ as non-real and deform the contour toward the real axis. Summing the three explicit poles and cuts then relates M(s, t) to its value at two subtraction points plus an integral over the discontinuity of M across the real axis [22]. For our purposes it is convenient to instead treat the subtraction poles as part of the real-axis cuts. We choose $s_1 = 0$ and $s_2 = -t$ to maintain the symmetry between the s and u channels without introducing any new energy scale into the problem. The identity (2.12) then relates the residue at $s' = s$ with a discontinuity:
$$\frac{\mathcal{M}(s,t)}{s(s+t)} = \int_{-\infty}^{\infty} \frac{ds'}{\pi}\, \frac{1}{s'-s}\, {\rm Im}\left[\frac{\mathcal{M}(s',t)}{s'(s'+t)}\right] \qquad (2.13)$$
where we have written the discontinuity as an imaginary part since the amplitude on the "wrong side of the cut" is its complex conjugate; technically ${\rm Im} f(s) \equiv \frac{1}{2i}\left[f(s+i0) - f(s-i0)\right]$. We call eq. (2.13) a twice-subtracted dispersion relation because of the two powers of $s'$ added to the denominator.

Figure 2: Analyticity in the upper-half-plane relates the s-channel amplitude and the complex conjugate (anti-time-ordered) u-channel amplitude. Note that the s- and u-channel cuts overlap in the physical region where t < 0, which is not a problem since the crossing path avoids small |s|.

Figure 3: Contour deformation which gives the sum rule (2.14) when low-energy loops are neglected: the integral over arcs at infinity vanishes, thus relating low-energy data and heavy cuts.
Let us see how this works in the simplest situation considered in eq. (2.3), where lowenergy loops are neglected. Then branch cuts can only start at the UV cutoff M 2 . The right-hand-side of eq. (2.13) then contains low-energy poles at s = 0 and s = −t (due both to the denominator in eq. (2.13) and poles in the amplitude), and high-energy cuts at s > M 2 and s < −M 2 − t. Separating low and high energies gives a relation: (2.14) We used s ↔ u symmetry to combine the left and right cuts. This relation is supposed to converge for any s, t with u < 0. Interestingly, plugging in the EFT expansion eq. (2.3) for M low , one finds that both the spin-0 exchange diagram g 2 and spin-0 contact interaction λ cancel out, and what remains is pole-free (this could have been anticipated from the fact that the three residues on the left combine into a single contour over a large circle). On the right-hand-side we insert the partial wave expansion (2.9). It is useful to define heavy averages: Eq. (2.14) becomes, for t < 0: . (2.16) The averaging symbol denotes a (non-normalized) positive sum over heavy states with mass m > M . All the results in this paper follow from Taylor-expanding both sides in s and t and using positivity of the measure · · · . It will be useful (though non-essential) to re-organize a bit.

The B k (t) family of sum rules
It is easy to see that the Taylor expansion of both sides of eq. (2.16) maintains the symmetry under s → −s − t, and therefore only even powers of s carry information. More precisely, for each even integer k the coefficient of $[s(s+t)]^{k/2-1}$ gives a one-parameter family of sum rules parameterized by t, which we call $B_k(t)$. It can be computed by taking the s → 0 limit in eq. (2.12): This is similar to moment sum rules $\oint_\infty \frac{ds}{s^{k+1}}\, \mathcal{M}(s,t)$ which have been used since time immemorial. Here we have simply re-organized using the s ↔ u symmetry of our problem to eliminate odd moments. [1] The subscript indicates that $B_k$ enjoys the high-energy convergence of a k-subtracted dispersion relation.

[1] The identity $\oint_\infty \frac{ds}{2\pi i}\, \frac{1}{s}\, [s(s+t)]^{(k-k')/2} = \delta_{k,k'}$ shows that eq. (2.17) indeed extracts the coefficient of

Figure 4: Integration contour to be used when low-energy loops are included; the integral vanishes as it is equivalent to arcs at infinity. This relates high-energy cuts at $s > M^2$ and $u > M^2$ with EFT-computable data near the EFT cutoff $|s| \sim M^2$.
A closely related basis of sum rules was introduced recently for conformal field theory correlators [23] (see also [24]). For holographic theories, the Mellin-space form of the sum rule, called B k,t (see eq. (4.54) and section 4.8 there), precisely reduces in the flat space limit to our current B k (t). In this context, convergence for k ≥ 2 is a consequence of the known boundedness of conformal correlators in the Regge limit.
For massless scattering, the low-energy s-and u-channel cuts of M(s, t) generally overlap as shown in fig. 2. It is important that eq. (2.17) can be computed without going between the cuts. We simply deform the contour to pick heavy branch cuts at s > M 2 and u > M 2 , and keep the rest as large arcs with |s| ∼ M 2 , the EFT cutoff, see fig. 4. This gives a relation between physics at the scale M and that at higher-energies: (2.18) This equation is valid even when EFT loops are included. The idea is to choose the EFT cutoff M such that loop corrections in the low-energy EFT are under control over the arcs with |s| ≈ M 2 . Eq. (2.18) thus equates an EFT-computable left-hand-side, with a high-energy average that enjoys positivity properties.
The specific relation between the left-hand-side and EFT coefficients will depend on EFT interactions. For concreteness let us thus focus again on the case where EFT loops are neglected. The left-hand-side is then just the sum of residues at s = 0 and s = −t; By symmetry, we can replace 1 s by 1 s − 1 s+t and include a single pole, and the B k sum rules becomes: (2.19) This simplification of eq. (2.18) is only valid when neglecting EFT loops.
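Let us record the first two instances explicitly:
$$B_2:\quad 2g_2 - g_3 t + 8 g_4 t^2 + \ldots = \left\langle \frac{(2m^2+t)\, P_J\!\left(1+\frac{2t}{m^2}\right)}{m^2 (m^2+t)^2} \right\rangle \qquad (2.20)$$
$$B_4:\quad 4g_4 + \ldots = \left\langle \frac{(2m^2+t)\, P_J\!\left(1+\frac{2t}{m^2}\right)}{m^4 (m^2+t)^3} \right\rangle \qquad (2.21)$$
The left-hand side has a regular series in t, and the right-hand side involves Gegenbauers $P_J(1 + \frac{2t}{m^2})$, which can be straightforwardly expanded at small $t/M^2$ using eq. (2.7). Recall that averages are taken over heavy states with m ≥ M. Matching both sides order by order in t generates a linear system in the $g_n$'s:
$$g_2 = \left\langle \frac{1}{m^4} \right\rangle, \qquad g_3 = \left\langle \frac{3 - \frac{4}{d-2}\mathcal{J}^2}{m^6} \right\rangle, \qquad g_4 = \left\langle \frac{1}{2m^8} \right\rangle = \left\langle \frac{1 + \frac{4-5d}{2d(d-2)}\mathcal{J}^2 + \frac{\mathcal{J}^4}{d(d-2)}}{2m^8} \right\rangle \qquad (2.22)$$
We introduced the spin Casimir $\mathcal{J}^2 = J(J + d - 3)$ for convenience. Note that we truncated $\mathcal{M}_{\rm low}$ to order $g_4$, but it is possible to work to higher orders and generate linear relations on couplings such as $g_5$ and so on.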
The averaging notation immediately shows that $g_2, g_4 > 0$ since they are high-energy averages of the positive quantities $\frac{1}{m^4}$ and $\frac{1}{2m^8}$, respectively. Furthermore, the inequalities $g_3 \le \frac{3 g_2}{M^2}$ and $g_4 \le \frac{g_2}{2M^4}$ also follow readily since m ≥ M inside the average. In contrast, the sign of $g_3$ is not immediate due to the presence of spinning particles: the magnitude of $\mathcal{J}^2$ requires a deeper investigation. This difficulty was noted in attempted proofs of the six-dimensional a-theorem [25].
The key to calculating a lower bound for $g_3$ will be the existence of two distinct averages that output $g_4$. Equating them yields the first example of what turns out to be an infinite set of null constraints:
$$n_4:\quad \left\langle \frac{\mathcal{J}^2\left(2\mathcal{J}^2 - (5d-4)\right)}{m^8} \right\rangle = 0 \qquad (2.23)$$
This is a constraint on the probabilities $\rho_J(s)$ which define the average $\langle\cdot\rangle$. The subscript indicates the degree in $1/m^2$. Physically this stems from crossing symmetry: since there is a unique symmetric polynomial at degree 4, the coefficients of $s^2 t^2$ and $s^4$ must be related. There are no lower-degree examples of this phenomenon: monomials with fewer than two powers of s are killed by any double-subtracted sum rule, and odd powers of s are information-free since fixed-t dispersion relations preserve the s ↔ −s − t symmetry of our problem. Null constraints such as eq. (2.23) will be central to this work. They balance spin-two states against higher spin states: as visible from fig. 5, the average vanishes for spin 0, is negative for spin 2, and positive for all other spins. This implies that, as soon as one particle of spin 2 is present, higher-spin particles must also be present, with predictable properties.
(Spin two particles are singled out by the physical assumption that double-subtracted sum rule converges.)
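The sign pattern of the first null constraint is easy to tabulate. The sketch below assumes that the $n_4$ kernel is proportional to $\mathcal{J}^2(2\mathcal{J}^2 - (5d-4))/m^8$, which is what one finds by matching the $t^2$ term of $B_2$ against the $t^0$ term of $B_4$ with Gegenbauers normalized to $P_J(1) = 1$; the overall normalization and the function names are illustrative choices.

```python
def casimir(J, d):
    """Spin Casimir J(J + d - 3)."""
    return J * (J + d - 3)


def n4_numerator(J, d):
    """Numerator of the assumed n_4 kernel (the 1/m^8 factor is positive)."""
    c = casimir(J, d)
    return c * (2 * c - (5 * d - 4))


def sign_pattern(d, j_max=12):
    """Sign (-1, 0, +1) of the n_4 kernel for even spins up to j_max."""
    out = {}
    for J in range(0, j_max + 1, 2):
        val = n4_numerator(J, d)
        out[J] = (val > 0) - (val < 0)
    return out


if __name__ == "__main__":
    for d in (3, 4, 6, 10):
        print(d, sign_pattern(d))
```

In every dimension tried, only spin 2 contributes negatively, so a spin-2 state can only be balanced by states of spin 4 and higher.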

Optimization framework
The $B_k$ sum rules just introduced, coupled with positivity of high-energy averages $\langle\cdot\rangle$, provide a complete apparatus to establish potent self-consistency conditions on the EFT coefficients $g_k$ (defined in eq. (2.3)). We recall our physical assumptions:
• Double-subtracted dispersion relations converge
• The low-energy amplitude is crossing symmetric
• The high-energy spectral density is positive
Since we are considering averages over heavy states (with m > M), the coefficients (except in subsection 3.5) are naturally normalized by $g_2$ and the EFT cutoff M. We will therefore be bounding dimensionless ratios:
$$\tilde g_k = \frac{g_k\, M^{2(k-2)}}{g_2} \qquad (3.1)$$
Optimal bounds on these $\tilde g_k$'s will be found by formulating a dual problem, in which we combine the desired averages (such as 2.18) with null constraints (such as eq. (2.23)) to obtain sign-definite sum rules. We first describe a simple example analytically, then describe a systematic implementation as a semi-definite problem amenable to publicly available software like SDPB [26].

Warm-up problem with three sum rules
As a warm-up, let us ask whether it is possible to lower-bound theg 3 coefficient using the B 2 , B 4 sum rules previously calculated. We consider the corresponding system of three equations from (2.22) (including the null constraint obtained via g 4 data): With these definitions, let us examine a similar, but simpler set of relations: What makes a finite lower bound plausible is that the null constraint (the third equation) should somehow prevent large spins from contributing too much. This is an important point: the allowed range forg 3 is restricted by higher derivative crossing equations! We now calculate a lower bound in two ways. The first -and simplest -method is to use the Cauchy-Schwarz inequality with the null constraint: Then, using the fact that m > M and J 2 ≥ 0 inside the average yields J 2 , and by dividing both sides by that average, we obtain an upper-bound on J 2 m 6 as desired: This has a simple physical interpretation: if we define an impact parameterb = 2J m , then we have effectively shown that heavy states can't contribute at impact parameters much larger than ∼ 1/M . In terms of the original problem (3.2), we have shown that This shows the existence of two-sided bounds for generic couplings. This is an important qualitative result, to our knowledge originally emphasized in [11]: ratios of EFT couplings, in units of the cutoff scale M , must be O(1) numbers. Numerically, however, the Cauchy-Schwarz method does not yield the optimal lower bound.
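For orientation, the chain of inequalities behind the Cauchy-Schwarz argument can be written out as a worked equation. This is a sketch under two assumptions about the kernels: that $g_3 = \langle (3 - \frac{4}{d-2}\mathcal{J}^2)/m^6 \rangle$, and that the null constraint reads $\langle \mathcal{J}^2 (2\mathcal{J}^2 - (5d-4))/m^8 \rangle = 0$, so that the constant called b equals $(5d-4)/2$.

```latex
\begin{align}
\left\langle \frac{\mathcal{J}^2}{m^6} \right\rangle^2
  &\le \left\langle \frac{1}{m^4} \right\rangle
       \left\langle \frac{\mathcal{J}^4}{m^8} \right\rangle
  && \text{(Cauchy-Schwarz)} \\
  &= b \left\langle \frac{1}{m^4} \right\rangle
       \left\langle \frac{\mathcal{J}^2}{m^8} \right\rangle
  && \text{(null constraint, } b = \tfrac{5d-4}{2}\text{)} \\
  &\le \frac{b}{M^2} \left\langle \frac{1}{m^4} \right\rangle
       \left\langle \frac{\mathcal{J}^2}{m^6} \right\rangle
  && (m \ge M,\ \mathcal{J}^2 \ge 0) \\
\Longrightarrow \quad
\left\langle \frac{\mathcal{J}^2}{m^6} \right\rangle
  &\le \frac{b}{M^2} \left\langle \frac{1}{m^4} \right\rangle ,
\qquad
\tilde g_3 \;\ge\; 3 - \frac{4b}{d-2} \;=\; -\frac{7d-2}{d-2} .
\end{align}
```

The last step divides by $\langle \mathcal{J}^2/m^6 \rangle$, which is non-negative, and then inserts the result into the assumed expression for $g_3$ normalized by $g_2 = \langle 1/m^4 \rangle$.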
In contrast, the second, and more powerful, method is to re-interpret the above task as a semi-definite problem, in order to systematically search for optimal bounds. Denote by $h_i(m^2, J)$ the function whose average gives $h_i$. The idea is to construct positive-definite combinations of the three averages in eq. (3.3): where we must find α, β such that $F(m^2, J) \ge 0$ for all J = 0, 2, 4 . . . and m ≥ M. Taking the average of any such F then proves $h_3 \ge -\alpha h_2$. The optimal bound will come from a non-negative F with minimal α. Let us first reproduce the first Cauchy-Schwarz argument in this language, which should give $\alpha = \frac{b}{M^2}$. Assume β > 0. The argument amounts to completing squares in the $\mathcal{J}^4/m^8$ term: For any λ this is an identical rewriting of eq. (3.7), and the Cauchy-Schwarz-like method is to choose λ, β such that the other terms are positive as well. From the limit J → ∞, the terms with $\mathcal{J}^2$ need to give a positive function of m, which imposes that $(2\beta\lambda - 1 - \beta b/M^2) \ge 0$.
To minimize α we must minimize βλ 2 ; we find that the minimum saturates the inequality, and is simply β = M 2 /b with βλ = 1. With this choice, our trial functional becomes The minimal α for which the first terms are positive for all m 2 is then α = b M 2 , precisely as anticipated! We have thus exhibited a positive functional F which proves the Cauchy-Schwarz bound in (3.5).
It is now easy to see why this bound is not optimal: F doesn't need to be expressible as a sum of three separately positive parts! In d = 4, for example, the above argument gives $-13 \le \tilde g_3$. In comparison, using the numerical search strategy detailed in the next subsections, we find that the following combination is positive for all J = 0, 2, 4 . . . and m ≥ M: This allows us to infer, by taking the average of this inequality, that This is significantly stronger than the $-13 \le \tilde g_3$ that we just derived in an ad hoc manner. How can we understand the solution (3.10) analytically? The key ingredient will be that spins J are discrete, whereas our ad hoc bound treated J as a continuous parameter. We now calculate this bound analytically. Let us return to the functional ansatz (3.7) and try to directly constrain the unknowns α, β. Putting J = 0 we only deduce α ≥ 0. Putting J = 2, 4 . . . gives a sequence of quadratic polynomials in $(m^2 - M^2)$, each with positive curvature. Such polynomials are non-negative if the two roots are negative, or if both roots are positive and equal, or if they are complex conjugate pairs. It seems virtually impossible to guess a priori which case is realized; however, this information is readily gleaned by plotting the numerical polynomials (3.10), as shown in fig. 6. We see that the J = 2, 4 inequalities are both saturated: the former by having a root at $m^2 = M^2$, and the latter by having a positive double root (i.e. vanishing discriminant).
These two saturated inequalities give algebraic equations that may be solved analytically; this determines the vector (α * , 1, β * ) in eq. (3.10) to be:
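As a rough numerical cross-check of this construction, the two saturation conditions can be solved in a few lines. The sketch below is not the original computation: it assumes the kernels $g_2 \to 1/m^4$, $g_3 \to (3 - \frac{4}{d-2}\mathcal{J}^2)/m^6$ and $n_4 \to \mathcal{J}^2(2\mathcal{J}^2 - (5d-4))/m^8$, works in units M = 1 with $z = M^2/m^2 \in (0, 1]$, and uses its own normalization (A, β) for the functional $f_J(z) = \mathrm{lin}_J\, z + \beta\, \mathrm{quad}_J\, z^2 - A$, whose non-negativity proves $\tilde g_3 \ge A$. The saturation pattern (J = 2 root at threshold, J = 4 double root) is taken from the text and was only checked for d = 4.

```python
import math


def kernels(J, d=4):
    """(lin, quad) so that f_J(z) = lin*z + beta*quad*z**2 - A."""
    c = J * (J + d - 3)                 # spin Casimir
    lin = 3 - 4 * c / (d - 2)           # assumed g_3 kernel times m^6
    quad = c * (2 * c - (5 * d - 4))    # assumed n_4 kernel times m^8
    return lin, quad


def warmup_bound(d=4):
    """Solve the two saturation conditions for (A, beta)."""
    lin2, n2 = kernels(2, d)
    lin4, n4 = kernels(4, d)
    # J = 2 has a root at z = 1:        A = lin2 + beta * n2
    # J = 4 has a double root:          lin4**2 + 4 * beta * n4 * A = 0
    a, b, c = 4 * n4 * n2, 4 * n4 * lin2, lin4 ** 2
    sq = math.sqrt(b * b - 4 * a * c)
    beta = max((-b + sq) / (2 * a), (-b - sq) / (2 * a))  # positive root
    return lin2 + beta * n2, beta


def min_on_grid(A, beta, d=4, j_max=200, steps=2000):
    """Smallest value of f_J(z) over even J <= j_max and a grid in z."""
    return min(lin * z + beta * quad * z * z - A
               for J in range(0, j_max + 1, 2)
               for lin, quad in [kernels(J, d)]
               for z in (i / steps for i in range(1, steps + 1)))


if __name__ == "__main__":
    A, beta = warmup_bound()
    print("lower bound on g3~:", A, " beta:", beta)
    print("min of functional on grid:", min_on_grid(A, beta))
```

Under these assumptions the bound comes out at about −10.61 in d = 4, noticeably stronger than the Cauchy-Schwarz estimate, and the grid scan finds no negative values of the functional.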

Dual problem: general formulation
We now introduce the general "dual" optimization problem which allows to carve out the space of EFT coefficientsg 3 ,g 4 ,g 5 , . . . allowed by unitarity and positivity.
The data at our disposal comes from the B k sum-rules in eq. (2.19): 1. Representative averages g k (m 2 , J) which measure each "desired" g k : allowing the desired optimal bound ong 3 to be inferred: A ≤g 3 . It then follows thatg 3 ≤ B. Taking the intersection of these two sets yields a convex region defined by These are the inequalities inferred ong 3 from data in the B k sum-rules. In theory, the null constraints n i are part of an infinite dimensional vector space, but for numerical purposes, the dimensionality is taken to be finite and determined by the truncation order of the low-energy expansion M low . We will find that the optimal bounds converge rapidly as the maximal degree is increased.
This problem can be readily adapted if we are given additional assumptions. For example, to make exclusion plots in the $(\tilde g_3, \tilde g_4)$ plane, one strategy is to postulate some value of $\tilde g_3$. In fact, since the EFT parameters enter linearly, the resulting inequality automatically carves out a half-space in the $(\tilde g_3, \tilde g_4)$-plane: This half-plane is tangent to the allowed region at $\tilde g_3 = \tilde g_3^{(0)}$. If this process is repeated for distinct $\tilde g_3^{(0)}$ This process generalizes to higher dimensional planes (i.e. hyperplanes): the vector v always contains only those coefficients we are not being agnostic about, plus an arbitrary number of null constraints. Although we will not go beyond three-dimensional regions, we note that an efficient search algorithm in higher dimensions is described in ref. [27].

Example with a SDPB implementation
The optimization problem just formulated is in a form that is directly amenable to the SDPB solver [26]. There are just two simple substitutions to make: 1. The program accepts polynomials of x ≥ 0; we set m 2 → M 2 (1 + x) and remove a common denominator.
2. The program accepts finite lists of polynomial constraints. We tabulate a finite list of spins J = 0, 2, . . . , J max and add a single function of x corresponding to J → ∞.
The second truncation is valid as long as J max is taken sufficiently large; once convergence is achieved, further increases of J max have no effect on the bounds. We consider now an example relevant to one of the plots in the next section, when working in d = 4, J max = 40 and Mandelstam order n = 4. The goal is to find a lower bounding plane ong 4 for fixedg 3 , sayg 3 = −10.5 (which is slightly above the allowed lower bound In addition to tabulating v for all even J ≤ J max , we also include the infinite-J limit, which is simply the coefficient of J 4 : v(x, ∞) = (0, 0, 0, 2) .

(3.25)
To lower-boundg 4 at the stated value ofg 3 , we search for four-vectors y, normalized to y· 0, 0, 1, 0 = 1, which solve the following problem: The lower bound is theng The solution vector y computed by SDPB was found to be y ≈ (1.4823, 0.1810, 1, 0.01472), giving h = 0.4183. The half-plane allowed by positivity of y·v is thus g 4 + 0.1810(g 3 + 10.5) ≥ 0.4183 (3.28) which gives one of the boundaries used to make the n = 4 region in fig. 10 (b) below.
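The quoted solution vector can be sanity-checked on a grid. The sketch below assumes that the four components of v are, in order, the $g_2$, $g_3$, $g_4$ and $n_4$ kernels, taken as $1/m^4$, $(3 - 2\mathcal{J}^2)/m^6$, $1/(2m^8)$ and $\mathcal{J}^2(2\mathcal{J}^2 - 16)/m^8$ in d = 4 and rescaled by $m^4$, with M = 1 and $z = M^2/m^2$. This ordering and normalization are a guess that happens to reproduce the quoted half-plane; they are not taken from the SDPB input itself.

```python
Y_REPORTED = (1.4823, 0.1810, 1.0, 0.01472)   # solution vector quoted in the text


def functional(z, J, y=Y_REPORTED):
    """y . v(z, J) with the assumed d = 4 kernels, rescaled by m^4."""
    c = J * (J + 1)                      # spin Casimir in d = 4
    v = (1.0,                            # g_2 kernel
         (3 - 2 * c) * z,                # g_3 kernel
         0.5 * z * z,                    # g_4 kernel
         c * (2 * c - 16) * z * z)       # n_4 kernel
    return sum(yi * vi for yi, vi in zip(y, v))


def min_on_grid(j_max=200, steps=2000):
    """Smallest value of the functional over even J <= j_max and z in (0, 1]."""
    return min(functional(i / steps, J)
               for J in range(0, j_max + 1, 2)
               for i in range(1, steps + 1))


def implied_bound(g3_fixed=-10.5, y=Y_REPORTED):
    """Averaging y.v >= 0 gives g4~ + y2*(g3~ - g3_fixed) >= -y2*g3_fixed - y1."""
    return -y[1] * g3_fixed - y[0]


if __name__ == "__main__":
    print("min of y.v on grid:", min_on_grid())
    print("implied lower bound h:", implied_bound())
```

With these assumptions the functional stays non-negative up to the rounding of the quoted digits, it nearly vanishes for J = 2 at threshold and for J = 4 at an interior point, and the implied value of h agrees with the quoted 0.4183 to the displayed precision.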

Generating null constraints
So far we used a single constraint from crossing: the null average n 4 (m 2 , J) from eq. (2.23).
The bounds improve after we add more constraints. Let us describe a way to generate them. A straightforward method is as follows. First, fix a degree in Mandelstam invariants, n. Then, list all the $\lfloor n/2 \rfloor$ low-energy averages corresponding to this degree, namely the coefficient of $t^{n-k}$ in the left-hand-side of the $B_k(t)$ sum rules (2.19) with k ≤ n, using the crossing-symmetric low-energy ansatz (2.3). The right-hand-sides of those linear combinations with vanishing left-hand-side then constitute a basis of null constraints.
The first few cases, up to degree n = 7, are (in arbitrary normalization): . Note that all null averages vanish when J = 0: they relate spinning heavy states to one another, but spinless states are completely decoupled.
We find that there exists a single null constraint for each degree n = 4, 5, 6, then two constraints each for n = 7, 8, 9, three each for n = 10, 11, 12, etc.: the number of linearly independent null constraints at each degree increases by 1 for every increase of n by 3. A sequence of generating functions $X_k(t)$ which enumerates them all is discussed in appendix A.
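This counting can be reproduced with a short script. The sketch assumes that degree n supplies one low-energy average per even k with 2 ≤ k ≤ n (so ⌊n/2⌋ in total), and that the number of independent couplings at degree n is the number of monomials $(s^2+t^2+u^2)^a (stu)^b$ with 2a + 3b = n; the difference is the number of null constraints. Function names are illustrative.

```python
def num_couplings(n):
    """Number of solutions (a, b >= 0) of 2a + 3b = n."""
    return sum(1 for a in range(n // 2 + 1) if (n - 2 * a) % 3 == 0)


def num_null_constraints(n):
    """Sum-rule coefficients at degree n minus independent couplings."""
    return n // 2 - num_couplings(n)


if __name__ == "__main__":
    for n in range(2, 13):
        print(n, num_null_constraints(n))
```

The output reproduces the pattern 1, 1, 1, 2, 2, 2, 3, 3, 3 for n = 4 through 12, with no null constraints at degrees 2 and 3.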
Finally, it is important to stress that the null constraints only average to zero modulo EFT loops, since the method for finding them relies on the explicit tree-level parameterization (2.3). The interpretation of resulting inequalities as bounds on g k is thus only strictly valid in this approximation. In an interacting EFT, the coefficients g k depend on choices of scale and renormalization scheme, and the correct interpretation of the positive functionals F is that they give rigorous (possibly non-optimal) inequalities of the form: where all averages are computed as integrals over arcs with |s| ≈ M 2 following eq. (2.18). The method thus produces rigorous bounds on computable combinations of EFT couplings at the scale M .

An ad hoc upper bound on (∂φ) 4
The systematic method explained above bounds ratios g k /g 2 , but how about g 2 itself? Here we present one upper bound on g 2 ; this subsection is somewhat separate from the rest since we were unable to systematically optimize the bound.
The naive intuition is that if heavy states couplings are order unity, then g 2 ∼ 1 M d . An upper bound with this scaling should thus follow from the unitarity limit, ρ J ≤ 2. This would be the full story if heavy states only had a finite number of spins, however, to get an actual bound one must also control the infinite sum over spins. The idea is to combine the sum rule for g 2 with a multiple of the first null constraint n 4 : This holds for any α; we recall that J 2 = J(J + d − 3) is the spin Casimir. Inserting the definition (2.15) of the average and switching to a dimensionless mass parameter z = M 2 /m 2 < 1, this can be rewritten This is a good step to get an upper bound on g 2 since for α > 0 the integrand is mostly negative at large spin J, except for a small region with z small. We use unitarity in two steps: first, we use positivity ρ J ≥ 0 to restrict the integral to the small-z region where the parenthesis is positive, where we can then use ρ J ≤ 2. Thus: where z max (J) = min([αf (J)] − 1 2 , 1) determines the region where the parenthesis is positive. The important point is that at large spin this region shrinks, which will ensure convergence of the J sum. Effectively the region is bounded by impact parameter Mb ∼ < α −1/4 . At small spins the full range is generally accessible. Letting z max (J) = 1 for J ≤ J * (α), the sum splits as Both sums run only over even spins. Since n (d) J ∝ J d−3 at large spin, the sum to infinity converges. This inequality is valid for any α > 0; with increasing α, the J = 0, 2 terms tend to increase whereas the rest decreases: the optimal bound with this method is obtained by minimizing over α. The dependence is rather non-linear (which is why we were not able to generalize the method to include more null constraints), but evaluating the sum numerically we find (in all dimensions we tried) that the optimum occurs with J * = 2. In d = 4, for example, the optimal value α * = 0.025, giving the analytic bound: 4) .  
Table 1: Nonperturbative (possibly non-optimal) upper bounds on $g_2$.

As expected, up to a standard loop factor, the coefficient of $\tfrac{1}{2}(\partial\phi)^4$ cannot exceed order unity in units of the heavy scale. We stress that, contrary to other bounds in this paper, upper bounds on couplings at the cutoff scale cannot be straightforwardly interpreted in terms of Lagrangian parameters, since any EFT which saturates them is by definition strongly interacting already below the cutoff, making quantum corrections non-negligible. Rather, the bound may be interpreted as follows: among all observables which are linear in the S-matrix at the scale $M$ and which reduce to $g_2$ at weak coupling, there exists one which satisfies eq.
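The way the positivity region $z < z_{\max}(J)$ shrinks with spin can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the explicit form of $f(J)$ is not reproduced in the text above, so the function `f_assumed` below takes $f(J) = \mathcal{J}^2(2\mathcal{J}^2 - (5d-4))$ as a plausible placeholder built from the spin Casimir, and all function names are illustrative.

```python
def spin_casimir(J, d):
    """Spin Casimir J(J + d - 3), as quoted in the text."""
    return J * (J + d - 3)

def f_assumed(J, d):
    """ASSUMPTION: placeholder for f(J); its explicit form is not given in the text."""
    c = spin_casimir(J, d)
    return c * (2 * c - (5 * d - 4))

def z_max(J, alpha, d):
    """z_max(J) = min([alpha f(J)]^(-1/2), 1); the whole range z < 1 is kept if alpha*f <= 1."""
    x = alpha * f_assumed(J, d)
    if x <= 1:  # also covers f(J) <= 0, where the parenthesis is positive for all z
        return 1.0
    return x ** -0.5

def j_star(alpha, d, j_cap=1000):
    """Largest even spin below which z_max stays equal to 1 (searched up to j_cap)."""
    J = 0
    while J + 2 <= j_cap and z_max(J + 2, alpha, d) == 1.0:
        J += 2
    return J

if __name__ == "__main__":
    for J in range(0, 12, 2):
        print(J, z_max(J, 0.025, 4))
    print("J_* =", j_star(0.025, 4))
```

With this placeholder, $d = 4$ and $\alpha = 0.025$ give $J_* = 2$, in line with the optimum quoted above, and $z_{\max}(J)$ falls off like $1/J^2$ at large spin.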

Numerically ruling-out: The allowed space of scalar EFTs
In this section, we summarize our numerical results, focusing on $d = 3 + 1$ spacetime dimensions. Treating the low-energy EFT at tree level, we determine the allowed space of EFT coefficients $g_n$, where the low-energy amplitude is parameterized as

Recall that it is convenient to introduce dimensionless EFT coefficients $\tilde g_n$, normalized by $g_2$ and appropriate powers of the mass gap $M$ introduced in equation (3.1), since the numerical analysis is performed directly on these variables. We find optimal upper and lower bounds for $\tilde g_3$, $\tilde g_4$, and $\tilde g_5$, given positivity of the high-energy spectral densities $\rho_J$, using the optimization framework introduced above.

Bounds on individual coefficients
Let us begin by bounding the values of individual EFT coefficients, while remaining completely agnostic about all the others.

Table 2: Convergence of the "box bounds" (4.2) with increasing Mandelstam order $n$, which results in larger sets of orthogonal constraints $N = \{n_4(m^2, J), \cdots\}$. At order $n = 3$, all lower bounds would be $-\infty$ since no null constraints exist at that order.
The simple, rational upper bounds are saturated by the spin-0 contribution to sum rules like eq. (2.22). Keeping constraints of higher Mandelstam degree $n$ is feasible in terms of runtime, but we found the large-spin convergence harder to control, as we needed $J_{\max} = O(1000)$. For $n \le 16$, however, convergence is easily obtained with smaller spins. This is the reason why we stopped the table at $n = 16$; it would be interesting to understand how to stabilize the numerics at large $n$.
The $\tilde g_3$ lower bound is plotted as a function of $1/\dim N$ in fig. 7, where $\dim N$ is the dimension of the space of null constraints used. This dimension naturally increases monotonically with the Mandelstam order $n$. Ideally, one would like to extrapolate the bounds to $n \to \infty$; however, the approach is somewhat irregular and we did not find a compelling fit function. Therefore, the recorded bounds are simply taken from our largest reliable value of $n$.
In $d = 6$ and $d = 10$, the analogous bounds take the values:

Bounds on the $\tilde g_n$ obtained with $n = 10$ (corresponding to 12 linearly independent null constraints) are presented in appendix B.

Two-dimensional allowed region in the $(\tilde g_3, \tilde g_4)$ plane
The "box bounds" in eq. (4.2) tell an incomplete story since they miss potential correlations between the coefficients. Exclusion plots in the $(\tilde g_3, \tilde g_4)$ plane can be obtained following the constrained optimization framework presented at the end of section 3.2. Specifically, upper and lower bounding planes on $\tilde g_4$ were obtained by sampling $\tilde g_3$ at various points in the range allowed by eq. (4.2). Repeating this process for a large number of sampling points generates a collection of linear inequalities, whose intersection defines a refined allowed region. Figure 8 depicts this region with over 100 sampling points along $\tilde g_3$.
Further insight into the shape of the region can be obtained by noticing that the crossing symmetry constraints do not mix particles of spin 0 with the others, as noted in eq. (3.29). This indicates that heavy spin-0 particles satisfy crossing on their own, as will be further discussed in the next section. For numerical purposes, this decoupling allows us to refine the problem: any high-energy spectrum can be written as a positive sum of its spin-0 content, plus a unitary solution to crossing that only contains particles of spin $J \ge 2$. The full allowed region is then simply the convex hull of the allowed regions for these two problems:

$$\text{Entire region} = \text{Convex Hull}\left[\text{Spin-0} + \text{Spin-}J \ge 2\right]. \tag{4.5}$$

As may be seen from the form of the $g_3$ sum rule (2.22), the two solutions are differentiated by the sign of $g_3$: positive for Spin-0 and negative for Spin-$J \ge 2$. In our implementation of the dual problem, theories with only $J \ge 2$ particles can be studied by simply dropping the positivity constraint for the functional acting on $J = 0$. The allowed regions for the Spin-0 and Spin-$J \ge 2$ sub-problems are the narrow almond-shaped regions shown in fig. 9.
The shape of these regions is largely explained by a simple scaling argument: given any solution to crossing, scaling up its overall mass scale gives a new solution. Starting from any allowed point $(\tilde g_3, \tilde g_4)$, this generates an allowed path $(\alpha \tilde g_3, \alpha^2 \tilde g_4)$ with $0 \le \alpha \le 1$. This explains the parabolic shape of the "underbellies" in fig. 9. In fact the Spin-0 almond is simply the convex hull of the parabola connecting $(0, 0)$ to $(3, \tfrac{1}{2})$. (This is qualitatively similar to what is found in the forward limit [11,13].) The Spin-$J \ge 2$ region is more complicated: while it also displays a parabolic underbelly near the origin, it fails to extend all the way to $\tilde g_4 = \tfrac{1}{2}$. The boundary must thus exhibit non-analytic behaviour at the end of the parabola; however, we were unable to localize a kink that remains stable under variation of the Mandelstam expansion order $n$, suggesting a milder non-analyticity (such as a discontinuous second derivative). This qualitative feature is visible in the close-ups of the high-$\tilde g_4$ end of the allowed region shown in fig. 10, which also indicate convergence with increasing $n$. The line $\tilde g_4 = \tfrac{1}{2}$ features two kinks, at $(-10.19, 0.5)$ and $(3, 0.5)$, as shown in figure 10. Below we will find analytic expressions for the scattering amplitude at these kinks!

Figure 9: The $(\tilde g_3, \tilde g_4)$ allowed region, segmented into theories without spinless particles ($J \ge 2$) and theories with only spinless particles ($J = 0$). The convex hull of these two regions reproduces fig. 8.

We chose the simplest and perhaps weakest method of calculating the exclusion region. Methods of radials or normals have the potential to calculate these boundaries more efficiently and with higher fidelity near kinks [28]. For example, the normals maximization procedure extracts sharp features at regions of large curvature. Unfortunately, we were not able to use the normals method because it requires one to solve the "primal" problem instead.
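The scaling argument for the Spin-0 almond can be made concrete with a minimal Python sketch. It assumes only what is stated above: the spin-0 kink sits at $(3, \tfrac{1}{2})$, rescaling by $\alpha$ traces the parabola $\tilde g_4 = \tilde g_3^2/18$, and the convex hull of that arc is the region between the arc and the chord $\tilde g_4 = \tilde g_3/6$. The function names are illustrative.

```python
from fractions import Fraction

# Spin-0 kink quoted in the text.
KINK = (Fraction(3), Fraction(1, 2))

def rescaled(point, alpha):
    """Point reached by scaling up the mass scale: (alpha*g3, alpha^2*g4), 0 <= alpha <= 1."""
    g3, g4 = point
    return (alpha * g3, alpha ** 2 * g4)

def in_spin0_almond(g3, g4):
    """Membership in the convex hull of the parabola from (0, 0) to (3, 1/2).

    Lower edge: the parabola g4 = g3^2 / 18 swept out by rescaling the kink.
    Upper edge: the straight chord g4 = g3 / 6 joining the origin to the kink.
    """
    if not (0 <= g3 <= 3):
        return False
    return g3 ** 2 / 18 <= g4 <= g3 / 6

if __name__ == "__main__":
    for alpha in (Fraction(1, 4), Fraction(1, 2), Fraction(3, 4), Fraction(1)):
        point = rescaled(KINK, alpha)
        print(alpha, point, in_spin0_almond(*point))
```

Every rescaled copy of the kink lies on the lower edge, while positive mixtures of the origin and the kink fill in the chord.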

4.3 Three-dimensional allowed volume in $(\tilde g_3, \tilde g_4, \tilde g_5)$
Finally, we consider the space of EFT coefficients $(\tilde g_3, \tilde g_4, \tilde g_5)$ using a procedure similar to the 2d one. Once the 2d allowed region is obtained, points $(\tilde g_3, \tilde g_4)$ are sampled from it, and the optimal upper and lower bounds on $\tilde g_5$, with the associated hyperplanes, are computed via the constrained optimization approach. Figure 11 shows that the 3d region is narrow, suggesting that S-matrix positivity is a potent constraint on the EFT.

The exclusion plot converges as the truncation order $n$ is increased, as shown in figure 10. In particular, the $n = 10$ and $n = 14$ regions overlap better than the $n = 4$ and $n = 10$ regions. Although the number of functions $n_i(m^2, J)$ in the null constraints $N$ doubles between $n = 10$ and $n = 14$, the numerics show that they generate very similar regions. Therefore, it is sufficient to consider $n = 10$ in order to infer general features.
5 Analytically ruling-in: the two kinks with $\tilde g_4 = \tfrac{1}{2}$
It is interesting to ask whether the bounds on $(\tilde g_3, \tilde g_4, \ldots)$ obtained in the preceding section are indeed optimal. We obtained these bounds using the so-called dual problem, which "rules out" more and more space as one adds null constraints. This contrasts with the primal problem, whereby the goal is to "rule in" coefficients of 2 → 2 S-matrices satisfying all the axioms. When results from both problems agree, the optimal solution is established with complete confidence.
Systematic implementations of the primal problem have been proposed for generic field theories (without a large gap $M$) [29,30]. Convergence of the dual and primal problems has also been studied for two-dimensional S-matrices [28,31]. We will not attempt to adapt these methods to our problem, but we will study simple special theories which can be ruled in analytically.
The $(\tilde g_3, \tilde g_4)$ allowed region in fig. 8 prominently displays two kinks with $\tilde g_4 = \tfrac{1}{2}$ (connected by a horizontal segment). This value is interesting, because the EFT parameters $g_2$ and $g_4$ satisfy the following sum rules:

where $\langle\cdot\rangle$ signifies a positive sum over states with $m \ge M$ and different spins, as previously defined. The value $g_4 = \frac{g_2}{2M^4}$ can only be realized if the high-energy theory contains only states at the single mass $m = M$! The corresponding amplitudes are therefore rational functions with poles at $s$, $t$ or $u = m^2$. Furthermore, they must satisfy the boundedness property $\lim_{s\to\infty} \mathcal{M}(s,t)/s^2 = 0$. It is easy to see that there are only two crossing-symmetric rational functions with these properties:

Here, $\gamma(d)$ is a coefficient that will ensure unitarity and positivity of the amplitude in $d$-dimensional space-time. In particular, we have that

To show this, note that the spectral density of the first amplitude (its imaginary part) is supported entirely on spin 0. Following the decomposition used in fig. 9, we have then subtracted a constant multiple of $\mathcal{M}_{\text{spin-0}}$ to remove the spin-0 content from $\mathcal{M}_{\text{stu-pole}}$. To find the left kink, we should thus tune $\gamma(d)$ to make the spectral density supported only on spins $J \ge 2$.
In general, the Gegenbauer polynomials satisfy an orthogonality relation [16] that allows one to extract the coefficient:

where $z = 1 + 2t/m^2$; we find

Plugging in $P_0(z) = 1$ and computing the integral, we find that the spin-0 component vanishes provided we choose:

To verify that the partial waves are positive for $J \ge 2$, one can use the Froissart-Gribov formula (see eq. (2.53) of [16]) to analytically integrate (5.5) in terms of a single residue at $z = \pm 3$; the integral is proportional to the function defined as $Q^{(d)}$ there, which is a positive hypergeometric function for any $d \ge 3$. We conclude that the amplitude $\mathcal{M}_{\text{stu-pole}}$, with the value (5.3), satisfies all the axioms of crossing symmetry, spin-2 Regge convergence, and positivity $\rho_J(s) \ge 0$! Of course, this does not imply that this amplitude can indeed be realized in some fully-fledged UV complete theory, only that it cannot be ruled out with our current methods.
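The spin-0 projection and the positivity of the higher partial waves can be checked numerically in $d = 4$, where the Gegenbauer polynomials reduce to Legendre polynomials. The sketch below rests on assumptions that are not spelled out in the text above: it takes the pole part of $\mathcal{M}_{\text{stu-pole}}$ to be $1/((m^2-s)(m^2-t)(m^2-u))$, takes $\mathcal{M}_{\text{spin-0}}$ to contribute a constant to the coefficient of $1/(m^2-s)$, works in units $m = 1$, and uses the normalization $(2J+1)/2$ for the projection. All names are illustrative.

```python
import math

def legendre(J, z):
    """Legendre polynomial P_J(z) via the three-term recurrence."""
    if J == 0:
        return 1.0
    p_prev, p = 1.0, z
    for n in range(1, J):
        p_prev, p = p, ((2 * n + 1) * z * p - n * p_prev) / (n + 1)
    return p

def pole_coefficient(z):
    """ASSUMED coefficient of 1/(m^2 - s) at s = m^2 = 1, with t = (z - 1)/2 and u = -1 - t."""
    t = (z - 1) / 2
    u = -1 - t
    return 1.0 / ((1 - t) * (1 - u))  # equals 4 / (9 - z^2): poles at z = +-3

def simpson(func, a, b, n=2000):
    """Composite Simpson rule with an even number n of intervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3

def partial_wave(J, gamma=0.0):
    """Spin-J component of the pole coefficient, minus gamma times the pure spin-0 piece."""
    integrand = lambda z: legendre(J, z) * pole_coefficient(z)
    value = 0.5 * (2 * J + 1) * simpson(integrand, -1.0, 1.0)
    return value - gamma if J == 0 else value

# Value of gamma (units m = 1, d = 4) that removes the spin-0 component.
gamma_d4 = partial_wave(0)

if __name__ == "__main__":
    print("gamma(4) =", gamma_d4, " closed form (2/3) ln 2 =", 2 * math.log(2) / 3)
    for J in (0, 2, 4):
        print(J, partial_wave(J, gamma_d4))
```

Under these assumptions the spin-0 component vanishes for $\gamma = \tfrac{2}{3}\ln 2$ in units $m = 1$, and the $J = 2, 4$ components come out small but positive.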
Let us now situate the amplitudes $\mathcal{M}_{\text{spin-0}}$ and $\mathcal{M}_{\text{stu-pole}}$ in the context of our numerical exclusion plots. By series expanding at small $s, t, u$ and comparing with the low-energy parameterization (2.6), it is straightforward to find that

$$\mathcal{M}_{\text{spin-0}}: \quad (\tilde g_3, \tilde g_4, \tilde g_5) = \left(\frac{3}{m^2}, \frac{1}{2m^4}, \frac{5}{2m^6}\right) \tag{5.7}$$

$\mathcal{M}_{\text{stu-pole}}$:

The fact that the amplitude $\mathcal{M}_{\text{stu-pole}}$ realizes a negative coefficient for $stu$ (i.e. $g_3 < 0$) suggests that it would be impossible to prove irreversibility of six-dimensional renormalization group flow using only crossing and positivity of 2 → 2 scattering [25]. This contrasts with the successful four-dimensional case, in which the a-theorem was related to positivity of $g_2$ [32].

(See fig. 12.) The parabolic "underbellies" are simply the dimensionally-rescaled theories at the kinks. This region is analytically "ruled in".
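The low-energy expansion can be cross-checked with exact rational arithmetic. The sketch below is hedged: the explicit amplitudes are not reproduced above, so it assumes $\mathcal{M}_{\text{spin-0}} = \sum_{x \in \{s,t,u\}} 1/(m^2 - x)$ and $\mathcal{M}_{\text{stu-pole}} = 1/((m^2-s)(m^2-t)(m^2-u)) - \gamma\,\mathcal{M}_{\text{spin-0}}$ in units $m = 1$, and it uses $\gamma = \tfrac{2}{3}\ln 2$ for $d = 4$, which is our own evaluation of the spin-0 projection rather than a number quoted in the text. The coefficient tuples multiply $1$, $A$, $B$, $A^2$, $AB$ with $A = s^2+t^2+u^2$ and $B = stu$.

```python
from fractions import Fraction
import math

def m_spin0(s, t, u):
    """ASSUMED spin-0 exchange amplitude, units m = 1."""
    return 1 / (1 - s) + 1 / (1 - t) + 1 / (1 - u)

def m_stu_bare(s, t, u):
    """ASSUMED triple-pole term before the spin-0 subtraction, units m = 1."""
    return 1 / ((1 - s) * (1 - t) * (1 - u))

def low_energy(coeffs, s, t, u):
    """Truncated expansion c0 + g2*A + g3*B + g4*A^2 + g5*A*B."""
    c0, g2, g3, g4, g5 = coeffs
    A = s * s + t * t + u * u
    B = s * t * u
    return c0 + g2 * A + g3 * B + g4 * A * A + g5 * A * B

# Candidate Taylor coefficients, checked against the exact amplitudes below.
SPIN0 = (Fraction(3), Fraction(1), Fraction(3), Fraction(1, 2), Fraction(5, 2))
STU_BARE = (Fraction(1), Fraction(1, 2), Fraction(1), Fraction(1, 4), Fraction(1))

def truncation_error(amplitude, coeffs, eps):
    """|exact - truncated| at (s, t, u) = (eps, 2 eps, -3 eps); should scale like eps^6."""
    s, t = eps, 2 * eps
    u = -s - t
    return abs(amplitude(s, t, u) - low_energy(coeffs, s, t, u))

def ratios(coeffs):
    """(g3, g4, g5) / g2 in units m = 1."""
    _, g2, g3, g4, g5 = coeffs
    return g3 / g2, g4 / g2, g5 / g2

def left_kink(gamma):
    """Ratios for the assumed stu-pole amplitude with the spin-0 part subtracted."""
    combined = tuple(b - gamma * a for a, b in zip(SPIN0, STU_BARE))
    return ratios(combined)

if __name__ == "__main__":
    eps = Fraction(1, 1000)
    print(float(truncation_error(m_spin0, SPIN0, eps)))
    print(float(truncation_error(m_stu_bare, STU_BARE, eps)))
    print(ratios(SPIN0))
    print(left_kink(2 * math.log(2) / 3))
```

Under these assumptions the spin-0 ratios reproduce eq. (5.7) at $m = 1$, and the subtracted amplitude lands at $\tilde g_4 = \tfrac{1}{2}$ with $\tilde g_3 \approx -10.19$, matching the left kink quoted in section 4.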
One notices that a sliver obtained via numerics is not in the span of these simple analytic models. Since the upper-left arc of this sliver cannot be generated by the convex hull of (possibly rescaled) discrete theories, we attribute it to a continuous one-parameter family of "extremal theories" which terminates at the kink. It would be interesting to find an analytic expression for this family.
One may wonder if this family of extremal theories terminates at a second kink; we did not locate any stable candidate in the numerics. This suggests that the family instead disappears inside its own convex hull. Naturally this would occur at the point where its slope becomes tangent to the parabola of dimensionally-rescaled theories originating from that point, $\frac{d\tilde g_3}{d\tilde g_4} = \frac{\tilde g_3}{2\tilde g_4}$. In this scenario, only the second derivative of the boundary shape would be discontinuous, explaining the difficulty in locating it numerically.
We find it remarkable that two simple analytic models almost span the entire region obtained via numerics: to a good approximation (up to the missing sliver), a scalar EFT is compatible with causality and unitarity if it is a positive linear combination of the models in eq. (5.2).

Concluding remarks
In this work we showed, in the case of a scalar field theory, that the space of low-energy effective field theories is sharply constrained by positivity of the S-matrix. Dimensional analysis teaches us to expect a low-energy coupling of mass dimension $n$ to be suppressed by a factor of $1/M^n$, where $M$ is the mass of new heavy states. Our main finding is simple: in any causal and unitary theory, dimensional analysis scaling is a theorem.
More precisely, for a theory of a single real scalar, we showed that dimensionless ratios of the form $g_k/g_2\, M^{\#}$, where $g_2$ is the coefficient of $\tfrac{1}{2}(\partial\phi)^4$, are bounded above and below by finite constants of order unity. Our method does not assume that physics above the scale $M$ is weakly coupled, only that it is consistent with unitarity and causality. The technical assumption is the convergence of doubly-subtracted dispersion relations in 2 → 2 scattering. We then separately bounded $g_2$ itself in eq. (3.35).
To our (possibly incomplete) knowledge, this is the first time that two-sided bounds have been obtained for interactions that vanish in the forward limit, such as the six-derivative "$stu$" contact interaction $g_3$ (see eqs. (2.3) and (4.2)). The key ingredient was to use "null constraints" (for example eq. (2.23)): integrals over the high-energy spectral density which must vanish by crossing symmetry, and which limit the contribution of higher-spin particles. A systematic procedure to extract optimal bounds was presented. The precise form of the null constraints is affected by low-energy self-interactions (i.e. loops within the low-energy EFT), whose effects would be interesting to investigate. It would also be interesting to assess whether bounds of this type are close to saturation in specific low-energy processes, for example pion scattering. In this case, a generalization to non-identical scalars might be necessary.
In this paper we ignored gravity. A graviton pole would cause the $B_2(t)$ sum rule to diverge in the forward limit, invalidating conclusions from a Taylor series around $t = 0$. However, the $X_2(t)$ null constraints in eq. (A.3) should remain valid for $t < 0$ and their implications are worth investigating. Intuitively, one may anticipate that the graviton pole will somewhat weaken the bounds on scalar scattering [1,33,34] (see also [35]). However, following the general principle that self-consistent spinning S-matrices are harder to come by, interactions involving gravitons are likely to be sharply constrained by similar methods, possibly extending [8,36,37] or addressing conjectures of [38].
For non-gravitational scalar scattering, we found that crossing symmetry does not mix heavy spinning and spinless states. The allowed 2 → 2 low-energy S-matrices are positive sums of those two sectors. These are respectively (almost completely) spanned by the two simple analytic models in eq. (5.2), which realize theories at kinks. Perhaps this is a tantalizing hint that the collection of valid EFTs is not so vast after all.

A A basis of crossing symmetry constraints
In this section we present a complete basis of heavy averages which vanish in any theory with no branch cut below $M^2$ (i.e. when neglecting low-energy loops). They are organized as functions $X_k(t)$, where $k$ is an even integer representing the number of subtractions.
We first give a pedestrian argument for the unsubtracted case, $X_0$, and then make a general argument exploiting crossing symmetry.
As a warm-up, let us first consider a "superbounded" theory where $\lim_{s\to\infty} \mathcal{M}(s,t) = 0$ for $t < 0$, so that unsubtracted dispersion relations converge, and where the low-energy amplitude is polynomial, corresponding to $g = 0$ in (2.3). In the superbounded case the first sum rule is $B_0$ from eq. (2.18):

where we relabelled the constant term in the amplitude as $g_0 = -\lambda$. Viewed as a function of $t$, this sum rule is of limited use because it involves infinitely many unknowns on the left-hand side and one loses control when $-t$ becomes of order $M^2$. However, each $g_k$ with $k \ge 2$ can be computed by some other sum rule: for example, the $t \to 0$ limit of $B_2$ measures $g_2$, $B_4(0)$ measures $g_4$, etc. Dividing by an overall $t$ for future convenience, we can package these into an infinite set of null constraints:

$X_0(t)$ defines a sensible sum rule for any $0 < -t < M^2$, and Taylor-expanding around $t = 0$ gives an infinite number of averages which vanish by crossing symmetry (if unsubtracted sum rules converge).
Repeating the same manipulations for subtracted sum rules, we find that to cancel the generic power of $t$ in $B_2(t)$ one must also use both $B_k(0)$ and its first derivative around $t = 0$; there is then a unique solution: $\langle X_2(t; m^2, J)\rangle = 0$, where

$$X_2(t; m^2, J) = \frac{(2m^2+t)\left[P_J\!\left(1 + \frac{2t}{m^2}\right) - P_J(1)\right]}{m^2 t^2 (m^2+t)^2} - \frac{4 P_J'(1)}{t\, m^2 (m^4 - t^2)}.$$
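The regularity of $X_2$ at $t \to 0$ can be checked with exact rational arithmetic. This is a sketch under stated assumptions: it specializes to $d = 4$, where $P_J$ are Legendre polynomials with $P_J(1) = 1$ and $P_J'(1) = J(J+1)/2$, and it reads the formula for $X_2$ as the difference of the two fractions whose $1/t^2$ and $1/t$ poles cancel. The comparison value $\mathcal{J}^2(\mathcal{J}^2 - 8)/(2m^8)$ with $\mathcal{J}^2 = J(J+1)$ is our own expansion of that formula, not a number quoted above.

```python
from fractions import Fraction

def legendre(J, z):
    """Legendre polynomial P_J(z) in exact arithmetic (d = 4 assumption)."""
    if J == 0:
        return Fraction(1)
    p_prev, p = Fraction(1), Fraction(z)
    for n in range(1, J):
        p_prev, p = p, ((2 * n + 1) * z * p - n * p_prev) / (n + 1)
    return p

def x2(t, m2, J):
    """X_2(t; m^2, J) with both pole subtractions, for Legendre P_J."""
    dP1 = Fraction(J * (J + 1), 2)  # P_J'(1)
    first = (2 * m2 + t) * (legendre(J, 1 + 2 * t / m2) - 1) / (m2 * t ** 2 * (m2 + t) ** 2)
    second = 4 * dP1 / (t * m2 * (m2 ** 2 - t ** 2))
    return first - second

def x2_at_zero(m2, J):
    """Expected t -> 0 limit, J(J+1) (J(J+1) - 8) / (2 m^8); our own expansion."""
    c = J * (J + 1)
    return Fraction(c * (c - 8), 2) / m2 ** 4

if __name__ == "__main__":
    t = Fraction(1, 10 ** 6)
    for J in (0, 2, 4, 6):
        print(J, float(x2(t, 1, J)), float(x2_at_zero(1, J)))
```

The $J = 0$ entry vanishes identically, consistent with the statement that the crossing constraints do not involve spin-0 states, and for $J = 2$ the function reduces to $-6/(m^8(1 - t/m^2))$.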

A.1 Derivation using dispersion relations in three channels
Let us now present the general case along with a direct derivation. The idea is to combine dispersion relations in all three channels. Consider the following identity:

where each contour is a product of circles: $C_s = \mathrm{Circle}_{s\sim 0} \wedge \mathrm{Circle}_{t\to\infty}$, $C_t = \mathrm{Circle}_{t\sim 0, t'} \wedge \mathrm{Circle}_{u\to\infty}$ and $C_u = \mathrm{Circle}_{u\sim 0} \wedge \mathrm{Circle}_{s\to\infty}$. The first contour implements a fixed-$s$ sum rule, which we then evaluate at $s \to 0$, and similarly for the others, but note that the fixed-$t$ relation is evaluated at both $t = 0$ and $t = t'$. All the integrals vanish (for $k \ge 2$) due to the vanishing at large $t$ of $\mathcal{M}(s,t)/t^2$. (The contours around 0 are a bit dangerous, since the integrals around, say, large $t$ converge only when $s < 0$; this should not be a problem since (ignoring EFT loops) one can interpret the residue as extracting the coefficient of $1/s$ in the Laurent expansion as $s \to 0^-$.) The trick is then to deform the arcs at infinity to pick up high-energy cuts and low-energy poles, as in fig. 3 of the main text.

A nice feature of the above contour is that all double residues cancel because of the antisymmetry of the contours. For the double residue at $(s, t) = (0, t')$ the cancellation simply follows from mismatching orientations, since the contours $C_s$ and $C_t$ compute the same double residue in opposite orders:

$$0 = \mathrm{Circle}_{s\sim 0} \wedge \mathrm{Circle}_{t\sim t'} + \mathrm{Circle}_{t\sim t'} \wedge \mathrm{Circle}_{s\sim 0}. \tag{A.6}$$

Alternatively, in practice, one may think of the integral $\frac{1}{2\pi i}\mathrm{Circle}_{s\sim 0}$ as just a residue, defined by plucking $\frac{ds}{s}\wedge$ from the left of the differential form it multiplies, and the integral vanishes because the residue operation thus defined is antisymmetric. The nested poles at $(s, t) = (0, 0)$ require more care (one has to perform a blow-up) since all three contours contribute, but one still finds a perfect cancellation, as one may verify explicitly:

for any polynomial numerator. This is curiously reminiscent of the Jacobi identity.
The upshot is that the integral (A.5) is orthogonal to any tree-level low-energy EFT amplitude (whether or not it is $s$-$t$-$u$ symmetric): it gives purely a constraint on high-energy cuts. Similar constraints would follow for any choice of (rational) denominator. Each term in eq. (A.5) contributes cuts in two channels; for example, the $C_s$ term contributes

$$\mathrm{Res}_{s=0}\left[\left(\frac{1}{m^2(t'-m^2)} + \frac{1}{(m^2+s)(m^2+s+t')}\right)\frac{m^2\, P_J\!\left(1+\frac{2s}{m^2}\right)}{s\left(s\, m^2 (m^2+s)\right)^{k/2}}\right] \tag{A.8}$$

where the two terms in the parenthesis come from the $t$- and $u$-channel heavy cuts, respectively. If we did not assume $s$-$t$-$u$ symmetry, we would get a relation between the three spectral densities. Here we record only the simplified result assuming that all the spectral densities are the same: $0 = \langle X_k(t; m^2, J)\rangle$, where

$$X_k(t; m^2, J) = \frac{2m^2+t}{t(m^2+t)}\, P_J\!\left(1+\frac{2t}{m^2}\right) \cdots$$

Expanding at small $t$, the $X_k(t)$ sum rules admit regular Taylor series, which reproduce precisely the sum rules recorded in eq. (3.29). Namely, the coefficient of $t^n$ in $X_k$ has degree $\frac{3k}{2} + n + 1$ in $1/m^2$, so the first case is $X_2(0) \sim g_4$. The first time one gets two $X$ sum rules is at weight 7, where X 2 (0) and X 4 (0) span g 7 and g 7 . The number of $X$ sum rules per degree increases every 3 degrees because of the $1/(stu)^{k/2}$ factor in eq. (A.5). The number of $X$ sum rules thus agrees precisely with the counting below eq. (3.29). We conclude that the $X_k(t)$ sum rules are a complete basis of sum rules orthogonal to tree-level EFTs! As mentioned below eq. (3.30), when EFT loops are included these sum rules may average to nonzero but computable quantities.

    Coefficient        Lower bound    Upper bound
    g̃_9  (p = 0)       -0.524         1.125
    g̃_9  (p = 1)       -13.60         3
    g̃_10 (p = 0)        0             0.0625
    g̃_10 (p = 1)       -6.32          3.75

Table 3: $k$ refers to the coefficient of $(s^2+t^2+u^2)^{\frac{k-3(2p+\delta_{k,\mathrm{odd}})}{2}}\,(stu)^{2p+\delta_{k,\mathrm{odd}}}$, which has degree $k$ in Mandelstam invariants and contains $2p$ powers of $stu$ more than the minimum at that degree. The upper bounds are all simple rational numbers realized by the $\mathcal{M}_{\text{spin-0}}$ model. The values (except for $\tilde g_3$) were calculated at order $n = 10$, which corresponds to $\dim N = 12$ null constraints.
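The counting of $X_k$ sum rules described in appendix A.1 can be reproduced with a few lines of Python. The sketch assumes, as the text asserts, that each Taylor coefficient $t^n$ of each $X_k$ (with $k$ even, $k \ge 2$, $n \ge 0$) is one independent null constraint of degree $\frac{3k}{2} + n + 1$; the function names are illustrative.

```python
def null_constraints_at_degree(degree):
    """Number of pairs (k, n), k even >= 2 and n >= 0, with 3k/2 + n + 1 == degree."""
    count = 0
    k = 2
    while 3 * k // 2 + 1 <= degree:
        count += 1  # for this k, n = degree - 3k/2 - 1 >= 0 is fixed uniquely
        k += 2
    return count

def dim_null_space(max_degree):
    """Total number of null constraints with degree up to max_degree."""
    return sum(null_constraints_at_degree(D) for D in range(max_degree + 1))

if __name__ == "__main__":
    for D in range(3, 15):
        print(D, null_constraints_at_degree(D), dim_null_space(D))
```

With this rule there are no constraints at degree 3, a second one first appears at degree 7, and the total reaches 12 at degree 10, in agreement with $\dim N = 12$ at $n = 10$.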

B Bounds on operators up to order $s^{10}$
In table 3 we record numerical bounds on various EFT coefficients in four spacetime dimensions.
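The rational upper bounds attributed to the spin-0 model can be generated in closed form. The sketch below is hedged: it assumes $\mathcal{M}_{\text{spin-0}} = \sum_{x\in\{s,t,u\}} 1/(m^2 - x)$ (a form not written out above), so that its Taylor coefficients are the power sums $s^k + t^k + u^k$, and it uses the generating function $\sum_k p_k y^k = (A y^2 + 3B y^3)/(1 - \tfrac{A}{2} y^2 - B y^3)$ with $A = s^2+t^2+u^2$, $B = stu$ and $s+t+u = 0$. Function names are illustrative.

```python
from fractions import Fraction
from math import comb

def spin0_coefficient(a, b):
    """Coefficient of A^a B^b in s^k + t^k + u^k, with k = 2a + 3b and s + t + u = 0."""
    total = Fraction(0)
    if a >= 1:  # term A y^2 times the A^(a-1) B^b part of the geometric series
        total += comb(a - 1 + b, b) * Fraction(1, 2) ** (a - 1)
    if b >= 1:  # term 3 B y^3 times the A^a B^(b-1) part of the geometric series
        total += 3 * comb(a + b - 1, a) * Fraction(1, 2) ** a
    return total

def power_sum_from_invariants(k, s, t):
    """Rebuild s^k + t^k + u^k from A and B using the coefficients above (k >= 1)."""
    u = -s - t
    A, B = s * s + t * t + u * u, s * t * u
    total = Fraction(0)
    for a in range(k // 2 + 1):
        if (k - 2 * a) % 3 == 0:
            b = (k - 2 * a) // 3
            total += spin0_coefficient(a, b) * A ** a * B ** b
    return total

if __name__ == "__main__":
    # (a, b) labels the monomial A^a B^b; ratios to g_2 in units m = M = 1.
    for a, b in [(0, 1), (2, 0), (1, 1), (3, 1), (0, 3), (5, 0), (2, 2)]:
        print(2 * a + 3 * b, (a, b), spin0_coefficient(a, b))
```

Under this assumption the coefficients $9/8$, $3$, $1/16$ and $15/4$ reproduce the upper bounds $1.125$, $3$, $0.0625$ and $3.75$ recorded at degrees 9 and 10, alongside the values $3$, $\tfrac{1}{2}$ and $\tfrac{5}{2}$ at degrees 3, 4 and 5.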