Shocks, Superconvergence, and a Stringy Equivalence Principle

We study propagation of a probe particle through a series of closely situated gravitational shocks. We argue that in any UV-complete theory of gravity the result does not depend on the shock ordering; in other words, coincident gravitational shocks commute. Shock commutativity leads to nontrivial constraints on low-energy effective theories. In particular, it excludes non-minimal gravitational couplings unless extra degrees of freedom are judiciously added. In flat space, these constraints are encoded in the vanishing of a certain "superconvergence sum rule." In AdS, shock commutativity becomes the statement that average null energy (ANEC) operators commute in the dual CFT. We prove commutativity of ANEC operators in any unitary CFT and establish sufficient conditions for commutativity of more general light-ray operators. Superconvergence sum rules on CFT data can be obtained by inserting complete sets of states between light-ray operators. In a planar 4d CFT, these sum rules express (a-c)/c in terms of the OPE data of single-trace operators.


Introduction
In General Relativity (GR), particles follow geodesics regardless of their polarizations or internal composition. This is sometimes called the "strong equivalence principle" [1]. However, in the presence of non-minimal (higher-derivative) couplings, this principle no longer holds: the path of a particle can depend on its polarization and is not given by a geodesic. Such modifications of GR are known to be in tension with causality and unitarity.¹ A simple example is the propagation of a probe particle through a gravitational shock (the gravitational field of a highly boosted particle). In GR, propagation through a shock leads to a velocity kick and a Shapiro time delay. By contrast, in theories with non-minimal gravitational couplings, there can be gravitational birefringence: the effect of the shock can depend on the polarization of the probe particle. Moreover, for certain polarizations, the probe particle can experience a time advance [3]. By arranging many shocks one after the other, one can accumulate the time advances and produce macroscopic violations of asymptotic causality. Restoring causality requires an infinite set of massive higher-spin particles. It was argued in [3] that the masses of these higher-spin particles must be related to the scale that enters the modified gravitational coupling²

$\alpha_{GB} \lesssim \frac{1}{m_{\mathrm{gap}}^2}$. (1.1)

Indeed, this is what happens in string theory, where the higher-spin particles are string excitations. Similar bounds on three-point couplings were derived in [3][4][5][6][7][8]. A common feature of these arguments is the lack of a sharp equality relating the non-minimal couplings to the extra degrees of freedom required for causality. In this work, we provide such an equality between $\alpha_{GB}$ and contributions of massive states in a general gravitational theory. We note that non-minimal gravitational couplings introduce another feature that is absent in GR, namely non-commutativity of coincident gravitational shocks.³
This is another violation of the strong equivalence principle. Indeed, as we review below, geodesics are insensitive to the ordering of gravitational shocks. On the other hand, for theories with non-minimal gravitational couplings, the effect of propagation through multiple shocks depends on the ordering of the shocks. What is less trivial is that this effect can be traced to pathological behavior of the scattering amplitude in the Regge limit. We find that the converse is also true: in any UV-complete gravitational theory, soft Regge behavior guarantees that coincident gravitational shocks must commute. This can be readily checked in tree-level string theory.
Therefore, we suggest that a weaker "stringy" equivalence principle does hold in general UV-complete gravitational theories: coincident gravitational shocks commute. In contrast to the causality discussion above, commutativity of coincident shocks leads to quantitative sum rules that equate the size of non-minimal couplings to the extra degrees of freedom that are present in the theory.
In section 2, we explain how commutativity of coincident shocks is equivalent to boundedness of amplitudes A(s, t) in the Regge limit (t → ∞ with fixed s). Specifically, shocks with spins $J_1$ and $J_2$ commute if and only if the Regge intercept of the theory $J_0$ satisfies⁴

$J_1 + J_2 > J_0 + 1$. (1.2)

For example, gravitational (spin-2) shocks commute if $J_0 < 3$. It has been argued that consistent weakly-coupled gravity theories in flat space actually obey $J_0 \le 2$ [3], so gravitational shocks certainly commute in this case (as do higher-spin shocks). In section 2.2, we show that commutativity of coincident shocks is equivalent to certain dispersion relations called "superconvergence" sum rules [9,10]. When applied to gravitational amplitudes, these sum rules express (squares of) non-minimal couplings in terms of three-point couplings of massive states. This shows that non-minimal couplings cannot exist without additional massive states, recovering a result from [3] in a different way. In subsequent sections, we study superconvergence sum rules in several examples, showing explicitly how they are obeyed in GR (section 2.3), disobeyed in higher-derivative gravity theories (section 2.4), but obeyed in string theories (sections 2.5 and 2.6). Indeed, the failure of superconvergence sum rules in higher-derivative theories like Gauss-Bonnet gravity gives an efficient way to show that they violate the Regge boundedness condition $J_0 < 3$ without computing full amplitudes in those theories.
In AdS, commutativity of coincident shocks translates into a statement that can be proven nonperturbatively using CFT techniques. As we review in section 3, we can create shocks by integrating local operators along null lines on the boundary of AdS. It is most natural to study propagation through AdS shocks using observables called "event shapes" [11][12][13][14][15], which we review in sections 3.2 and 3.3. Commutativity of coincident shocks becomes the statement that two null-integrated operators commute when placed on the same null plane:

$\left[\int_{-\infty}^{\infty} dv_1\, \mathcal{O}_{1;v\cdots v}(u=0, v_1, \vec y_1),\ \int_{-\infty}^{\infty} dv_2\, \mathcal{O}_{2;v\cdots v}(u=0, v_2, \vec y_2)\right] = 0$. (1.3)

Here, we use lightcone coordinates $ds^2 = -du\,dv + d\vec y^2$. The CFT operators lie on the same plane u = 0 but at different transverse positions $\vec y_1, \vec y_2 \in \mathbb{R}^{d-2}$. Furthermore, their vector indices are aligned with the direction of integration (the v-direction). For example, when $\mathcal{O}_1$ and $\mathcal{O}_2$ are both the stress tensor $T_{\mu\nu}$, (1.3) becomes a commutator of average null energy operators. An average null energy operator on the boundary creates a gravitational shock in the bulk. The commutativity statement (1.3) might seem obvious, since $\mathcal{O}_1$ and $\mathcal{O}_2$ are spacelike-separated everywhere along their integration contours. However, the spacelike separation argument is too quick, and is actually wrong in some examples (see appendix B). The problem is that the positions of $\mathcal{O}_1$ and $\mathcal{O}_2$ coincide at the endpoints of their integration contours (in an appropriate conformal frame), and one must be careful to analyze what happens there.
We perform a careful analysis of the commutator (1.3) in section 4, explaining the circumstances in which it is well-defined (but not necessarily zero), and the additional conditions required for it to vanish. A sufficient condition for vanishing is

$J_1 + J_2 > J_0 + 1$, (1.4)

where $J_1$ and $J_2$ are the spins of $\mathcal{O}_1$ and $\mathcal{O}_2$, and $J_0$ is the Regge intercept of the CFT [16][17][18][19][20]. In upcoming work [21], we also show that a non-vanishing commutator necessarily leads to a Regge pole at $J = J_1 + J_2 - 1$.
In section 4.2.4, we prove that $J_0 \le 1$ in nonperturbative CFTs (generalizing arguments of [19,22] to spinning correlators). This establishes commutativity of average null energy operators in nonperturbative theories,⁵

$\left[\int_{-\infty}^{\infty} dv_1\, T_{vv}(u=0, v_1, \vec y_1),\ \int_{-\infty}^{\infty} dv_2\, T_{vv}(u=0, v_2, \vec y_2)\right] = 0$ for $\vec y_1 \neq \vec y_2$. (1.5)

For large-N theories in the planar limit, the bound on chaos [26] implies that $J_0 \le 2$. Thus, average null energy operators commute in planar theories as well. However, commutativity can be lost at higher orders in large-N perturbation theory (and only recovered nonperturbatively). The condition (1.4) is in direct analogy to the condition (1.2) in flat space. When it holds, one can derive analogous superconvergence sum rules for CFTs by evaluating event shapes of (1.3). In section 5, we show how to compute these event shapes using the conformal block decomposition, expressing them as sums over intermediate CFT states.⁶ The relevant conformal blocks can be computed explicitly in any spacetime dimension. The blocks for stress tensors agree perfectly with our bulk calculations from section 3. The result is an infinite set of superconvergence sum rules for CFT data.

⁵ Commutativity of average null energy (ANEC) operators is important for understanding information-theoretic aspects of CFTs [23,24], and plays a central role in the recently proposed BMS symmetry in CFT [25]. As far as we are aware, it is usually argued for using the fact that the stress tensors are spacelike separated. Our analysis closes a loophole in this argument.

⁶ The conformal blocks we study in this work are for the "lightray-local → lightray-local" channel. This is the conventional OPE. By contrast, in [21] we develop a new type of OPE that allows one to describe the "lightray-lightray → local-local" channel.
Of course, the usual crossing symmetry equations [27,28] are also an infinite set of sum rules for CFT data. However, CFT superconvergence sum rules have some nice properties. In large-N theories in the planar limit, they get contributions only from single-trace operators and non-minimal three-point structures (i.e. three-point structures that do not arise from GR in AdS). Thus, one obtains expressions for non-minimal three-point coefficients in terms of massive "stringy" states, analogous to superconvergence sum rules in flat space.
As an example, in 4d CFTs, we find the superconvergence sum rules

[Equation (1.6): two sum rules, each equating a stress-tensor contribution built from $t_2$ and $t_4$ to a sum over scalar operators $\phi$ plus "non-scalar" terms]

along with an infinite number of others. Here, $t_2$ and $t_4$ are coefficients of non-Einstein three-point structures in the correlator $\langle TTT\rangle$; see [14]. For example, in 4d $\mathcal{N}=1$ theories, we have $t_2 = \frac{6(c-a)}{c}$ and $t_4 = 0$. The sums in (1.6) run over scalar operators $\phi$ with dimensions $\Delta_\phi$ and OPE coefficients $\lambda_{TT\phi}$ in the $T \times T$ OPE. The term "non-scalar" refers to contributions of operators with spin $J \ge 2$, not including the stress tensor (whose contribution is on the left-hand side of (1.6)). For simplicity, we have written only the sum rules that get contributions from scalar operators. Other sum rules give expressions for other combinations of $t_2$ and $t_4$, but involve exclusively non-scalars. The factors $\Gamma\left(\frac{4-\Delta_\phi}{2}\right)^{-2}$ in (1.6) ensure that contributions of double-trace operators are suppressed by $O(1/N^4)$ in the large-N limit. This is a generic feature of superconvergence sum rules, and it stems from the fact that they can be written in terms of a double discontinuity [19]. In particular, one can see explicitly that if no single-trace operators are present other than the stress tensor, then $t_2$ and $t_4$ must vanish in the planar limit.⁷,⁸

In [30], it was conjectured that for any CFT with a large gap $\Delta_{\mathrm{gap}}$ in the spectrum of spin $J \ge 3$ single-trace operators ("stringy states"), non-minimal couplings in the effective bulk Lagrangian should be suppressed by powers of $1/\Delta_{\mathrm{gap}}$. In section 6.1, we argue (non-rigorously) that the contributions of stringy states to superconvergence sum rules are suppressed by powers of $1/\Delta_{\mathrm{gap}}$, and this establishes the conjecture of [30] (in the case of three-point couplings) in a way different from the arguments of [3][4][5][6][7][8]. We conclude in section 6.2.
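To see concretely how the factor $\Gamma\left(\frac{4-\Delta_\phi}{2}\right)^{-2}$ suppresses double-trace operators, the following minimal Python sketch evaluates it numerically. It assumes, as is standard at large N, that scalar double-trace operators $[TT]_n$ have dimensions $\Delta = 8 + 2n + \gamma$ with anomalous dimension $\gamma = O(1/N^2)$; the function name `gamma_factor` and the sample dimensions are illustrative choices, not taken from the text.

```python
import math

def gamma_factor(delta):
    """Return Gamma((4 - delta)/2)**(-2), the weight of a scalar of dimension delta."""
    x = (4.0 - delta) / 2.0
    # At x = 0, -1, -2, ... Gamma has poles, so the weight vanishes exactly.
    if x <= 0 and x == math.floor(x):
        return 0.0
    return 1.0 / math.gamma(x) ** 2

# Exact double-trace dimensions (gamma = 0) drop out completely.
for n in range(3):
    print(8 + 2 * n, gamma_factor(8 + 2 * n))

# With a small anomalous dimension the weight is O(gamma^2), i.e. O(1/N^4) if gamma ~ 1/N^2.
for anom in (1e-2, 1e-3, 1e-4):
    print(anom, gamma_factor(8 + anom) / anom ** 2)   # ratio tends to 1 as anom -> 0

# A generic (single-trace) dimension gets an O(1) weight, e.g. Delta = 5 gives 1/(4*pi).
print(gamma_factor(5.0), 1 / (4 * math.pi))
```

Near a pole, $\Gamma(-2 - \gamma/2) \approx -1/\gamma$, which is why the printed ratio approaches 1 for the lowest double trace.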
In appendix A, we give more details about superconvergence sum rules in flat space. In appendix B, we give an example of the phenomenon of "detector cross-talk," where naïvely spacelike light-ray operators can fail to commute. In the remaining appendices we provide details about sum rules in CFT.

⁷ Strictly speaking, the above equations only fix $t_4 + 2t_2$, but there are also linearly independent constraints from other components of the sum rule.

⁸ Our methods for computing event shapes may also be useful for investigating positivity conditions. In [29] it was shown that positivity of multi-point energy correlators also leads to vanishing non-minimal couplings, e.g. $t_2 = t_4 = 0$ for $\langle TTT\rangle$, in theories with only gravitons and photons in the bulk.
Note added: After this work had been largely completed, we became aware of [31] which has some overlap with this paper.

Shocks and superconvergence in flat space
In General Relativity (GR), test bodies follow geodesics, and it follows that the effects of coincident shocks are commutative. To see this, consider a shockwave in flat space [32,33]

$ds^2 = -du\,dv + \frac{4\Gamma\left(\frac{D-4}{2}\right)}{\pi^{\frac{D-4}{2}}}\frac{G p^v}{|\vec y|^{D-4}}\,\delta(u)\,du^2 + d\vec y^2$, (2.1)

and let us study null geodesics in this geometry.⁹ A shockwave is a gravitational field created by a relativistic source. The Aichelburg-Sexl geometry (2.1) is an exact solution of Einstein's equations with a stress-energy source $T_{uu}(u, \vec y) = p^v \delta(u)\delta^{(D-2)}(\vec y)$ localized on a null geodesic. The only non-trivial metric component, $h_{uu} = \frac{4\Gamma\left(\frac{D-4}{2}\right)}{\pi^{\frac{D-4}{2}}}\frac{G p^v}{|\vec y|^{D-4}}\delta(u)$, is a solution of the Poisson equation in the transverse plane parametrized by $\vec y$, with source

$\partial_{\vec y}^2 h_{uu}(u, \vec y) = -16\pi G\, T_{uu}(u, \vec y)$. (2.2)

Famously, (2.1) continues to be an exact solution in any higher-derivative theory of gravity as well [35]. Higher-derivative interactions, however, lead to nontrivial corrections to the propagation of probe particles on shockwave backgrounds. Consider a probe particle on the shockwave background that follows a null geodesic, which we parameterize by u. Suppose the geodesic approaches the shock at impact parameter $\vec y(u=0) = \vec b$. Crossing the shock causes both a Shapiro time delay and a velocity kick in the transverse plane due to gravitational attraction:

$\Delta v = \frac{4\Gamma\left(\frac{D-4}{2}\right)}{\pi^{\frac{D-4}{2}}}\frac{G p^v}{|\vec b|^{D-4}}, \qquad \Delta\dot{\vec y} = \frac{1}{2}\partial_{\vec b}\,\Delta v = -\frac{2(D-4)\Gamma\left(\frac{D-4}{2}\right)}{\pi^{\frac{D-4}{2}}}\frac{G p^v\,\vec b}{|\vec b|^{D-2}}$,

where the dot denotes $d/du$. The same result can be obtained by analyzing wave equations on the shockwave background or using scattering amplitudes (as we review in detail below). In General Relativity, the effect of a shockwave on a probe particle does not depend on the polarization of the particle. This is no longer true in higher-derivative theories of gravity, where it can lead to Shapiro time advances and violations of asymptotic causality [3].

We can also consider a more complicated geometry constructed by a superposition of relativistic sources localized at different retarded times $u_i$ and transverse positions $\vec b_i$. The exact gravitational field created by such a superposition is simply a sum of the shockwaves (2.1). If a probe particle follows a geodesic, then propagation through a series of closely situated shocks leads to an additive effect. In particular, the result does not depend on the ordering of shocks and is commutative (figure 1). This is no longer true in theories with higher-derivative corrections. In this case, the result of propagation through a pair of closely situated shocks will generically depend on their ordering.

[Figure 1: The probe geodesic is denoted by a black line and shock waves by red lines. Dashed lines mark the time delay associated with each shock. In General Relativity, propagation through a pair of closely situated shockwaves is commutative, namely the overall effect does not depend on the order of the shocks. This is no longer true in theories with higher-derivative corrections. We argue that commutativity must hold in any UV-complete theory.]
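The additivity argument can be checked with a minimal Python sketch of a null geodesic crossing two Aichelburg-Sexl shocks, parameterized by u. It assumes the normalization $h_{uu} = \frac{4\Gamma((D-4)/2)}{\pi^{(D-4)/2}}\frac{G p^v}{|\vec y - \vec b|^{D-4}}\delta(u)$, whose coefficient it first checks against the Gauss law implied by (2.2), and the jump conditions $\Delta v = h$, $\Delta\dot{\vec y} = \frac{1}{2}\partial_{\vec y} h$ (with h the coefficient of $\delta(u)\,du^2$) that follow from the geodesic equations with u as affine parameter. The dimension D = 5, the value of $G p^v$, the shock positions, and all function names are illustrative assumptions. Crossing both shocks at the same u gives the same final state in either order, while separating them by $\Delta u$ produces an order dependence that vanishes linearly as $\Delta u \to 0$.

```python
import math  # requires Python 3.8+ for math.dist and multi-argument math.hypot

D = 5          # spacetime dimension (illustrative; the profile needs D > 4)
G_PV = 0.05    # G * p^v of each shock source (hypothetical value)

def c_D(dim):
    """Assumed Aichelburg-Sexl normalization 4 Gamma((dim-4)/2) / pi^((dim-4)/2)."""
    return 4.0 * math.gamma((dim - 4) / 2) / math.pi ** ((dim - 4) / 2)

def gauss_ratio(dim):
    """Flux of grad(h) through a transverse sphere over -16 pi G p^v; equals 1 if (2.2) holds."""
    n = dim - 2                                          # number of transverse directions
    area = 2 * math.pi ** (n / 2) / math.gamma(n / 2)    # area of the unit sphere S^(n-1)
    return (dim - 4) * c_D(dim) * area / (16 * math.pi)

def h(y, b):
    """Coefficient of delta(u) du^2 for a shock centered at transverse position b."""
    return c_D(D) * G_PV / math.dist(y, b) ** (D - 4)

def grad_h(y, b):
    d = [yi - bi for yi, bi in zip(y, b)]
    r = math.hypot(*d)
    return [-(D - 4) * c_D(D) * G_PV * di / r ** (D - 2) for di in d]

def cross(state, b):
    """Jump across a shock at fixed y: Shapiro delay in v and a transverse kick."""
    v, y, ydot = state
    kick = grad_h(y, b)
    return v + h(y, b), y, [w + 0.5 * k for w, k in zip(ydot, kick)]

def drift(state, du):
    """Free null propagation in flat space: dy/du = ydot and dv/du = |ydot|^2."""
    v, y, ydot = state
    return (v + du * sum(w * w for w in ydot),
            [yi + du * w for yi, w in zip(y, ydot)], ydot)

def through(state, first, second, du):
    """Cross the shock at `first` at u = 0, then the shock at `second` at u = du."""
    return cross(drift(cross(state, first), du), second)

def ordering_gap(state, b_a, b_b, du):
    """Largest difference between final states for the two shock orderings."""
    v1, y1, w1 = through(state, b_a, b_b, du)
    v2, y2, w2 = through(state, b_b, b_a, du)
    return max([abs(v1 - v2)] + [abs(p - q) for p, q in zip(y1 + w1, y2 + w2)])

start = (0.0, [0.3, 0.2, 0.0], [0.0, 0.0, 0.0])   # (v, y, dy/du) before the shocks
shock_a, shock_b = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)

print([round(gauss_ratio(dim), 12) for dim in (5, 6, 10, 26)])
for du in (1e-1, 1e-2, 1e-3, 0.0):
    print(du, ordering_gap(start, shock_a, shock_b, du))   # 0.0 for coincident shocks
```

At $\Delta u = 0$ both kicks are evaluated at the same transverse position and simply add, which is the geodesic version of shock commutativity.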
In this section we show that commutativity of shockwaves is directly related to the Regge limit. In particular, we argue that in any UV complete theory (gravitational or not) the shock waves must commute. Therefore, any non-commutativity of shocks present in the low energy effective theory should be exactly canceled by the extra degrees of freedom. The mathematical expression of this fact is encapsulated in the superconvergence sum rules which we describe in detail below.

Shockwave amplitudes
It will be instructive for our purposes to restate the discussion of the previous section in terms of scattering amplitudes. This has been done in [3], whose setup we review momentarily. In the simplest case of propagation through a single shock, we consider the absorption of a virtual graviton by a probe particle,

$g^* + X \to X'$, (2.5)

where X and X′ describe a particle in the initial and final state (these could be different) and $g^*$ stands for a virtual graviton that is emitted from some extra source that we do not write down explicitly.¹⁰ We denote the corresponding scattering amplitude $A_{g^*X\to X'}$.

⁹ Obtaining such solutions from smooth Cauchy data (or limits thereof) in gravitational theories can be subtle and was discussed, for example, in [34]. For us, the solution (2.1) is a convenient way to think about the high-energy limit of gravitational scattering and does not per se play any role.
To discuss causality, we consider the high-energy behavior of the scattering amplitude in the forward direction; see [3]. A convenient choice of momenta and polarization for the process (2.5) is [...]. Here, we use lightcone coordinates $(u, v, \vec y)$ with metric $ds^2 = -du\,dv + d\vec y^2$. The virtuality of the graviton is $q^2$. An interesting phenomenon occurs when studying this process in impact parameter space. In this case, the virtual graviton emitted by a source comes with the following wavefunction: [...]. The remarkable property of this integral is that it can be evaluated by taking the residue at $q^2 = 0$, so that the virtual graviton becomes on-shell! The on-shell condition $q^2 = 0$ requires that $q$ becomes complex. The result is that the physical phase shift $\delta(p^v, \vec b)$ is computed by a scattering amplitude in spacetime with mixed signature (making $q$ complex corresponds to a second Wick rotation). More precisely, we get (2.9), where $A_{gX\to X'}$ is now a usual on-shell amplitude, albeit evaluated in slightly unusual kinematics. Note that the on-shell condition $q^2 = 0$ is reflected in (2.9) by the fact that $1/|\vec b|^{D-4}$ is a harmonic function (for $\vec b \neq 0$), so it is annihilated by $(-i\partial_{\vec b})^2$. The causality discussion of [3] then focuses on the properties of the on-shell amplitude $A_{g^*X\to X'}$ and shows that in gravitational theories with higher-derivative corrections it can lead to causality violations unless new degrees of freedom are added.
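The statement that $1/|\vec b|^{D-4}$ is harmonic away from the origin, so that $(-i\partial_{\vec b})^2$ annihilates it, can be spot-checked numerically. The minimal Python sketch below uses a central finite-difference Laplacian in the $D-2$ transverse dimensions; the choice D = 6, the sample point, and the step size are arbitrary assumptions. For contrast it also evaluates $1/|\vec b|$, which is not harmonic in four transverse dimensions and has Laplacian $-|\vec b|^{-3}$.

```python
def fd_laplacian(f, b, eps=1e-3):
    """Central finite-difference Laplacian of f at the point b."""
    total = 0.0
    for i in range(len(b)):
        up, down = list(b), list(b)
        up[i] += eps
        down[i] -= eps
        total += (f(up) - 2.0 * f(b) + f(down)) / eps ** 2
    return total

D = 6

def profile(b):
    """The shock profile 1/|b|^(D-4)."""
    return sum(x * x for x in b) ** (-(D - 4) / 2)

def not_harmonic(b):
    """1/|b|, for contrast."""
    return sum(x * x for x in b) ** -0.5

b0 = [0.7, -0.4, 0.3, 0.5]       # a point with D - 2 = 4 transverse components
r2 = sum(x * x for x in b0)
print(fd_laplacian(profile, b0))                      # ~0 (finite-difference error only)
print(fd_laplacian(not_harmonic, b0), -r2 ** -1.5)   # both approximately -|b|^-3
```

The general rule behind the contrast is $\partial_{\vec b}^2 |\vec b|^{-p} = p(p+2-n)|\vec b|^{-p-2}$ in n dimensions, which vanishes exactly for $p = n - 2 = D - 4$.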
We would like to emphasize that the whole discussion can be formulated in terms of on-shell amplitudes with shockwave gravitons $g^*$ and their properties. This is particularly useful in a gravitational theory, where there is no clear definition of off-shell observables. Therefore, it is desirable to formulate the problem purely in terms of on-shell observables. This is the path we follow below.

[Figure 2: Elastic scattering of a probe particle X and an on-shell shock-state graviton $g^*$, with external legs $X(p_3)$, $g^*(p_2)$, $g^*(p_1)$, $X(p_4)$. We will argue that the Regge behavior of this amplitude in consistent theories of gravity is such that gravitational shockwaves always commute. We adopt a CFT correlator-like prescription where time in the diagram runs from right to left.]
In this paper we will be interested in elastic scattering of a probe X and two shockwave gravitons $g^*$,

$g^*(p_2)\,X(p_3) \to g^*(p_1)\,X(p_4)$, (2.10)

see figure 2. We choose the final state of the probe to be X for simplicity, but the whole discussion goes through intact for any other one-particle state X′. A convenient choice of momenta is [...] (2.11), and again the shockwave gravitons have polarizations $\epsilon_{g^*} = (0, -2, \vec 0)$ as in (2.1). As above, we can think of each shockwave graviton as originating from a pair of particles that we have not written explicitly. In this way, $A_{g^*X\to g^*X}$ can be thought of as an economical description of the relevant part of the six-point amplitude. The on-shell condition is $q_1^2 = q_2^2 = 0$. As before, this naturally arises in the impact parameter transform [...]. Below, we simply study the on-shell amplitude $A_{g^*X\to g^*X}$ as an object on its own, without referring to impact parameter space.¹¹ One new feature of (2.11) compared to (2.6) is that the shockwave graviton transfers a large longitudinal momentum $p^v$ to the probe.
In [3], the shockwaves were carefully separated in the u direction such that $p^v$ is effectively set to 0, and the amplitude $A_{g^*X\to g^*X}$ reduces to a product of one-shock interactions $A_{g^*X\to X'}A_{g^*X'\to X}$. Our regime of interest is the opposite: we would like to put the shockwaves on top of each other. This effectively leads to studying $A_{g^*X\to g^*X}$ at arbitrarily large values of $p^v$.

¹¹ In a gravitational theory, when trying to separate a probe from the rest of the system, we should check that the joint system does not form a black hole. In particular, we would like the impact parameter b in the discussion above to be larger than the Schwarzschild radius of the system, $r_S^{D-3} = \sqrt{p^u p^v}\,G_N$. In momentum space this implies that [...]. This implies that for given $q_1, q_2$ we can only consider [...] the picture of a probe propagating through a shockwave is not the correct description of physics. This does not present a problem in a tree-level gravitational theory, where we work to leading order in $G_N$ and can thus make it arbitrarily small (or, similarly, if we consider gravitational deep inelastic scattering in a gapped QFT; see section 2.8). At finite $G_N$, we can still formally define the amplitude $A_{g^*X\to g^*X}$ in kinematics (2.11) with complex null transverse momenta, but its physical interpretation at arbitrarily high energies is less clear. We ignore this subtlety below and only consider tree-level examples in flat space, though we believe everything we say holds at finite $G_N$ as well. This problem does not appear in our CFT discussion, where $g^*$ corresponds to a light-ray operator insertion in a boundary CFT and thus has a clear definition in a finite-N CFT.

Shock commutativity and the Regge limit
An important class of shockwave amplitudes arises when the shockwaves are localized on null planes. Such objects provide a natural translation into the language of on-shell amplitudes of propagation through classical shock backgrounds like (2.1). For simplicity, we consider 2 → 2 scattering amplitudes of massless scalars in this section, where particles 1 and 2 play the role of shocks. We generalize to the case of gravitational (or spinning) shocks in the next section.
A shock wavefunction localized at $u = u_0$ is given by [...] (2.13), where $\vec q \in \mathbb{C}^{D-2}$ is null. In order for $p^\mu$ to be on-shell, $\vec q$ must be complex. Note also that $p^v$ is integrated over both positive and negative values. As explained in the previous section, such wavefunctions do not represent physical incoming or outgoing particles, but rather can be thought of as arising from poles of higher-point amplitudes.
Let shocks 1 and 2 be localized at $u_1$ and $u_2$. We take particles 3 and 4 to be momentum eigenstates. Overall, the momenta in "all-incoming" conventions are given by (2.11), where we must integrate over $p^v$ to create the delta-function-localized shocks. Let us define the Mandelstam variables $s = -(p_1 + p_2)^2 = -(\vec q_1 + \vec q_2)^2$ and $t = -(p_2 + p_3)^2 = p^u p^v$.¹² Denote the scattering amplitude for particles 1, 2, 3, and 4 in momentum space by $iA(s, t)$. Plugging in the shock wavefunctions (2.13) for particles 1 and 2, and applying momentum conservation $p_1^v = -p_2^v = -p^v$, we find that the amplitude for particles 3 and 4 scattering with shocks 1 and 2 is [...] (2.14)

¹² This labeling is chosen for consistency with our CFT conventions in section 5. Note that the roles of t and s are swapped relative to usual discussions of high-energy scattering.
[Figure 3: Integration contour for computing the shock amplitude. The integral is along the real axis, rotated by a small positive angle. The dashed lines represent t- and u-channel cuts. We assume that s < 0.]
The amplitude A(s, t) may have singularities on the real t-axis. As usual, the correct prescription is to approach these singularities from above for positive t and below for negative t. The corresponding integration contour is shown in figure 3.
When the shocks are separated in the u direction, the amplitude should factorize into a product of S-matrix elements describing successive interactions with each shock. This comes about as follows. Note that the $\Delta u$-dependent exponential factor in (2.14) causes the integrand to be exponentially damped in t in either the upper or lower half-plane, depending on the sign of $\Delta u$. For example, suppose $\Delta u > 0$. The integrand is damped for Im t < 0, so we can wrap the t-contour around the cut on the positive t-axis, giving (2.16), an integral of the discontinuity of A(s, t) across that cut (figure 4a). The formula (2.16) is true as long as A(s, t) grows sub-exponentially in t. The discontinuity factors into products of on-shell amplitudes in the t-channel, [...]. Here, we use all-incoming notation, so −X indicates the state X with momentum and helicities flipped. Plugging this into (2.16), we get an expression for the shock amplitude as a sum over intermediate states X. The physical interpretation is that particle 3 propagates through shock 2, creating an intermediate state X. The intermediate state X propagates through shock 1 to become particle 4.

[Figure 4: Depending on whether the integrand decays exponentially in the upper or lower half-plane, we can deform the contour in different ways. (a) When $\Delta u > 0$, we can wrap the contour around the positive t-axis. (b) When $\Delta u < 0$, we can wrap the contour around the negative t-axis. In both cases, the old contour is shown in gray and the new contour in black. The direction of deforming the contour is indicated with a dotted arrow.]

Next, suppose $\Delta u < 0$, so that shock 1 occurs before shock 2. In this case, we can fold the t-contour to wrap around the u-channel cut (figure 4b). The discontinuity now factorizes into a product of on-shell amplitudes in the u-channel, (2.20). Let us take a limit where the shocks become coincident, $\Delta u \to 0^\pm$. We find [...]. For these quantities to be well-defined, the discontinuities should die faster than $|t|^{-1}$ along the real t-axis. Let us assume this is the case. The commutator of coincident shocks is then given by (2.22) and (2.23). It is not immediately obvious what the sum (2.23) should be. However, if A(s, t) decays faster than $t^{-1}$ in the limit of large complex t with fixed s (the Regge limit), then we can close the integration contour in (2.22) by including arcs at infinity and shrink it to zero. Thus, coincident shocks commute if and only if the amplitude is sufficiently soft in the Regge limit. When the amplitude decays faster than $t^{-1}$ for fixed s, we obtain the condition (2.24), which is an example of a "superconvergence" sum rule [9,10]. More generally, if the amplitude dies faster than $t^{-N-1}$ in the Regge limit, we can integrate its discontinuity against $t^n$ for $0 \le n \le N$ to obtain additional superconvergence sum rules. Note that we obtain a different sum rule for each s. Superconvergence sum rules have been used, for example, to bootstrap the Veneziano amplitude [36].
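The logic of closing the contour can be illustrated with a toy amplitude. The minimal Python sketch below assumes a hypothetical tree-level amplitude at fixed s with only simple poles, one on the positive t-axis (at t = 1) and one on the negative t-axis (at t = −2), standing in for the two cuts that correspond to the two shock orderings. It computes $\frac{1}{2\pi i}\oint A\,dt$ on a large circle with the trapezoid rule, which equals the sum of all residues. For an amplitude decaying like $t^{-2}$ the residues cancel, a toy analog of the superconvergence sum rule (2.24); for one decaying only like $t^{-1}$ they fail to cancel and instead sum to the coefficient of $1/t$ at large t. Signs and normalizations relative to (2.22) are not tracked here.

```python
import cmath
import math

M1_SQ, M2_SQ = 1.0, 2.0   # hypothetical pole positions t = +1 and t = -2

def soft(t):
    """Decays like t^-2 at large |t|: poles at t = M1_SQ and t = -M2_SQ."""
    return 1.0 / ((M1_SQ - t) * (M2_SQ + t))

def hard(t):
    """Same poles, but decays only like t^-1."""
    return t / ((M1_SQ - t) * (M2_SQ + t))

def big_circle(amp, radius=10.0, n=256):
    """(1/(2 pi i)) * contour integral of amp over |t| = radius (trapezoid rule).

    With t = R e^{i theta}, dt = i t d(theta), so the prefactor 1/(2 pi i) leaves
    the average of amp(t) * t over equally spaced points on the circle.
    """
    total = 0j
    for k in range(n):
        t = radius * cmath.exp(2j * math.pi * k / n)
        total += amp(t) * t
    return total / n

def residues(numerator):
    """Residues at t = M1_SQ and t = -M2_SQ of numerator(t) / ((M1_SQ - t)(M2_SQ + t))."""
    r_pos = -numerator(M1_SQ) / (M1_SQ + M2_SQ)
    r_neg = numerator(-M2_SQ) / (M1_SQ + M2_SQ)
    return r_pos, r_neg

print(residues(lambda t: 1.0), big_circle(soft))   # residues cancel, integral ~ 0
print(residues(lambda t: t), big_circle(hard))     # residues sum to -1, integral ~ -1
```

The trapezoid rule is essentially exact here because the integrand is analytic outside the circle of radius 2, so the average over the circle picks out the $1/t$ coefficient directly.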

Spinning shocks
While scalar shock amplitudes must decay faster than $t^{-1}$ to have a superconvergence sum rule, this condition gets relaxed for spinning shocks. Consider a massless spin-J particle described by a traceless symmetric tensor field $h_{\mu_1\cdots\mu_J}(x)$. It is convenient to define the symmetric polynomial $h_J(x, dx) = h_{\mu_1\cdots\mu_J}(x)\,dx^{\mu_1}\cdots dx^{\mu_J}$. States are labeled by a momentum $p^\mu$ and a transverse traceless-symmetric polarization tensor $\epsilon^{\mu_1\cdots\mu_J}$, modulo gauge redundancy. Let us parameterize the polarization tensor as a product of vectors, $\epsilon^{\mu_1\cdots\mu_J} = \epsilon^{\mu_1}\cdots\epsilon^{\mu_J}$, where $\epsilon^\mu$ is transverse and null.¹³ Thus, we can label momentum eigenstates by $|p, \epsilon\rangle$, where $p^\mu$ and $\epsilon^\mu$ are vectors, and the state is a homogeneous polynomial of degree J in $\epsilon$.
A momentum eigenstate has wavefunction (2.25). We are interested in shock wavefunctions of the form (2.26). For example, when J = 2, $h_2(x, dx) = h_{\mu\nu}dx^\mu dx^\nu$ has an interpretation as a metric perturbation. In this case, (2.26) is the perturbation in the Aichelburg-Sexl shockwave metric (Fourier-transformed in the transverse space) [32]. Comparing to (2.25), we see that [...]. The computation of the shock amplitude and its commutator goes through essentially unchanged from the scalar case. To understand how the spinning amplitude should behave at large t, it is useful to tie the Mandelstam variable t to a symmetry generator. Consider a boost $\Lambda(z)$ parameterized by $z \in \mathbb{C}$, [...]. The boost acts on the momenta $p_1, p_2$ by rescaling $p^v \to z p^v$, which also rescales $t \to z t$.¹⁴ Thus, the shock commutator can be written as [...], where $\epsilon_1, \epsilon_2$ are the polarizations of the shocks, and $\epsilon_3, \epsilon_4$ are the polarization vectors of the other particles. Let us define a boosted amplitude $A(z)$ by acting with $\Lambda(z)$ on particles 1 and 2, keeping particles 3 and 4 fixed. Note that the boost acts nontrivially on the polarization vectors (2.30), as in (2.31). In terms of the boosted amplitude, the shock commutator becomes [...]. Suppose that $A(z)$ grows like $z^{J_0}$ at large z, where $J_0$ is a theory-dependent Regge intercept. (The Regge intercept may be s-dependent, but we suppress that dependence for now.) We see that the shock commutator vanishes if and only if

$J_1 + J_2 > J_0 + 1$.

In the particular case where particles 1 and 2 are gravitons, the shock commutator vanishes if $J_0 < 3$. It was argued in [3] that, for physical s, $J_0 > 2$ leads to a violation of causality.¹⁵ Assuming this argument, it follows that coincident gravitational shocks commute and the superconvergence sum rule (2.24) holds in any causal theory. Furthermore, a failure of commutativity signals a violation of causality. It is instructive to see how (2.24) is obeyed (or not) in various examples.
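As a bookkeeping aid, the following minimal Python sketch encodes the criterion that coincident shocks of spins $J_1, J_2$ commute if and only if $J_1 + J_2 > J_0 + 1$, and applies it to the cases discussed in the text. The function name and the sample intercepts are illustrative; only the inequality itself comes from the text.

```python
def shocks_commute(j1, j2, j0):
    """Coincident shocks of spins j1, j2 commute iff j1 + j2 > j0 + 1 (strict)."""
    return j1 + j2 > j0 + 1

# Scalar shocks need the amplitude to decay faster than t^-1, i.e. J_0 < -1.
print(shocks_commute(0, 0, -1.0), shocks_commute(0, 0, -1.5))    # False True
# Graviton shocks commute iff J_0 < 3; weakly coupled flat-space gravity has J_0 <= 2.
print(shocks_commute(2, 2, 2), shocks_commute(2, 2, 3))          # True False
# Spin-2 light-ray operators in a CFT: J_0 <= 1 (nonperturbative) or J_0 <= 2 (planar).
print(shocks_commute(2, 2, 1), shocks_commute(2, 2, 2))          # True True
```

The strict inequality matters: at $J_0 = J_1 + J_2 - 1$ the arc at infinity no longer vanishes, matching the Regge pole at $J = J_1 + J_2 - 1$ mentioned in the introduction.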
We can also consider higher-point scattering amplitudes, where momenta p µ 3 and p µ 4 in the argument above stand for a sum of momenta of many particles. Assuming that the same bound on the Regge behavior holds for higher-point amplitudes we get higher-point analogs of the superconvergence relation (2.24), where the integral is taken over the discontinuity with respect to t of higher-point amplitudes. We expect that commutativity of shocks should be true as "an operator equation," in other words, for any scattering amplitude. This is what we find in AdS, where commutativity of shocks is dual to an operator equation in CFT. It would be interesting to explore this possibility further.

Shock commutativity in General Relativity
Let us re-derive commutativity of shocks in General Relativity using scattering amplitudes and equation (2.22). The on-shell condition for the intermediate particle implies that $p_X^v = 0$ and $p_X = (p_X^u, 0, \vec q_X)$, where $\vec q_X$ depends on the order of shocks. We choose the polarization of the intermediate particle to be $\epsilon_X = \left(0, -\frac{2\,\vec e_X\cdot\vec q_X}{p^u}, \vec e_X\right)$, so that the sum over intermediate states $\sum_X \epsilon_X^\mu (\epsilon_X^*)^\nu$ acts as an identity matrix in the transverse space, $\sum_X e_X^i (e_X^*)^j = \delta^{ij}$. We write all amplitudes below in all-incoming notation.
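The transversality of this intermediate polarization can be verified directly. The minimal Python sketch below assumes the bilinear product of $ds^2 = -du\,dv + d\vec y^2$ with vectors ordered as $(u, v, \vec y)$, the momentum $p_X = (-p^u, 0, \vec q)$ as in the examples below (all-incoming conventions), and a complex null transverse momentum $\vec q$ in D = 4; the numerical values and function names are arbitrary. It checks that $\epsilon_X\cdot p_X = 0$, that $\epsilon_X\cdot\epsilon_X = \vec e_X\cdot\vec e_X$, and that summing $e_X^i (e_X^*)^j$ over an orthonormal basis gives $\delta^{ij}$.

```python
def lc_dot(a, b):
    """Bilinear product for ds^2 = -du dv + dy^2; vectors ordered (u, v, y_1, ..., y_n)."""
    return -0.5 * (a[0] * b[1] + a[1] * b[0]) + sum(x * y for x, y in zip(a[2:], b[2:]))

def eps_X(e, q, pu):
    """Intermediate polarization (0, -2 (e.q)/p^u, e)."""
    e_dot_q = sum(x * y for x, y in zip(e, q))
    return (0.0, -2.0 * e_dot_q / pu, *e)

pu = 3.0
q = (1.0, 1j)                      # complex null: q.q = 1 + (1j)**2 = 0
p_X = (-pu, 0.0, *q)               # massless on-shell: p_X . p_X = q . q = 0
basis = [(1.0, 0.0), (0.0, 1.0)]   # orthonormal transverse polarizations e_X

for e in basis:
    pol = eps_X(e, q, pu)
    print(lc_dot(pol, p_X), lc_dot(pol, pol))   # transversality, and norm |e|^2

# Completeness in the transverse space: sum over X of e^i conj(e^j) = delta^{ij}.
completeness = [[sum(e[i] * e[j].conjugate() for e in basis) for j in range(2)]
                for i in range(2)]
print(completeness)
```

Because $\epsilon_X$ has no u-component, its only longitudinal contribution pairs with $p_X^u$, which is exactly what cancels the transverse term $\vec e_X\cdot\vec q$.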

Minimally-coupled scalar
As the simplest example, consider a massless scalar field minimally coupled to gravity. The relevant scalar-scalar-graviton three-point amplitude takes the form [...], where $\epsilon_1 = (0, -2, \vec 0)$ is the polarization of the graviton and we evaluated the amplitude in the kinematics of the previous section, in particular $p_X = (-p^u, 0, \vec q_2 - \vec q_1)$. Note that the three-point amplitude does not depend on the transverse momentum $\vec q_i$ of the shock, and therefore the shock commutator vanishes. We find [...]. Finally, we can reach the same conclusion by considering a scalar field propagating on a shockwave background

$ds^2 = -du\,dv + \delta(u)h(u, \vec y)\,du^2 + d\vec y^2, \qquad h(u, \vec y) = 4\pi\left(\epsilon_1 e^{i\vec y\cdot\vec q_1} + \epsilon_2 e^{i\vec y\cdot\vec q_2}\right)$, (2.37)

where $\vec q_1^{\,2} = \vec q_2^{\,2} = 0$. The wave equation $\nabla^2\phi = 0$ takes the form [...]. We solve across the locus u = 0 as [...]. Choosing the initial state $\phi_{\text{before}} = e^{-\frac{i}{2}vp^u}e^{-i\vec y\cdot\vec q_2}$ as in the amplitude computation, and focusing on the term linear in $\epsilon_1\epsilon_2$, we get

$\phi_{\text{after}} = \delta_{PS}\, e^{-\frac{i}{2}vp^u}e^{i\vec y\cdot\vec q_1}$, (2.40)

where $\delta_{PS}$ is the phase shift acquired by crossing a shock.
To compare with the amplitude computation, we compute the overlap of $\phi_{\rm after}$ with the final state and obtain (2.43), in complete agreement with the amplitude computation. Note also that in this way we can compute the effect of propagating a probe through an arbitrary number of shocks. Again, we see that the order of the shocks does not matter in this case.

Minimally-coupled photon
Next, let us consider particles with spin. It is convenient to choose the external polarizations as follows: In Einstein-Maxwell theory, the three-point amplitude is given by: Here, $\epsilon_X = \big(0, -2\,\vec e_X\cdot(\vec q_2 - \vec q_1)/p^u, \vec e_X\big)$, and we used that $\epsilon_1\cdot\epsilon_3 = \epsilon_1\cdot\epsilon_X = 0$. Summing over intermediate states as in (2.22) gives a result that manifestly does not depend on the order of the shocks. Again, we can reproduce this result using the Einstein-Maxwell equations of motion (2.49) on a shockwave background [3]. Taking our initial state to be a plane wave, let us focus on the transverse polarizations, $A_{\rm before} = \vec e_3\, e^{-\frac{i}{2}vp^u}e^{-i\vec y\cdot\vec q_2}$. After crossing the shock, we obtain (2.50). Again, computing the overlap, we recover (2.47).

Gravitons
Finally, the three-point amplitude of gravitons in General Relativity takes the form: Summing over intermediate states, we find a result that is independent of the shock ordering. This result can also be reproduced by computing the propagation of a graviton through a shockwave background, as above.
To summarize, minimally-coupled matter and gravitons lead to commuting gravitational shocks (at tree level). This can be verified by studying shock amplitudes, or by studying wave equations on a shockwave background.

Non-minimally coupled photons
Let us now demonstrate that commutativity of shocks can be lost in theories with non-minimal couplings to gravity. As a first example, consider a non-minimal coupling of photons to gravity of the schematic form $\alpha_2 RFF$. The relevant three-point amplitude takes the form: Here, we used that $\epsilon_X\cdot p_3 = -\vec e_X\cdot\vec q_1$. Summing over intermediate states, we can compute the commutator (2.22), which is clearly nonvanishing. Let us reproduce the same result using the equations of motion.
In the presence of the $RFF$ coupling, the equation of motion (2.49) gets modified to: The solution takes the same form as (2.50), except that now the transfer matrix describing how polarizations change when propagating through a shock is not diagonal. Instead, it takes the form $\delta_{ij} h + \alpha_2 \partial_i\partial_j h$, which for a given shock is: Different orderings of shocks now lead to different results. The commutator (2.54) becomes $e_4^i\,[M(\vec q_1), M(\vec q_2)]_{ij}\, e_3^j$. More generally, given multiple shocks ordered according to $u_1 > u_2 > \cdots > u_k$, the amplitude is given by the corresponding product of (non-commuting) shockwave transfer matrices (2.57).
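The ordering dependence can be illustrated numerically. This minimal sketch assumes that for a single plane-wave shock $h \propto e^{i\vec y\cdot\vec q}$ one has $\partial_i\partial_j h \to -q_iq_j h$, so that, dropping the overall factor of $h$, the transfer matrix is $M(\vec q)_{ij} = \delta_{ij} - \alpha_2 q_i q_j$. The normalization, the function names, the use of real (rather than complex null) $\vec q$, and the assumption that the probe moves toward increasing $u$ are all illustrative. For this form, $[M(\vec q_1), M(\vec q_2)] = \alpha_2^2\,(\vec q_1\cdot\vec q_2)\,(\vec q_1\vec q_2^{\,T} - \vec q_2\vec q_1^{\,T})$, which vanishes at $\alpha_2 = 0$.

```python
import numpy as np

def transfer_matrix(q, alpha2):
    """Illustrative single-shock transfer matrix M(q)_ij = delta_ij - alpha2 q_i q_j."""
    q = np.asarray(q, dtype=float)
    return np.eye(len(q)) - alpha2 * np.outer(q, q)

def commutator(a, b):
    return a @ b - b @ a

def ordered_product(qs, alpha2):
    """M(q_1) M(q_2) ... M(q_k) for shocks ordered u_1 > u_2 > ... > u_k.

    Assuming the probe moves toward increasing u, it crosses u_k first,
    so M(q_k) acts first (rightmost)."""
    result = np.eye(len(qs[0]))
    for q in qs:
        result = result @ transfer_matrix(q, alpha2)
    return result

q1 = np.array([1.0, 0.5])
q2 = np.array([-0.3, 2.0])
alpha2 = 0.2

c = commutator(transfer_matrix(q1, alpha2), transfer_matrix(q2, alpha2))
predicted = alpha2**2 * (q1 @ q2) * (np.outer(q1, q2) - np.outer(q2, q1))
```

Setting `alpha2 = 0.0` reduces every transfer matrix to the identity, matching the minimally-coupled photon, for which the ordering is irrelevant.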

Higher derivative gravity
In higher-derivative gravity, there are two additional graviton three-point amplitudes: Together with $A_R$ given in (2.51), these form a complete list of three-point structures allowed by Lorentz invariance. To contract the three-point amplitudes, we substitute: For a general linear combination $R + \alpha_2 R^2 + \alpha_4 R^3$, the shock commutator is: It is easy to check that this vanishes if and only if $\alpha_2 = \alpha_4 = 0$. Again, the same result can be obtained using the classical equations of motion as above; see [3]. The difference compared to General Relativity is that the shockwave transfer matrix is polarization-dependent, which leads to non-commutativity (equivalently, violations of superconvergence relations) in theories with non-minimal coupling to gravity. Explicitly, the transfer matrix is: To compute the transfer through $n$ shocks, we simply multiply the corresponding transfer matrices, as in (2.57). Non-minimal couplings of gravitons to matter also contribute to non-commutativity. For example, consider an interaction of the schematic form $\phi R^2$, where $\phi$ is a (possibly massive) scalar. The only possible three-point structure is: The $\phi R^2$ coupling allows $\phi$ to appear as an intermediate state when a graviton propagates through two shocks. The corresponding contribution to the shock commutator is (2.63).

Graviton scattering in string theory
Non-minimal couplings generically lead to non-commuting coincident shocks (at tree level).
In any theory with Regge intercept $J_0 < 3$, the contributions from non-minimal couplings to the shock commutator must be cancelled by high-energy states or loop effects. As a concrete example, consider graviton scattering in tree-level string theory. In units where $\alpha' = 2$, the amplitude takes the form [38,39]: (Recall that we have parametrized the graviton polarizations as $\epsilon_{\mu\nu} = \epsilon_\mu\epsilon_\nu$.) There are two basic tensors $K^{(i)}$ that can appear in the amplitude: $K^{(b)}$, which occurs in the open bosonic string, and $K^{(ss)}$, which appears in the open superstring. Accordingly, the four-point amplitude of closed strings has three possible tensor structures, corresponding to bosonic strings $K^{(b)}K^{(b)}$, heterotic strings $K^{(b)}K^{(ss)}$, and superstrings $K^{(ss)}K^{(ss)}$.
Plugging in the momenta (2.11) and the polarization vectors (2.30) and (2.44), we find: Including only the contribution from graviton exchange, the last two terms in (2.65) give a nontrivial shock commutator. However, if we boost the energy, $t \to zt$, the gamma functions in (2.64) ensure that the amplitude decays as $z^{-2+s}$. Thus, upon integrating the full discontinuity, we must find that the graviton contribution is cancelled by heavy modes. For example, in the heterotic string, we find the sum rule
$$\int dt\, \mathrm{Disc}_t A = (p^u)^4\, s\, (\vec e_3\cdot\vec e_4)\,\big(\vec e_3\cdot\vec q_1\,\vec e_4\cdot\vec q_2 - \vec e_3\cdot\vec q_2\,\vec e_4\cdot\vec q_1\big)\Big(1 + \sum_{n\ge 1} r_n(s)\Big) = 0, \tag{2.66}$$
where the heavy-mode coefficients $r_n(s)$ are expressed in terms of the Pochhammer symbol $(a)_n = \Gamma(a+n)/\Gamma(a)$. In equation (2.66), the 1 represents the graviton-multiplet contribution, and the sum over $n \ge 1$ is the contribution of heavy modes. It is possible to decouple almost all the heavy modes by setting $s = 0$, in which case one finds
$$1 + r_1(0) = 0. \tag{2.68}$$
In general, the sum over heavy modes converges like $\sum_n n^{s-3}$.
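The Pochhammer symbol is easy to evaluate, and the standard asymptotic $(a)_n/(b)_n \sim \frac{\Gamma(b)}{\Gamma(a)}\,n^{a-b}$ shows how ratios of Pochhammer symbols typically produce power-law tails of the kind quoted above. This is a minimal sketch only: the explicit heavy-mode coefficients are not reproduced in this section, so it does not evaluate the sum rule itself, and the function names are illustrative.

```python
import math

def pochhammer(a, n):
    """Rising factorial (a)_n = a (a+1) ... (a+n-1), with (a)_0 = 1."""
    result = 1.0
    for k in range(n):
        result *= a + k
    return result

def pochhammer_via_gamma(a, n):
    """(a)_n = Gamma(a+n) / Gamma(a); requires a not in {0, -1, -2, ...}."""
    return math.gamma(a + n) / math.gamma(a)

def pochhammer_ratio(a, b, n):
    """(a)_n / (b)_n for a, b > 0, computed with log-gamma to avoid overflow."""
    log_ratio = (math.lgamma(a + n) - math.lgamma(a)) - (math.lgamma(b + n) - math.lgamma(b))
    return math.exp(log_ratio)

# Large-n behaviour: (a)_n / (b)_n ~ Gamma(b)/Gamma(a) * n**(a - b)
a, b, n = 0.3, 2.1, 100_000
asymptotic = math.gamma(b) / math.gamma(a) * n ** (a - b)
```

The log-gamma route is used for the ratio because $\Gamma(a+n)$ overflows a double long before $n = 10^5$.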

Shock S-matrix in string theory
In addition to studying four-point amplitudes, we can equivalently analyze the equation of motion for a string on a shockwave background, analogous to our discussion of the equations of motion for scalars, photons, and gravitons. Propagation of strings on shockwave backgrounds was discussed in [35, 40–46]. This leads to an understanding of shock commutativity in terms of the string S-matrix for propagation through a shock. Let us follow the conventions of [40]. Recall that the string mode operators obey: Here, negative modes $n < 0$ create string excitations, while positive modes $n > 0$ annihilate the vacuum: We choose the conformal gauge for the worldsheet metric, $h_{\alpha\beta} = \eta_{\alpha\beta}$, and fix the light-cone gauge
$$u(\sigma,\tau) = P^u\tau. \tag{2.71}$$
The closed-string mode expansion of the transverse coordinates contains the oscillator terms
$$\big(\alpha^i_n e^{-2in\tau} - \alpha^{i\dagger}_n e^{2in\tau}\big)e^{-2in\sigma}, \qquad (i = 2, \ldots, D-1), \tag{2.72}$$
where $\alpha^{i\dagger}_n = \alpha^i_{-n}$. Before and after a shock, the string propagates freely. Suppose the shock geometry has the metric: Then the transition through the shock is described by the S-matrix [40]: As an example, consider a shock created by a fast-moving particle at position $x_a$: In writing $f_a$ above, there is an ambiguity in the ordering of the operators $X^i(\sigma, 0)$. However, this ambiguity is proportional to $\vec q_1^{\,2}$ and is localized at zero impact parameter after the Fourier transform. It is therefore irrelevant for our purposes.
Note that the operator $S_{\rm shock}$ is diagonal in the position basis $X^i(\sigma, 0)$ for the transverse oscillators. Thus, it instantaneously changes the momenta of the oscillators without affecting their positions. Overall, the effect of the shock on the string is the same as in the geodesic calculation (2.4): the center of mass of the string moves in the $v$ direction (Shapiro time delay), and the transverse modes receive an instantaneous kick that depends on the profile $f_a(\vec X)$. Essentially, each part of the string individually follows a geodesic through the shock. Thus, coincident shocks commute because they commute for geodesics.
To see this in more detail, consider the matrix element for propagation through two shocks: We are interested in states $|\Psi\rangle$ that are eigenstates of $\vec x$ (fixed center-of-mass position in the transverse plane), with a finite number of oscillator excitations above the vacuum. As in the previous section, let us choose $\vec x|\psi_{\rm cm}\rangle = 0$. The relevant correlator takes the form (2.77). The same formula is valid for superstrings as well [40]. Let us first compute the correlator in the oscillator vacuum. We have: The integral over the worldsheet coordinate gives [14] an integral over $\sigma \in [0, 2\pi]$, with measure $d\sigma/2\pi$, of a power of $2\sin(\sigma/2)$ (2.79). Expanding at small $\alpha'$ and plugging back into (2.77) reproduces the result from [14]. In general states, we can use the formula for products of normal-ordered exponentials such as $:e^{i\vec q_1\cdot\vec X_{\rm osc}(\sigma,0)}:$ and then Taylor expand inside the normal ordering. In this way, commutativity is manifest. Let us now understand in more detail how commutativity is achieved. For concreteness, consider bosonic string theory and take a graviton as the external state: It is sufficient to keep only the leading $\alpha'$ correction to see the effect: Inside the operator $T_{ij}$, there are two types of terms. Firstly, the terms containing $\alpha_1$ shuffle massless modes among each other. By themselves, these lead to non-commuting shocks. However, crucially, there are also terms containing $\alpha_{-n}$, which move the state across string levels. In particular, the first term leads to mixing with the tachyon, whereas the second term produces higher-level states. For example, the $n = 1$ term leads to mixing between the graviton and a spin-4 particle. As expected, the extra states restore commutativity.
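Integrals of the type in (2.79) can be done in closed form. The minimal sketch below checks the standard identity $\int_0^{2\pi}\frac{d\sigma}{2\pi}\big(2\sin\frac{\sigma}{2}\big)^{2a} = \frac{\Gamma(1+2a)}{\Gamma(1+a)^2}$ (valid for $a > -\tfrac12$) with a midpoint rule. The exponent $2a$ is a placeholder: the precise exponent in (2.79), and its dependence on the kinematics, is not reproduced here. Function names are illustrative.

```python
import math

def worldsheet_integral(a, n_points=100_000):
    """Midpoint-rule estimate of (1/2pi) * integral_0^{2pi} (2 sin(sigma/2))**(2a) d sigma."""
    h = 2.0 * math.pi / n_points
    total = 0.0
    for k in range(n_points):
        sigma = (k + 0.5) * h                  # midpoints stay strictly inside (0, 2pi)
        total += (2.0 * math.sin(sigma / 2.0)) ** (2.0 * a)
    return total * h / (2.0 * math.pi)

def closed_form(a):
    """Gamma(1 + 2a) / Gamma(1 + a)**2, the value of the same integral for a > -1/2."""
    return math.gamma(1.0 + 2.0 * a) / math.gamma(1.0 + a) ** 2

# For integer a the result is the central binomial coefficient C(2a, a): 2, 6, 20, ...
```

The Gamma-function form is what makes a small-$\alpha'$ expansion of such worldsheet integrals straightforward.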

A stringy equivalence principle
We have seen that in string theory, extra states restore commutativity of coincident shocks and satisfy the corresponding superconvergence sum rule. This phenomenon should occur in any gravitational theory with $J_0 < 3$, where $J_0$ is the Regge intercept. We give more details on the corresponding superconvergence sum rule in a generic tree-level theory of gravity in appendix A.3. Beyond tree level, in a theory with $J_0 < 3$, the superconvergence sum rule can receive contributions from loops or nonperturbative effects.
Reference [3] argued using causality that a tree-level theory of gravity should have $J_0 \le 2$. Their argument applies to scattering at nonzero impact parameter. As far as we are aware, there is currently no flat-space argument that the same should be true away from tree level, or at zero impact parameter.¹⁶ However, as we will see in the next section, shock commutativity in AdS can be proved rigorously, nonperturbatively, and for all values of the impact parameter. This leads us to conjecture that the same is true in flat space and dS as well. More precisely, we propose the following.

Conjecture (Stringy equivalence principle). Coincident gravitational shocks commute in any nonperturbative theory of gravity in AdS, dS, or Minkowski spacetime.
We use the term "equivalence principle" because this is a modified version of the statement that all particles follow geodesics. The word "stringy" comes from the fact that mixing with stringy states can restore shock commutativity that would otherwise be lost.

Gravitational DIS and ANEC commutativity
We can easily repeat the same discussion in the context of a gapped QFT (or, more generally, a QFT that is free in the IR), where it becomes a statement about commutativity of ANEC operators evaluated in one-particle states. The virtual graviton $g^*$ of the previous sections couples to the QFT stress-energy tensor as $h_{\mu\nu}T^{\mu\nu}$, and therefore the scattering process $g^*X \to g^*X$ is described by the matrix element (2.85), which as usual denotes the nontrivial part multiplying $\delta^{(D)}(\sum_i p_i)$.¹⁷ Unlike in gravitational theories, there is no problem in defining off-shell observables in QFT and decoupling the probe that creates $g^*$ from the rest of the system (by considering the $G_N \to 0$ limit). One consequence is that we can formulate the problem using real momenta and keep $g^*$ off-shell. As before, we will be interested in the following kinematics, where now $\vec q_1^{\,2}$ and $\vec q_2^{\,2}$ are nonzero. We chose the probe particle to be massless, but this need not be the case. The Mandelstam invariants take the form $s = -(p_1 + p_2)^2 = (\vec q_1 + \vec q_2)^2$ and $t = -(p_2 + p_3)^2 = p^u p^v$. The matrix element describing $g^*X \to g^*X$ is time-ordered, as in the usual description of deep inelastic scattering. The discontinuities of the time-ordered matrix element $A(s,t)$ are computed by the corresponding Wightman functions (2.87). Integrating the discontinuity over $t$ and using momentum conservation to rewrite the result in position space, we see that the integral over the discontinuity of $A(s,t)$ is related to the commutator of ANEC operators inserted at the same time $u = 0$.
As in the scattering-amplitude considerations of the previous section, causality considerations apply to the matrix element (2.85). In particular, for physical $s$ we expect $A(s,t)$ to obey a Regge boundedness condition, (2.89). This condition, together with the usual assumptions about the analyticity of $A(s,t)$, implies commutativity of ANEC operators in a gapped QFT via the superconvergence relations. We conclude that for every $\vec q_1$, the commutator vanishes. This implies that coincident ANEC operators commute inside one-particle states.
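The mechanism behind superconvergence can be illustrated with a toy function, not the QFT matrix element itself. A meromorphic $A(t) = 1/\prod_i (t - m_i)$ with at least two distinct simple poles decays at least as fast as $t^{-2}$, so the contour at infinity vanishes and the residues, which play the role of the discontinuity integrated over $t$, must sum to zero. With a single pole, $A \sim 1/t$ and no such relation holds. This is a minimal sketch; the function name and pole positions are illustrative.

```python
from fractions import Fraction

def residues_of_inverse_product(poles):
    """Residues of A(t) = 1 / prod_i (t - m_i) at its simple poles m_i (assumed distinct)."""
    poles = [Fraction(m) for m in poles]
    residues = []
    for i, m_i in enumerate(poles):
        r = Fraction(1)
        for j, m_j in enumerate(poles):
            if j != i:
                r /= (m_i - m_j)  # residue at m_i is prod_{j != i} 1 / (m_i - m_j)
        residues.append(r)
    return residues

three_poles = residues_of_inverse_product([1, 3, 7])  # A ~ t^-3: superconvergent
one_pole = residues_of_inverse_product([5])           # A ~ t^-1: not superconvergent
```

Exact rational arithmetic is used so that the cancellation is exact rather than up to rounding.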

Event shapes in CFT and shocks in AdS
In this section, we study shock commutativity and superconvergence sum rules in AdS, interpreting them in CFT language. We focus on shocks created by integrating a local CFT operator along a null line on the boundary of AdS.¹⁸ The simplest example is a gravitational shock created by the average null energy (ANEC) operator $\mathcal{E} = \int dv\, T_{vv}$ [14]. We will argue that ANEC operators on the same null plane commute, and that this leads to nontrivial superconvergence sum rules that must be satisfied by CFT data. One nice property of such superconvergence sum rules is that in large-$N$ theories, they receive contributions only from single-trace operators and non-minimal bulk couplings. We start by introducing null integrals ("light transforms") of local operators in section 3.1. In section 3.2, we review "event shapes," which are certain matrix elements of light-transformed operators. In section 3.3, we compute some simple event shapes in AdS, emphasizing the similarities to our shock amplitude calculations in section 2.

Review: the light transform
We will be interested in integrals of a local CFT operator along a null line on the boundary of AdS. For example, let $O_{\mu_1\cdots\mu_J}$ be a traceless symmetric tensor, and consider the integral
$$\int_{-\infty}^{\infty} dv\, O_{v\cdots v}(u = 0, v, \vec y). \tag{3.1}$$

¹⁸ These backgrounds will already be rich enough to develop an analogy to the flat-space story. It would be interesting to study other types of shock backgrounds [52] in the future.
Here, we use the lightcone coordinates (2.7), except that we are now in $d = D - 1$ dimensions (so that $\vec y \in \mathbb{R}^{d-2}$). In holographic theories, such operators create shocks in the bulk. An example is the average null energy (ANEC) operator
$$\mathcal{E}(\vec y) = \int_{-\infty}^{\infty} dv\, T_{vv}(u = 0, v, \vec y). \tag{3.2}$$
The ANEC operator on the boundary creates a gravitational shock in the bulk of the form described by the AdS-Aichelburg-Sexl metric.
In the examples (3.1) and (3.2), the integration contour starts at the point $(u, v, \vec y) = (0, -\infty, \vec y) \in \mathcal{I}^-$ at past null infinity and ends at the point $(u, v, \vec y) = (0, \infty, \vec y) \in \mathcal{I}^+$ at future null infinity. More generally, we can perform a conformal transformation to bring the initial point of the null integral to a generic point $x$. The result is an integral transform called the light transform [20],
$$L[O](x, z) = \int_{-\infty}^{\infty} d\alpha\,(-\alpha)^{-\Delta-J}\, O\!\left(x - \frac{z}{\alpha}, z\right).$$
Here, $z \in \mathbb{R}^{1,d-1}$ is a null vector, $\Delta$ and $J$ are the dimension and spin of $O$, and we use the index-free notation
$$O(x, z) = O^{\mu_1\cdots\mu_J}(x)\, z_{\mu_1}\cdots z_{\mu_J}.$$
The light-transformed operator depends on an initial point $x$ and a null direction $z$. The integration contour runs from $x$, along the $z$-direction, to the point in the next Poincaré patch on the Lorentzian cylinder with the same Minkowski coordinates as $x$.
An advantage of this language is that it makes the conformal transformation properties of null-integrated operators manifest. $L$ is a conformally-invariant integral transform that changes the quantum numbers as follows:
$$(\Delta, J) \to (1 - J, 1 - \Delta).$$
In other words, $L[O](x, z)$ transforms like a primary operator at $x$ with dimension $1 - J$ and (non-integer) spin $1 - \Delta$. As we will see, this simplifies several computations involving these operators. We see from (
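The quantum-number map $(\Delta, J) \to (1 - J, 1 - \Delta)$ can be encoded in a few lines. For the stress tensor ($\Delta = d$, $J = 2$) it gives the ANEC operator dimension $-1$ and spin $1 - d$. Applying the map twice returns the original labels; this is a statement about labels only, not a claim about the operator $L^2$. This is a minimal sketch with illustrative class and function names.

```python
from typing import NamedTuple

class QuantumNumbers(NamedTuple):
    dimension: float
    spin: float

def light_transform(op: QuantumNumbers) -> QuantumNumbers:
    """Quantum numbers of L[O]: (Delta, J) -> (1 - J, 1 - Delta)."""
    return QuantumNumbers(dimension=1 - op.spin, spin=1 - op.dimension)

def stress_tensor(d: int) -> QuantumNumbers:
    """The stress tensor has Delta = d and J = 2 in a d-dimensional CFT."""
    return QuantumNumbers(dimension=d, spin=2)

anec_4d = light_transform(stress_tensor(4))  # ANEC operator in d = 4
```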

Review: event shapes
We will be interested in matrix elements of light-transformed operators called "event shapes." First, consider a three-point function in which we take $\phi_1$, $\phi_2$ to be primary scalars for simplicity. This transforms like a three-point function of primary operators, which is fixed by conformal invariance up to an overall constant. Without loss of generality, we can place $x$ at spatial infinity, $x \to \infty$, so that the light-transform contour runs along $\mathcal{I}^+$. (The factor transforms like a primary with dimension $1 - J$.) We must act on the polarization vector $z$ with the inversion matrix $I^\mu{}_\nu(x) = \delta^\mu_\nu - 2x^\mu x_\nu/x^2$, so that $x_1$, $x_2$, and $z$ transform in the same way under Lorentz transformations.
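The properties of $I^\mu{}_\nu(x)$ that make this work (it squares to the identity, preserves the Minkowski metric, and reflects $x$ to $-x$) can be checked numerically. This minimal sketch assumes a mostly-plus metric in $d = 4$ and a non-null $x$; the function name and the values are illustrative.

```python
import numpy as np

def inversion_matrix(x, eta):
    """I^mu_nu(x) = delta^mu_nu - 2 x^mu x_nu / x^2, with x_nu = eta_{nu rho} x^rho."""
    x = np.asarray(x, dtype=float)
    x_lower = eta @ x
    x_sq = x @ x_lower
    if np.isclose(x_sq, 0.0):
        raise ValueError("I(x) is singular for null x")
    return np.eye(len(x)) - 2.0 * np.outer(x, x_lower) / x_sq

eta = np.diag([-1.0, 1.0, 1.0, 1.0])   # mostly-plus Minkowski metric, d = 4
x = np.array([0.4, 1.2, -0.7, 2.0])    # a spacelike (non-null) point
I_x = inversion_matrix(x, eta)         # row index mu, column index nu
```

Geometrically, $I(x)$ is the Lorentz reflection along $x$, which is why its determinant is $-1$.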
The operator $L[O](\infty, z)$ transforms like a primary inserted at spatial infinity, which means it is annihilated by the momentum generators. Hence, the matrix element (3.7) is translationally invariant. It is really a translationally-invariant integral kernel that can be paired with test functions. This kernel can be diagonalized by going to momentum space. Let us define the Fourier-transformed states: Positivity of energy implies that $|\phi(p)\rangle$ is nonvanishing only if $p$ is inside the forward lightcone. The event shape is (3.10). We often abuse notation and write it with the understanding that we have stripped off the momentum-conserving $\delta$-function.
The physical interpretation of $M(p)$ is that the state $|\phi_2(p)\rangle$ acts like a source in Minkowski space. Excitations from the source fly out to $\mathcal{I}^+$, where they hit the "detector" $L[O](\infty, z)$. Here, $z$ specifies a particular direction on the celestial sphere $S^{d-2}$ where the detector sits. Finally, we take the overlap of the resulting state with the sink $\langle\phi_1(p)|$. The correlator (3.11) is called a one-point "event shape" because it involves a single detector.
More generally, we can consider an expectation value of multiple detectors at $\mathcal{I}^+$, i.e. a multi-point event shape (3.12). Multi-point event shapes involve a limit where the initial and final points of the light-transform contours become coincident. Specifically, all detector integration contours start at spatial infinity and end at future infinity. We discuss the conditions under which this limit is well-defined in section 4. Future null infinity $\mathcal{I}^+$ is conformally equivalent to a null plane. For example, by performing a null inversion, we can bring $\mathcal{I}^+$ to the plane $u = 0$ in the $(u, v, \vec y)$ coordinates. In these coordinates, the event shape takes the form (3.14). Note that the sink and source states are still defined via a Fourier transform with respect to $x$. The transverse positions $\vec y_i$ of the detectors are related to the null vectors $z_i$ by stereographic projection. Writing the event shape in the form (3.14) makes it clear that the operators $O_i$ remain spacelike-separated along their integration contours. This ostensibly implies that detectors should commute. However, this argument ignores the fact that the operators become coincident at the ends of their integration contours. The question of commutativity of detectors is more subtle, and is related to singularities of four-point functions, as we discuss in section 4.

Computing event shapes in the bulk
To understand the connection between event shapes and flat-space shock amplitudes, let us review how event shapes are computed in theories with a gravity dual [14, 53–55].¹⁹ For simplicity, we focus on energy detectors. A multi-point event shape for ANEC operators $\mathcal{E}$ is called an "energy correlator." We will see that commutativity of energy detectors is essentially equivalent to the coincident shock commutativity discussed in section 2.
To begin, let us separate the detectors in the $u$ direction, so that $L[O_i]$ is at position $u_i$, with the ordering $u_1 > u_2 > \cdots > u_k$. The insertion of an integrated stress tensor creates a shock in the bulk, described by the metric (3.16) with $h$ given by (3.17). To derive the proportionality constant in (3.17), recall that according to the standard AdS/CFT dictionary, the source for the stress tensor is encoded in the $1/z^2$ deformation of the metric as $z \to 0$. One can check that this reproduces (3.17). A remarkable property of the metric (3.16) (or of any superposition of such shock waves) is that it is an exact solution of Einstein's equations, even when arbitrary higher-derivative corrections are included [56]. Here, we used the $y = (u, v, \vec y)$ coordinate system, which is related to the coordinate $x$ by the null inversion (3.13). The locus $u = 0$ is the Poincaré horizon of the original Poincaré patch in $x$. We define energy detectors as (3.18). In the $y$ conformal frame, the detector is expressed in terms of $\vec y = \vec n_\perp/(1 + n^{d-1})$, where $\vec n = (\vec n_\perp, n^{d-1})$. To compute energy correlators, we need to compute an overlap between states before and after propagation through a series of shocks, where $|\Psi'\rangle$ represents the state after propagation through the shocks. We compute the amplitude with shocks at locations $u_1 > u_2 > \cdots > u_k$, and subsequently take $u_i \to 0$. The energy correlator is then given by (3.21). We are interested in states dual to a single-trace operator insertion with momentum $q$. This corresponds to a single-particle state in the bulk. When the particle crosses a shock, it could lead to particle production. However, particle production is suppressed at leading order in $1/c_T$. Instead, the leading-order effect is mixing with other single-particle states.
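The relation $\vec y = \vec n_\perp/(1 + n^{d-1})$ between a detector direction $\vec n \in S^{d-2}$ and its transverse position is stereographic projection, with inverse $\vec n_\perp = 2\vec y/(1 + \vec y^{\,2})$, $n^{d-1} = (1 - \vec y^{\,2})/(1 + \vec y^{\,2})$. This minimal sketch checks the round trip; the point $n^{d-1} = -1$ is sent to infinity and is excluded. Function names are illustrative.

```python
import numpy as np

def direction_to_plane(n):
    """Stereographic projection y = n_perp / (1 + n_{d-1}) of a unit vector n."""
    n = np.asarray(n, dtype=float)
    if np.isclose(n[-1], -1.0):
        raise ValueError("n_{d-1} = -1 is mapped to infinity")
    return n[:-1] / (1.0 + n[-1])

def plane_to_direction(y):
    """Inverse map: n = (2 y, 1 - y^2) / (1 + y^2)."""
    y = np.asarray(y, dtype=float)
    y_sq = y @ y
    return np.append(2.0 * y, 1.0 - y_sq) / (1.0 + y_sq)

n = np.array([0.3, -0.5, 0.8])
n = n / np.linalg.norm(n)        # a point on S^2, i.e. d = 4
y = direction_to_plane(n)
```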
Mixing is captured by one-point energy correlators $\langle\Psi_2|\mathcal{E}(\vec n_1)|\Psi_1\rangle$. To understand these, let us study in more detail the propagation of fields on a shockwave background. This problem was considered in [14]. Consider a shock located at $u = 0$. Without loss of generality, we can set $q = (q^0, 0, \ldots, 0)$. Using the bulk-to-boundary propagator, we can compute the wavefunction of the particle in the bulk. In the vicinity of $u = 0$, it takes the form (3.23). In other words, the particle crosses $u = 0$ at a fixed transverse location. In (3.23) we used the AdS $y$-coordinates; see formula (3.3) in [14]. The location of the probe particle in the radial direction is related to its momentum; for $\vec q = 0$ we get $z = 1$. Before and after the shockwave, the scalar field propagates freely. At the location of the shock, however, it changes discontinuously. The change is dictated by the equations of motion, which for the minimally coupled scalar field $\phi$ become (3.24), where we only kept the terms that contribute to the $\delta(u)$ discontinuity. The matching condition (3.25) for the field across the shock is obtained by integrating (3.24) over a small interval in $u$. Plugging in the wavefunction (3.23), we see that (3.25) becomes multiplication by a phase. Finally, using (3.21) and the expression (3.17) for $h$, we obtain (3.27). This calculation is almost identical to our calculation of the amplitude for a minimally-coupled scalar to cross a shock in flat space in section 2.3.1.
To compute a $k$-point energy correlator, we can imagine propagating the particle through a series of shocks at $u_1 > u_2 > \cdots > u_k$ and then taking the limit $u_i \to 0$. Alternatively, we can simply multiply a series of one-point correlators (3.27). Either way, the result is (3.28). The situation becomes more interesting if we include higher-derivative corrections or mixing between fields. For example, consider a massless vector field $A_\mu$. As discussed in [3], the most general equation of motion for $A_\mu$ takes the form: Here, $a_2$ is a non-minimal coupling constant, and following [56] we introduced the effective shockwave Riemann tensor $R_{\mu\alpha\beta\nu}$. The shockwave Riemann tensor is given by: Here, $l_\mu = \partial_\mu u$, and $K_{\mu\nu}$ is symmetric, satisfies $K_{\mu\nu}l^\mu = 0$, and takes the form: Note that the indices $i$ and $j$ above run over $i, j = 1, \ldots, d-2$, whereas the problem naturally has $SO(d-1)$ symmetry. It is easy to make this symmetry manifest by going from planar coordinates to the coordinates of [56]. Consider now a perturbation of the transverse components of $A_\mu$, where $\epsilon_i$ are the components of a polarization vector $\epsilon_\mu$. The equations of motion become: The solution is simply (3.37). The formulas above are sufficient to compute a one-point energy correlator for the transverse components of $A_\mu$. However, note that the full problem additionally involves the component $A_z$, which also gets excited by crossing the shock. As we remarked above, its behavior is fixed by $SO(d-1)$ symmetry. The effect of including $A_z$ should be to enlarge the transfer matrix $H_{ij}$ to include the $(d-1)$th component of $n^\mu$. The same formula (3.37) applies, except that now the indices $ij$ run from 1 to $d-1$.
We can now compute the energy correlator (3.21). As before, let us order the shocks such that $u_1 > u_2 > \cdots > u_k$ and then take the limit $u_i \to 0$. This introduces a corresponding ordering of the transfer matrices (3.37) that govern propagation through each shock. The result becomes: Here, the subscript on the left-hand side indicates that we compute the event shape in the state created by $\epsilon\cdot J$, where $J^\mu$ is a current. A commutator $[\mathcal{E}(\vec n_1), \mathcal{E}(\vec n_2)]$ is simply related to the commutator of transfer matrices $[H_1, H_2]$. Let us next consider the case of gravity. As discussed in [3], the most general equation of motion for a gravitational perturbation takes the form: As before, it is enough to consider transverse perturbations $h_{ij}$ and keep only the dependence on $u$ and $v$ (the rest is fixed by symmetries). Our equation becomes: The form of the shockwave matrix $H_{ij,mn}$ is fixed by the symmetries of the problem to be:²⁰ Here, $H_a = H(\vec n_a)$; we have introduced additional notation and used the known result for the one-point energy correlator. The solution of (3.41) is: The energy correlator (3.21) again depends on the ordering of the shocks. We get: In these calculations, it was crucial to compute the propagation amplitude through separated shocks $u_1 > u_2 > \cdots > u_k$, and only afterwards take the limit $u_i \to 0$. This is guaranteed to produce the ordering of operators $\mathcal{E}(\vec n_1)\cdots\mathcal{E}(\vec n_k)$. By comparing different orderings, we can obtain interesting consistency conditions on the theory. A different procedure is to first set $u_i = 0$, compute the propagation through a shock background $h(\vec y, z) = \sum_i h_i(\vec y, z)$, and then take derivatives with respect to $\lambda_i$. Instead of producing an ordered product of transfer matrices, this latter procedure produces a symmetrized product. If we only have access to this object, we cannot study commutativity. However, one can study positivity of energy correlators [29].
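The difference between the two procedures can be seen in a finite-dimensional toy model. Assume, purely for illustration, that crossing a shock with strength $\lambda_i$ acts as $e^{\lambda_i H_i}$ for some matrix $H_i$, and that a single combined shock acts as $e^{\sum_i \lambda_i H_i}$. Then the mixed derivative $\partial_{\lambda_1}\partial_{\lambda_2}$ at $\lambda = 0$ gives $H_1H_2$ for separated shocks but $\tfrac12\{H_1, H_2\}$ for the combined shock, so the commutator drops out. The matrices and helper names are hypothetical, and the derivatives are estimated by finite differences.

```python
import numpy as np

def expm_taylor(a, terms=30):
    """Matrix exponential via truncated Taylor series (fine for the small matrices used here)."""
    result = np.eye(a.shape[0])
    term = np.eye(a.shape[0])
    for k in range(1, terms):
        term = term @ a / k
        result = result + term
    return result

def mixed_derivative(f, eps=1e-3):
    """Central-difference estimate of d^2 f / dlam1 dlam2 at lam1 = lam2 = 0."""
    return (f(eps, eps) - f(eps, -eps) - f(-eps, eps) + f(-eps, -eps)) / (4.0 * eps**2)

h1 = np.array([[1.0, 0.3], [0.3, -0.5]])
h2 = np.array([[0.2, -1.0], [-1.0, 0.7]])

# Separated shocks (u_1 > u_2): the later shock, h1, multiplies from the left.
separated = mixed_derivative(lambda l1, l2: expm_taylor(l1 * h1) @ expm_taylor(l2 * h2))
# Coincident shocks: a single combined profile.
combined = mixed_derivative(lambda l1, l2: expm_taylor(l1 * h1 + l2 * h2))
```

The finite-difference error is of order `eps**2`, which is why the comparison below uses an absolute tolerance of `1e-5`.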

Products of light-ray operators and commutativity
In this section, we discuss in detail the question of when light-transformed operators on the same null plane commute. We find that commutativity requires nontrivial conditions on the OPE limit, lightcone limit, and Regge limit of CFT four-point functions, all of which we verify in the case of ANEC operators in a nonperturbative CFT. In the Regge limit, we find that $J_0 < 3$ is a sufficient condition for commutativity of ANEC operators. One can argue using the light-ray OPE that it is also necessary [21]. Thus, superconvergence sum rules also hold in planar theories that do not satisfy nonperturbative bounds on the Regge limit but do satisfy the bound on chaos, $J_0 \le 2$ [26].

Existence vs. commutativity
Our goal is to answer two related questions:
• Is a product of light transforms at coincident points,
$$L[O_1](x, z_1)\,L[O_2](x, z_2), \tag{4.1}$$
well-defined?
• When does it commute?
In event shapes we set $x = \infty$, but the answer is the same for any $x$, by conformal invariance. We take $z_1 \neq z_2$, so that the integration contours for $L[O_1]$ and $L[O_2]$ are not identical. The coincident limit of light-transformed operators is defined as follows: we first perform the light transforms starting from distinct points $x$ and $y$, and then take the limit where the initial points $x$ and $y$ coincide. To determine when the operator (4.1) exists, let us study the matrix element (4.2) between states created by operators $O_3$ and $O_4$.²¹

²¹ The positions of $O_3$, $O_4$ can be smeared to create normalizable states, though that will not be important in our analysis.
If the above integral converges absolutely in the limit $y \to x$, then we can commute the limit and the integral to obtain (4.3). When (4.3) converges, commutativity is manifest because $O_1$ and $O_2$ remain spacelike-separated everywhere in the region of integration. In the next subsection, we analyze in detail when (4.3) converges. However, it can happen that the integral (4.3) is not absolutely convergent. This does not necessarily mean that the product of operators (4.1) is ill-defined, but it does mean that commutativity can be lost. To understand how this works, let us start with the first line of (4.2) and use the fact that $L[O_1]$ and $L[O_2]$ annihilate the vacuum to rewrite the correlator in terms of a double commutator. If this integral is absolutely convergent in the limit $y \to x$, then we can commute the limit and the integration to obtain (4.5), which differs from (4.3) only in that the Wightman function has been replaced by a double commutator. The key point is that (4.5) might converge in a wider variety of situations than (4.3). This is because the double commutator might be better behaved than the Wightman function in singular limits. Thus, the integral (4.5) gives a more general way to define matrix elements of the product (4.1), but it does not manifest commutativity. The fact that coincident light-transformed operators sometimes fail to commute has been noticed previously and is sometimes called "detector cross-talk" [15,57]. We give an example of this phenomenon in appendix B.
To summarize:
• Absolute convergence of the double-commutator integral (4.5) is a sufficient condition for the existence of the product (4.1).
• Absolute convergence of the Wightman-function integral (4.3) is a sufficient condition for the existence of the product (4.1) and for its commutativity.
Note that when the integrals do not converge absolutely, it may still be possible to assign values to them, but these values may suffer from ambiguities in how the integral is computed. In the following sections, we discuss in detail when the above conditions hold.

Figure 5: We consider the causal configuration where $4 > x$ and $3 < x^+$, with 3 and 4 spacelike from each other. The points 1 and 2 are integrated over parallel null lines in the same null plane, with nonzero transverse separation. Here, we have suppressed the transverse direction, so the null plane appears as a single diagonal line (blue) from $x$ to $x^+$. (The conformal completion of the null plane also includes the left-moving diagonal line from $x$ to $\infty$ on the left, and then from $\infty$ to $x^+$ on the right.)

Convergence of the Wightman function integral
Let us analyze in detail the conditions under which the Wightman-function integral (4.3) is absolutely convergent. For now, we consider the causal configuration $4 > x$, $x^+ > 3$, with 4 spacelike from 3; see figure 5. We comment on other configurations in section 4.4. The region of integration is shown in figure 6. Let us describe some of its important features. The dashed lines indicate where 1 and 2 become lightlike-separated from 3 and 4. When 1 and 2 are spacelike from 4, or 1 and 2 are spacelike from 3, we can rearrange the operators in the Wightman function so that 1 and 2 both act on the vacuum. The $1\times2$ OPE is guaranteed to converge in this case [58,59]. We refer to this region as the "first sheet" because the conformal cross-ratios need not move around branch points to get there.
Meanwhile, when $4 > 1$ and $2 > 3$, or $4 > 2$ and $1 > 3$, the $1\times2$ OPE is not guaranteed to converge. (Other OPEs do converge in these regions.) We call these regions the "second sheet" (shaded gray in figure 6) because the cross-ratios must move around branch points to get there.

Figure 6: The $1\times2$ OPE converges in the white region (the "first sheet"), but it does not necessarily converge in the gray-shaded region (the "second sheet"). The OPE limit is indicated in blue, the Regge limit in green, the lightcone limit on the first sheet (LC$_1$) in light red, and the lightcone limit on the second sheet (LC$_2$) in dark red.
We only need to analyze the convergence of the integral near the boundary of the integration region. Indeed, in the bulk of the integration region the only singularities arise when 1 or 2 becomes lightlike-separated from 3 or 4. These singularities are removed by the $i\epsilon$ prescriptions for the operators $O_3$ and $O_4$. After this, we can split the integral into a near-boundary region and a bulk region. The bulk region is compact and free of singularities, so convergence of the integral there is immediate.
On the boundary of the region of integration, there are several types of singularities.
• First, when $\alpha_1, \alpha_2 \to \pm\infty$ simultaneously, operators 1 and 2 approach each other on the first sheet. This singularity is described by the OPE.
• Another type of singularity occurs when either $\alpha_1$ or $\alpha_2$ approaches $\pm\infty$ with the other variable held fixed. This is the lightcone limit, where 1 and 2 become lightlike-separated. The lightcone limit on the first sheet is described by the 1 × 2 OPE, while the lightcone limit on the second sheet is not necessarily described by the 1 × 2 OPE (see footnote 22). Our strategy will be to first analyze the singularities on the first sheet. Then we will use Rindler positivity and the Cauchy-Schwarz inequality to bound singularities on the second sheet in terms of singularities on the first.

OPE limit on the first sheet
Let us begin by studying the OPE singularity on the first sheet. Without loss of generality, we take $\alpha_1, \alpha_2 \to -\infty$. We can choose a conformal frame where $O_1$ and $O_2$ both approach the origin. For simplicity, consider a traceless symmetric tensor $O_0 \in O_1 \times O_2$ with dimension $\Delta_0$ and spin $J_0$. The contribution of $O_0$ to the OPE takes the form …, where $m + n + k = J_0$ and $\hat x_{12} = x_{12}/(x_{12}^2)^{1/2}$. The factors of $x_{12}^2$ come from dimensional analysis, and the factors of $z_i \cdot \hat x_{12}$ come from demanding the correct homogeneity in $z_i$.
Let us make the change of variables (4.7). Supplying the appropriate factors $(-\alpha_i)^{-\Delta_i-J_i}$, the double light transform of a single term above becomes (4.8), where ". . ." indicates quantities independent of $r$ and $\sigma$. Requiring that the $r$-integral converge near $r = 0$ gives the condition (4.9). This is a consequence of dimensional analysis. To derive it more succinctly, recall that the light transform $L[O_i]$ transforms like a primary of dimension $\Delta^L_i = 1 - J_i$ and spin $1 - \Delta_i$. If $O_0$ appears in the OPE, then the coincident limit of $L[O_1]$ and $L[O_2]$ can only be finite if $\Delta_0 > \Delta^L_1 + \Delta^L_2$, which is equivalent to (4.9). In particular, this shows that (4.9) must hold even if $O_0$ is not a traceless symmetric tensor.

22 The lightcone limit on the second sheet has been conjectured to have an asymptotic expansion in terms of 1 × 2 OPE data [22]. We comment on the implications of this conjecture in section 4.2.6.
Requiring that the $\sigma$-integral converge near $\sigma = 0$ and $\sigma = 1$ gives one condition at each endpoint. These conditions are strongest when $n = k = 0$, in which case together they become (4.11). The conditions (4.9) and (4.11) can be weakened slightly using the special kinematics of the light transform. Because the polarization vectors $z_i$ are aligned with the positions …, where $\Delta'_0$ and $\tau'_0$ are the smallest nonzero dimension and twist, respectively, appearing in the 1 × 2 OPE.
If the conditions (4.12) are satisfied, then the integral of each term in the OPE expansion over the region $r < r_0$ (for some sufficiently small $r_0$, indicated by the quarter-disc in the lower-left or upper-right corner of figure 6) converges, including near the boundaries where it probes the lightcone regime. Furthermore, the OPE converges absolutely and exponentially in this region, and integrating term by term does not spoil this: the convergence rate only improves as we approach the OPE or lightcone boundary. In other words, the sum of the integrals of the absolute values converges. The Fubini-Tonelli theorem then establishes absolute convergence of the Wightman function integral over the OPE corners in figure 6 under conditions (4.12).
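As a rough power-counting check of the statement $\Delta_0 > \Delta^L_1 + \Delta^L_2$, one can redo the $r$-integral for the leading OPE term alone. The sketch below is ours: it places the base point of both light transforms at the origin, uses the parametrization $\alpha_i = -1/t_i$ with $t_1 = r\sigma$, $t_2 = r(1-\sigma)$ (an assumption, not necessarily the change of variables (4.7)), and ignores descendants and the $\sigma$-dependent polarization factors, which only affect the endpoint conditions at $\sigma = 0, 1$.

```latex
\begin{aligned}
\int d\alpha_i\,(-\alpha_i)^{-\Delta_i-J_i}\,O_i(-z_i/\alpha_i, z_i)
  &= \int dt_i\; t_i^{\Delta_i+J_i-2}\, O_i(t_i z_i, z_i),
  \qquad t_i = -1/\alpha_i \to 0^+,\\
x_{12}^2 &= (t_1 z_1 - t_2 z_2)^2 = -2\,t_1 t_2\,(z_1\cdot z_2) \;\propto\; r^2,\\
O_1 O_2 &\supset (x_{12}^2)^{\frac{\Delta_0-\Delta_1-\Delta_2}{2}}
  \times (\text{scale-free factors } z_i\cdot\hat x_{12}) \times O_0,\\
dt_1\,dt_2 &= r\,dr\,d\sigma,\\
\Rightarrow\quad \text{integrand} &\sim
  r^{(\Delta_1+J_1-2)+(\Delta_2+J_2-2)+(\Delta_0-\Delta_1-\Delta_2)+1}\,dr
  = r^{\Delta_0+J_1+J_2-3}\,dr,\\
\int_0 r^{\Delta_0+J_1+J_2-3}\,dr < \infty
  &\iff \Delta_0 > 2 - J_1 - J_2 = \Delta^L_1 + \Delta^L_2,
  \qquad \Delta^L_i = 1 - J_i .
\end{aligned}
```

Within this sketch, for two ANEC operators ($J_1 = J_2 = 2$) the $r$-condition reads $\Delta_0 > -2$, which every operator in a unitary theory satisfies; the nontrivial constraints then come from the $\sigma$ endpoints.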

Lightcone limit on the first sheet
We do not need to do additional work to analyze the lightcone limit on the first sheet. Studying the lightcone limit is equivalent to studying the convergence of (4.8) as $\sigma \to 0, 1$ with $r$ held fixed. Because the $r$-dependence of the integrand (4.8) factors out from the $\sigma$-dependence, this again gives (4.11). We don't have to worry about the null-cone singularities when 1 or 2 become lightlike from 3 or 4, because these are avoided by the $i\epsilon$ prescriptions for the operators 3 and 4.
In this section we re-derive (4.11) in a way that works when $O_0$ is not a traceless symmetric tensor. This approach will also be helpful for the discussion of the lightcone limit on the second sheet. Consider the OPE (4.13) in the limit $\alpha_1 \to -\infty$ with $\alpha_2$ fixed. There exists a boost generator $M$ such that $e^{\lambda M} z_1 = e^{-\lambda} z_1$ and $e^{\lambda M} z_2 = e^{\lambda} z_2$. Let us define $V(\lambda) = e^{-\lambda D} e^{\lambda M}$, where $D$ is the dilatation generator. Acting on the left-hand side of (4.13), we have …

Consider the product
Acting on a single term on the right-hand side, we have …, where $O_0$ has eigenvalue $\tau_0$ under $D - M$. Comparing both sides gives (4.16). From a similar analysis with $1 \leftrightarrow 2$, we recover (4.11). We will also need a simple generalization of this result. Consider the same OPE (4.13), but with more general polarization vectors $z'_i$.
For generic values of $z'_i$, neither operator is an eigenstate of $M$; instead, each contains a mixture of different eigenstates. Suppose we isolate the eigenstates with eigenvalues $m_1$ and $m_2$.
The corresponding piece of $f_0(\alpha_1)$ is then given, by a straightforward generalization of the above argument, by (4.18). When $z'_i = z_i$ as above, we have $m_1 = -J_1$ and $m_2 = J_2$, recovering (4.16). For generic $z'_i$, the dominant contribution to (4.18) is determined by $m_1 = -J_1$ and $m_2 = -J_2$, i.e. …
On the other hand, for the stronger result (4.16) to hold it suffices to have …, with $z'_2$ having a finite limit as $\alpha_1 \to -\infty$. In other words, we can allow $z'_2$ to vary with $\alpha_1$. The condition on $z'_1$ implies $m_1 = -J_1$, which is as good as for generic $z'_1$. To understand the condition on $z'_2$, assume without loss of generality that $z_2 = (0, 1, 0)$ in $(u, v, \vec y)$ coordinates. This implies that …, where $u_z(\alpha_1)$ and $y_z(\alpha_1)$ are $O(1)$ as $\alpha_1 \to -\infty$. We then have …, where $O_{2,u\dots u v\dots v i_1\dots i_l}$ has $n$ $u$-indices, $k$ $v$-indices, and $l$ transverse indices. If we were contracting with $z_2$, we would only get $v$-indices, and $m_2 = J_2$. Thus, $v$-indices carry positive charge under $M$, and we have $m_2 = k - n = J_2 - l - 2n$. We see that for nonzero $n$ or $l$ we depart from the optimal eigenvalue $m_2 = J_2$. However, such terms are additionally suppressed by $(-\alpha_1)^{-n-l/2}$. Combining these two effects with the help of (4.19), we see that all terms contribute as in (4.16).
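The index bookkeeping in the last step can be checked mechanically. The sketch below assumes lightcone coordinates $(u, v, y)$ with a single transverse direction and metric $ds^2 = -du\,dv + dy^2$ (a convention chosen here for illustration). It checks that the finite boost $e^{\lambda M}$ rescales $z_1 \propto \partial_u$ by $e^{-\lambda}$ and $z_2 \propto \partial_v$ by $e^{\lambda}$ while preserving the metric, and that for a component with $n$ $u$-indices, $l$ transverse indices and $k = J_2 - n - l$ $v$-indices, the charge $m_2 = k - n$ and the suppression exponent $n + l/2$ satisfy $n + l/2 = (J_2 - m_2)/2$: the suppression is exactly half of the charge deficit. The helper names are hypothetical.

```python
import numpy as np

# Assumed conventions for this sketch: coordinates (u, v, y) with one transverse
# direction and metric ds^2 = -du dv + dy^2, encoded in the Gram matrix ETA.
ETA = np.array([[0.0, -0.5, 0.0],
                [-0.5, 0.0, 0.0],
                [0.0, 0.0, 1.0]])


def boost(lam):
    """Finite boost exp(lam * M): u -> e^{-lam} u, v -> e^{lam} v, y fixed."""
    return np.diag([np.exp(-lam), np.exp(lam), 1.0])


def charge_and_suppression(J, n, l):
    """Component of a spin-J tensor with n u-indices, l transverse indices and
    k = J - n - l v-indices: return (M-charge k - n, suppression exponent n + l/2)."""
    k = J - n - l
    return k - n, n + l / 2


z1 = np.array([1.0, 0.0, 0.0])  # polarization along u (charge -1 per index)
z2 = np.array([0.0, 1.0, 0.0])  # polarization along v (charge +1 per index)
lam = 0.7

# Enumerate the components of a spin-2 operator contracted with a generic z'_2.
for n in range(3):
    for l in range(3 - n):
        m, s = charge_and_suppression(2, n, l)
        print(f"n={n} l={l}: m2={m}, extra suppression (-alpha_1)^-{s}")
```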

Rindler positivity
Rindler positivity enables us to bound second-sheet correlators in terms of first-sheet correlators. Its implications are simple for four-point functions of scalar primaries, and this case has been analyzed previously in [19,22]. However, its implications are more subtle for spinning correlators, so let us discuss them in more detail. Any CFT has an anti-unitary symmetry $J = CRT$ satisfying $J^2 = 1$. This symmetry acts on local operators as … In this paper we will only study its action on traceless-symmetric operators, which reduces to …, where $\bar z = (-z^0, -z^1, z^2, \dots)$.
The statement of Rindler positivity is [60]
$i^{-F}\langle\Omega|\, J O_1 \cdots O_n J\; O_1 \cdots O_n\,|\Omega\rangle \geq 0$, (4.25)
where all operators $O_i$ lie in the right Rindler wedge $x^1 > |x^0|$, and $F$ is 1 if the number of fermions among $O_1, \dots, O_n$ is odd and 0 otherwise (see footnote 23). Now let $a$ and $b$ be sums of products of (possibly smeared) operators contained in the right Rindler wedge, and define … (4.26), where $F(a) \in \{0, 1\}$ is defined as above. Then we have …, where we used the anti-unitarity of $J$, $J|\Omega\rangle = |\Omega\rangle$, and the fact that $JbJ$ and $a$ are spacelike-separated. Rindler positivity implies $(a, a) > 0$. Thus, $(\cdot, \cdot)$ is a Hermitian inner product, and we have the Cauchy-Schwarz inequality … Let us develop more conformally-invariant versions of these statements. The geometry that defines $J$ is given by the codimension-1 planes $x^0 \pm x^1 = 0$. These planes can be described more invariantly as the past null cones of points $A$ and $B$ at future null infinity. Given these points, the right Rindler wedge is given by $B > x > A^-$ and the left by $A > x > B^-$ (see footnote 24). In general, for any spacelike-separated pair of points $A$ and $B$, there exists an anti-unitary Rindler conjugation $J_{AB}$ that depends on these two points and exchanges the two wedges. A positive-definite Hermitian inner product analogous to (4.26) can be defined for each $J_{AB}$.

23 This is a bit of an oversimplification: the general statement is that the operators $O_i$ should be smeared with test functions, and one can take arbitrary linear combinations of such smeared products. When the correlation function is well-defined as a number (rather than just as a distribution), the smearing is not necessary.
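The step from positivity of $(a, a)$ to the Cauchy-Schwarz inequality uses nothing beyond Hermiticity and positivity of the form. As a finite-dimensional toy (our own stand-in, not the Rindler construction itself), any Gram matrix $G = AA^\dagger$ defines such a form $(a, b) = a^\dagger G b$, and the inequality $|(a,b)|^2 \le (a,a)(b,b)$ can be checked directly. The names `random_psd_form` and `form` are hypothetical.

```python
import numpy as np


def random_psd_form(dim, rng):
    """Toy Gram matrix G = A A^dagger: Hermitian and positive-semidefinite."""
    A = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    return A @ A.conj().T


def form(G, a, b):
    """(a, b) = a^dagger G b: antilinear in a, linear in b."""
    return a.conj() @ G @ b


rng = np.random.default_rng(0)
G = random_psd_form(6, rng)
a = rng.normal(size=6) + 1j * rng.normal(size=6)
b = rng.normal(size=6) + 1j * rng.normal(size=6)

lhs = abs(form(G, a, b)) ** 2
rhs = form(G, a, a).real * form(G, b, b).real
print(lhs <= rhs)  # True: Cauchy-Schwarz for a positive-semidefinite form
```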
It is convenient to describe the action of $J_{AB}$ using the embedding formalism. Let $X_A$ and $X_B$ be the embedding-space coordinates of $A$ and $B$. Then $J_{AB}$ acts on spacetime as a Euclidean rotation by $\pi$ in the plane spanned by $X_A$ and $X_B$. It can be written as (4.29) (see footnote 25). The action of $J_{AB}$ on local operators is then … Consider now a configuration of points 1, 2, 3, 4 with the causal relationships $4 > 1$ and $2 > 3$, where all other pairs of points are spacelike-separated. We can find a conformal transformation that brings these points into a configuration where the pair 1, 2 and the pair 3, 4 are each symmetric with respect to the standard Rindler reflection $J$. Thus, there must exist $A$, $B$ such that $J_{AB}$ maps $1 \leftrightarrow 2$ and $3 \leftrightarrow 4$. In embedding-space language this reads (4.31), where we must introduce scaling factors $\lambda_{ij}$ because the $X$'s are projective coordinates. Here, $X_A$, $X_B$ and the coefficients $\lambda_{12}$, $\lambda_{34}$ are all functions of $X_1, \dots, X_4$. These functions are somewhat complicated in general, but we will only need them in certain limits. In our configuration, the Wightman correlator is on the second sheet. However, we can write … and use the Rindler Cauchy-Schwarz inequality to write (4.34). Let us focus on the first factor (the second can be treated similarly): … Here we use the notation $\bar O = J_{AB} O J_{AB}^{-1}$. Note that $\bar O_2$ is inserted at $X_1$ and $\bar O_3$ is inserted at $X_4$. Both of these points are in the opposite Rindler wedge from $X_2$ and $X_3$. This implies that $O_3$ and $\bar O_2$ commute, so we can reorder the operators to obtain (4.36). The operators $\bar O_2$ at $X_1$ and $O_2$ at $X_2$ now act on the vacuum. By the results of [58,59], the correlator (4.36) is on the first sheet, and we can use the OPE to control its behavior and hence bound the left-hand side of (4.34). Conveniently, the correlators on the right-hand side of (4.34) have the same insertion points $X_i$ as the correlator on the left-hand side, and hence the same cross-ratios. Let us briefly mention how one can determine the points $X_A$ and $X_B$ as functions of the $X_i$. To do this, we must solve the equations (4.31), together with the conditions (4.37). Using (4.29), we can see that $X_A$ and $X_B$ must have the form … for some coefficients $c_{ai}$. We have 2 scalar equations coming from each equation in (4.31), obtained by projecting onto $X_1 - \lambda_{12} X_2$ or $X_3 - \lambda_{34} X_4$. We also have the 2 scalar equations (4.37), which adds up to 6 equations for the 6 unknowns $c_{ai}$, $\lambda_{ij}$. It is easy to solve these equations by making use of conformal symmetry.

24 Here, $A^-$ ($B^-$) represents the point obtained by sending lightrays in all past directions from $A$ ($B$) and finding the point where they converge. See, e.g., [20] for details.
25 We abuse notation and write $J_{AB}$ for both the anti-unitary operator on Hilbert space and a linear transformation in embedding space.
For this, recall that all coordinates $X$ are projective, and hence our unknowns $c_{ai}$ and $\lambda_{ij}$ have nontrivial projective weights as well. We can construct combinations such as …, which are projective invariants. Projective invariants are the same as conformal invariants, and thus must be expressible in terms of cross-ratios, i.e. …
The function $f_{A1}(z, \bar z)$ can be computed using the expressions for $X_i$, $X_A$, $X_B$ for the standard Rindler reflection $J$. Once we know $f_{A1}(z, \bar z)$, we can find … Note that the right-hand side depends on $X_A$ only through an overall coefficient $(X_A \cdot X_2)$. The same coefficient can be factored out of $c_{A3}$, and thus $X_A$ is determined up to an overall rescaling, which is irrelevant. For example, we can simply set $(X_A \cdot X_2) = 1$ to get a concrete solution. All the other coefficients $c_{ai}$ and $\lambda_{ij}$ can be determined in the same way. We will not need the complete solution, but it is helpful for explicitly checking our arguments.
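One explicit way to realize a "rotation by $\pi$ in the plane spanned by $X_A$ and $X_B$" is the linear map that negates ${\rm span}(X_A, X_B)$ and fixes its orthogonal complement, $X \mapsto X - 2\,[(X\cdot X_B)X_A + (X\cdot X_A)X_B]/(X_A\cdot X_B)$. The sketch below is our own rendering under assumed conventions ($d = 3$, embedding coordinates $(X^+, X^-, X^\mu)$ with $X\cdot Y = -\tfrac12(X^+Y^- + X^-Y^+) + \eta_{\mu\nu}X^\mu Y^\nu$, mostly-plus $\eta$, lift $X = (1, x^2, x^\mu)$). It checks that choosing $X_A, X_B$ as the null vectors $e_0 \pm e_1$ (points at infinity) reproduces the standard reflection $(x^0, x^1, x^2) \to (-x^0, -x^1, x^2)$, that the map squares to one, and that it preserves the embedding metric.

```python
import numpy as np

# Assumed conventions (d = 3): X = (X^+, X^-, X^0, X^1, X^2),
# X.Y = -(X^+ Y^- + X^- Y^+)/2 + eta_{mu nu} X^mu Y^nu with eta = diag(-1, 1, 1).
G = np.zeros((5, 5))
G[0, 1] = G[1, 0] = -0.5
G[2:, 2:] = np.diag([-1.0, 1.0, 1.0])


def dot(X, Y):
    return X @ G @ Y


def embed(x):
    """Lift a spacetime point x = (x^0, x^1, x^2) to the null cone: X = (1, x^2, x)."""
    x = np.asarray(x, dtype=float)
    return np.concatenate(([1.0, x @ G[2:, 2:] @ x], x))


def rindler_reflection(XA, XB):
    """Matrix of X -> X - 2[(X.XB) XA + (X.XA) XB] / (XA.XB).

    The bracket divided by XA.XB is the projector onto span(XA, XB), so this map
    negates that plane and fixes its orthogonal complement.
    """
    P = (np.outer(XA, G @ XB) + np.outer(XB, G @ XA)) / dot(XA, XB)
    return np.eye(len(XA)) - 2.0 * P


# Null vectors spanning the (X^0, X^1) plane: points at infinity.
XA = np.array([0.0, 0.0, 1.0, 1.0, 0.0])
XB = np.array([0.0, 0.0, 1.0, -1.0, 0.0])
J = rindler_reflection(XA, XB)

x = np.array([0.3, 1.2, -0.5])
print(np.allclose(J @ embed(x), embed([-x[0], -x[1], x[2]])))  # True
```

For a general spacelike pair $A$, $B$ the same function applies unchanged; only the input vectors differ.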

Regge limit
In the previous section we saw that Rindler positivity implies a bound on the correlator of the form (4.42). We can now use the first-sheet bounds from sections 4.2.1 and 4.2.2 to bound the correlators on the right-hand side. Before doing so, let us write out these correlators a bit more carefully. For example, …, where … Writing this in terms of real-space operators, we find (4.45). Let us focus on the Regge limit $\alpha_1 \to +\infty$, $\alpha_2 \to -\infty$. Similarly to section 4.2.1, it is convenient to work in the frame in which $x_2$ approaches the origin. In this frame the point $x_1$ is in the next Poincaré patch, so it is convenient to work with $x_1^-$, the image of $x_1$ in the Poincaré patch of $x_2$. In fact, since the operator at $x_1$ acts on the future vacuum, the correlator (4.45) changes only by a constant phase under the replacement $x_1 \to x_1^-$, so the analysis is very similar to section 4.2.1. If the coefficients $\lambda_{ij}$ and the polarization vectors $\bar z_2$ and $\bar z_3$ were constant, we could simply reuse the results of that section.
Instead, since $\lambda_{ij}$ and $\bar z_2$, $\bar z_3$ depend on the $x_i$ (and thus on $\alpha_1, \alpha_2$), we need to analyze their behavior in the limit $\alpha_1 \to +\infty$, $\alpha_2 \to -\infty$. Let us parameterize the approach to this limit similarly to (4.7). We have checked that in the limit $r \to 0$ the coefficients $\lambda_{12}$, $\lambda_{34}$ stay finite and …, where $z'_3$ is finite and depends on $z_3$ and the relative positions of $0$, $x_3$, $x_4$. This means that in the Regge limit at fixed $\sigma$, the correlator (4.45) is bounded in absolute value by … for some constant $C > 0$. The same analysis applies to the second correlator in (4.42).
We can now reuse the arguments of section 4.2.1 to conclude that the Wightman function integral converges near the Regge limit, at fixed $\sigma$, provided that …, where $\Delta_{\rm vac}$ is the smallest scaling dimension appearing in the $O_2^\dagger \times O_2$ OPE. Since $O_2^\dagger \times O_2$ always contains the identity operator, $\Delta_{\rm vac} = 0$. Note that the polarizations of both $O_2$ and $O_2^\dagger$ are $z_2$, so we cannot exclude the identity contribution using kinematics as we did in section 4.2.1. We thus obtain the sufficient condition (4.52). Note that we have only shown that this condition is sufficient for convergence of the integral at fixed $\sigma$; we discuss $\sigma$ approaching the lightcone boundaries, and thus the full two-variable integral, in the next section. Condition (4.52) is sufficient, but may turn out not to be necessary, since we cannot prove that the Cauchy-Schwarz bound (4.42) is tight. To allow for this possibility, let us introduce a parameter $J_0$, defined as the smallest real number such that (4.53), where all polarization vectors in the denominator are generic. Then the Wightman function integral converges in the Regge limit if … The result of this section can in turn be summarized by saying that …

Lightcone limit on the second sheet
Consider now the lightcone limit on the second sheet. The situation is very similar to the Regge regime, and we can use the same frame as above to analyze it. The only difference is that now we consider either $\alpha_1 \to +\infty$ or $\alpha_2 \to -\infty$, corresponding to the right or the lower boundary in figure 6, respectively. The other two boundaries can be treated in the same way. For concreteness, let us focus on the limit $\alpha_2 \to -\infty$. For simplicity, we will assume that $x_1$ is in the same Poincaré patch as $x_2$, and write … This time, we will have to discuss both correlators in the bound (4.42). Modulo factors of $\lambda_{ij}$ in (4.45) and their analogues for the second correlator in (4.42), we are essentially interested in the behavior of the OPEs … As in the Regge limit, the $\lambda$-factors and the polarizations entering $\bar O_3$ and $\bar O_4$ can be ignored, because they all tend to generic finite limits (see footnote 26). It is easy to determine the direction of $\bar z_1$ in the strict lightcone limit $\alpha_2 = -\infty$. In this limit we find $x_2 = 0$, and $z_1$ lies along the unique null ray connecting $x_1$ and $x_2$. This is a conformally-invariant statement, so it should hold after we apply $J_{AB}$. Applying $J_{AB}$ sends $x_1$ to $x_2$, $x_2$ to $x_1$, and $z_1$ to $\bar z_1$. Therefore, $\bar z_1$ should also lie along the unique null ray connecting $x_1$ and $x_2$, and thus it is proportional to $z_1$. In other words, $\bar z_1 = c\, z_1$ for some finite $c$. This implies that … Using the results of section 4.2.2 (after swapping $\alpha_1$ and $\alpha_2$), we find that … Here $\tau_0$ is the smallest twist that appears in the $O_1^\dagger \times O_1$ OPE. There is nothing special we can say about $\bar z_2$, and it simply tends to some generic finite value in the limit. This and the results of section 4.2.2 imply that … Combining these results, we find that … near the second-sheet lightcone limit $\alpha_2 \to -\infty$. This bound is uniform away from the boundary between the first and second sheets, $1 \sim 3$.
Including the light-transform weight $(-\alpha_2)^{-\Delta_2-J_2}$, we conclude that the Wightman function integral converges absolutely in this region provided … Combining this with the condition from $\alpha_1 \to \pm\infty$, we find the sufficient condition (4.64). As in the Regge limit, we cannot exclude the identity operator from the $O_1^\dagger \times O_1$ (or $O_2^\dagger \times O_2$) OPE, and so we must set $\tau_0 = 0$, resulting in the final condition … There are two subtleties which we still need to address. The first is the convergence of the two-variable integral near the Regge limit: we have established the convergence of the radial integral in the previous section and of the angular integral in this section, but we have not yet proved that the double integral converges. To see that it does, note that we have succeeded in bounding both the Regge limit and the second-sheet lightcone limit using Rindler positivity. A closer look at our arguments shows that for $r < r_0$ and all $\sigma$, the second-sheet correlator is bounded in absolute value by the product of first-sheet correlators times a uniform constant $C''$, where $r_0$ is sufficiently small. Convergence of the double integral of the product of first-sheet correlators can be established by the same methods as in section 4.2.4, and the convergence of the integral on the second sheet then follows immediately.
The second subtlety is that near the lightcone limit, either on the first or on the second sheet, we have only established the convergence of the integral provided we exclude a region near the boundary between the first and the second sheets. We now turn to a discussion of this subtlety.

Asymptotic lightcone expansion
The discussion of the previous sections provides us with rigorous bounds on the growth of the Wightman function on the first and second sheets. The situation is, however, qualitatively different for the two sheets.
On the first sheet we have a tight bound on the growth in the OPE and lightcone limits: we can use the convergent OPE expansion to see that the Wightman function actually saturates the bound. This means that unless the conditions of sections 4.2.1 and 4.2.2 are satisfied, the Wightman function integral diverges (in the absolute-value sense).
On the second sheet we have a potentially non-optimal bound on the growth in the Regge and lightcone limits, which we derived from the Cauchy-Schwarz inequality for Rindler reflection positivity. We have already pointed out that the growth in the Regge limit may be weaker than the bound, and we parametrized the true growth by an exponent $J_0$. Similarly, there is no a priori reason to believe that the lightcone bound is tight.
In fact, there is reason to believe that the growth of the correlator on the second sheet is the same as on the first sheet. Indeed, it is natural to expect that the lightcone OPE expansion, even though it does not converge on the second sheet, is still valid asymptotically [22]. Schematically, … (4.66), where $m_i$ is the boost eigenvalue of $O_i$ (similarly to section 4.2.2), and the sum on the right-hand side is over spin components of primaries and descendants. Such an asymptotic expansion is sufficient to establish the growth rate of the Wightman function near the second-sheet lightcone limit, and gives the same results as on the first sheet. In particular, the contribution of the identity operator to (4.66) is excluded in the same way as on the first sheet, and we no longer have to set $\tau_0 = 0$ in (4.64). Using the asymptotic expansion (4.66), we can also prove that the Wightman function integral converges in the lightcone limit near the boundary between the first and second sheets (with the $i\epsilon$ prescription employed for $O_3$, $O_4$), closing the loophole in our arguments.
While (4.66) is a natural expectation, we do not have a general proof that it holds. An argument was given in [22] for the case of scalar … responsible for reproducing a finite number of terms in (4.66) in the s-channel. The second part, $I_2$, contains all other contributions. By going sufficiently far into the s-channel lightcone regime, one can make sure that $I_1$ completely dominates over $I_2$. Continuation of $I_1$ to the second sheet is straightforward, since it is simply equal to a finite number of terms in (4.66). The expansion (4.66) then follows if we can show that $I_2$ remains subleading on the second sheet. In the setup of [22] this is easy to show, since all the terms in the t-channel expansion (and hence in $I_2$) are positive, and continuation to the second sheet merely multiplies them by phases, which cannot increase the magnitude of the total sum. This last step is problematic in more general setups. The positivity of the t-channel contributions is due to the fact that the Wightman function considered in [22] is Rindler-reflection positive on the first sheet. As soon as we consider non-identical operators or operators with spin, Rindler-reflection positivity no longer holds in general, and the argument cannot be applied. Still, the fact that (4.66) is valid at least for scalar $O_1 = O_2$, in some states, strongly suggests that it may be valid more generally. For the argument of [22] to fail in the general setup, the phases in $I_2$ on the first sheet would have to conspire to give an abnormally small result for all values of $z$, which seems rather unlikely.
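The positivity step in the argument of [22] is the triangle inequality: if every term is nonnegative on the first sheet and continuation only multiplies each term by a phase, the magnitude of the sum cannot grow. A toy check with made-up positive terms and random phases (the numbers are illustrative only, not CFT data):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical positive t-channel contributions a_k (their first-sheet values).
a = rng.exponential(size=50)
first_sheet = a.sum()

# Toy continuation to the second sheet: each term picks up a phase.
phases = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, size=a.size))
second_sheet = (phases * a).sum()

print(abs(second_sheet) <= first_sheet)  # True by the triangle inequality
```

Without positivity of the $a_k$ the first-sheet sum itself can be small through cancellations, which is exactly the loophole discussed above.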
In view of this discussion, we will assume that the asymptotic expansion (4.66) holds.

Convergence of the double commutator integral
We now consider the convergence of the double commutator integral (4.5). The first observation is that the double commutator vanishes when $1 \approx 4$ or $3 \approx 2$. Our integration region is therefore restricted to $1 < 4$ and $2 > 3$, which corresponds to the upper-left shaded square in figure 6. Note, however, that it is now meaningless to say that in this square the correlator is on the second sheet: the double commutator consists of four Wightman correlators, and their cross-ratios need not all be on the same sheet. In fact, using the causal relations between the points and microcausality, we can write in this region (4.67). The first two Wightman functions are of the same type as studied in the previous subsection, and are on the second sheet. In the last two Wightman functions, the operators $O_1$ and $O_2$ act on the vacuum, so they are on the first sheet. When analyzing the convergence of the double commutator integral (4.5), once again we only need to consider convergence near the boundaries of the integration region. Furthermore, we don't have to worry about the boundaries where $2 \sim 3$ or $1 \sim 4$: near these boundaries the integral is defined by $i\epsilon$ prescriptions. More precisely, the double commutator is obtained by folding integration contours in the Wightman function integral (4.3). Due to this folding, near these boundaries the four terms above pair up and form integration contours similar to figure 7. Overall, the double-commutator integral is an integral of a single Wightman function over a folded complex integration cycle in cross-ratio space. Everywhere away from the $2 \sim 3$ and $1 \sim 4$ boundaries, this integration cycle can be split into four layers, and each layer can be interpreted as an integral of one of the Wightman functions in (4.67). Near these boundaries the layers merge, wrapping around branch cuts and providing a canonical regularization of the integral.
We will continue to refer to the remaining two boundaries as the lightcone limits, and to the corner where they meet as the Regge limit. We can use the methods of the previous subsections to bound the growth of each Wightman function in (4.67) in these limits, and find that the same conditions we derived for (4.3) are also sufficient for convergence of the double-commutator integral (see footnote 27). It is easy to see that weaker conditions are in fact sufficient for convergence of the double-commutator integral. Let us consider the lightcone limits first. The asymptotic expansion (4.66) essentially implies that we can approximate the double commutator near these limits by replacing the Wightman functions in (4.67) with a finite number of s-channel conformal blocks for the leading-twist operators, analytically continued to the desired Wightman orderings. However, it is known [19] that s-channel conformal blocks cancel in the combination (4.67). This means that the asymptotic expansion (4.66) does not contribute to the double commutator. If this expansion were valid for any $\tau$, we would conclude that the double commutator decays faster than any power of $\alpha_1$ or $\alpha_2$ near the lightcone limits.
However, it is well known that the spectrum of primaries in any OPE has accumulation points in twist [50,61]. Let us denote the first twist accumulation point in the $O_1 \times O_2$ OPE by $\tau_*$. We can only trust (4.66) for $\tau < \tau_*$, since for $\tau \geq \tau_*$ we would have to include infinitely many terms in the sum. Therefore, we can only conclude that the double commutator grows in the lightcone regime no faster than at the rate determined by $\tau_*$. This leads to the following sufficient condition: (4.68). For the Regge limit, we don't have an analogue of (4.66), and we simply introduce a growth exponent $J_0^{\rm dDisc}$ for the double commutator, in complete analogy with (4.53). The condition for absolute convergence of the double-commutator integral near the Regge limit is then (4.69). Similarly to $J_0$, Rindler positivity bounds imply …; here $\tau'_0$ and $\Delta'_0$ are the smallest nonzero twist and dimension that appear in the $O_1 \times O_2$ OPE (see footnote 28), and $J_0$ is the growth exponent in the Regge limit, defined by (4.53). Using Rindler positivity, we have shown that … We have also shown that the above conditions with $\tau'_0 = 0$ are sufficient even if we don't assume the asymptotic expansion (4.66), but simply use bounds from Rindler positivity (see footnote 29). For the double-commutator integral (4.5), we have shown that sufficient conditions for its convergence are (4.74), where $\tau_* \geq d - 2$ is the first twist accumulation point in the $O_1 \times O_2$ OPE, and $J_0^{\rm dDisc}$ parameterizes the Regge growth of the double commutator in the same way that $J_0$ parametrizes the growth of the Wightman function through (4.53). Note that we have (4.76).

27 With the same caveat about the boundary between the first and second sheets as before.
28 When $J_1 = J_2 = 0$, we cannot exclude unit-operator contributions in the OPE and lightcone limits, and $\Delta'_0$ and $\tau'_0$ should be the lowest dimension and twist in their respective OPEs. In other words, they do not have to be nonzero in that case.
29 This argument has a small loophole, discussed, e.g., in section 4.2.6.
Let us briefly discuss the values of $J_0$ and $J_0^{\rm dDisc}$. First of all, from the expansion (4.67) it follows that … Both $J_0$ and $J_0^{\rm dDisc}$ can be studied using conformal Regge theory [17][18][19][20]. Conformal Regge theory implies that the Regge limit of the four-point function behaves as $1 + r^{1-j(0)}$, where $j(0)$ is the spin of the leading Regge trajectory at dimension $\Delta = d/2$. Here, the $1$ comes from the identity operator in the $O_1 \times O_2$ OPE. However, we are considering special kinematics where the unit operator does not contribute unless $O_1$ and $O_2$ are identical scalars. In these special kinematics, the four-point function behaves as $r^{1-j(0)}$. The double discontinuity also receives no contribution from the unit operator, so it also behaves as $r^{1-j(0)}$. Thus, we expect … Let us consider the case where $O_1$, $O_2$ are not identical scalars. Note that if $J_0$ is the intercept of the stress-tensor trajectory, then $1 \geq J_0 \geq 2 - d/2$ by Nachtmann's theorem [8,50,62]. Furthermore, by unitarity … (4.79).

In our analysis so far, we have considered the causal configuration $4 > x$ and $x^+ > 3$ with 3 and 4 spacelike. One can use $O_3$, $O_4$ to generate a dense subspace of the Hilbert space while staying in this causal configuration. Thus, our analysis establishes well-definedness and commutativity (when applicable) on this dense subspace. However, this subspace does not include important states like momentum eigenstates, so we might hope to establish commutativity on a larger dense subspace.
Some other causal configurations can be reached by acting on $O_3$ and $O_4$ with the operator $T$ that translates operators to image points in other Poincaré patches (see [20] for details). This operation simply introduces a phase, leaving our analysis unchanged. However, there exist causal configurations that cannot be reached in this way: for example, $4 > x$ and $x^+ > 3$ with 3 and 4 timelike-separated. In this case, one cannot use Rindler positivity, and we have not found a way to rigorously bound the behavior of the correlator. We can argue non-rigorously as follows. To establish convergence of the Wightman or double-discontinuity integrals in the lightcone limit, we can invoke the asymptotic lightcone assumption of [22] described in section 4.2.6. Under this assumption, the analysis of the lightcone limit between 1 and 2 is independent of the positions of points 3 and 4, and thus identical to what we have already done. To establish convergence in the Regge limit, we can assume that the leading Regge behavior $r^{1-J_0}$ predicted by conformal Regge theory continues to hold in this different causal configuration. It is possible that conformal Regge theory can be used to rigorously establish both of these assumptions. We leave this problem for future work.
In upcoming work [21], we establish a connection between commutativity and the Regge limit in a different way. We show that the commutator

Convergence in perturbation theory
Although we have shown that $J_0 \leq 1$ in a nonperturbative theory, this bound can be violated at fixed order in perturbation theory (e.g., at weak coupling or at large $N$). In large-$N$ perturbation theory, the bound on chaos [26] implies that $J_0^{\rm planar} \leq 2$, where $J_0^{\rm planar}$ characterizes the Regge growth of the leading nontrivial term in $1/N$. This bound is saturated in holographic theories with a large gap, where $J_0^{\rm planar} = 2$ comes from tree-level graviton exchange in the bulk. At one loop in the bulk, we get contributions from two-graviton exchange, and hence we expect $J_0^{\text{1-loop}} = 3$ in such theories. From our discussion above, it follows that in these theories products of ANEC operators on the same null plane should be well-defined, and ANEC operators should commute at planar level, since $J_0^{\rm planar} < 3$. However, a product of ANEC operators on the same null plane is not well-defined at one-loop order in the bulk, since $J_0^{\text{1-loop}} \geq 3$. To recover a well-defined observable, one would have to appropriately resum $1/N$ effects.
Note that there is no contradiction with the product of ANEC operators being defined nonperturbatively. The four-point function has an expansion in powers of $1/N$ at fixed values of the cross-ratios, but this expansion does not commute with taking the Regge limit. Since the ANEC integrals probe the Regge limit, these integrals do not commute with the $1/N$ expansion. Let us consider a (not necessarily physical) toy model of such a situation, (4.81). Here $I_N$ is the analogue of the energy-energy correlator, and $f_N(s)$ is the analogue of the correlation function, where $s \to +\infty$ is the Regge limit. The integral for $I_N$ converges if $f_N(s)$ grows as $s^{j-1}$ with $j < 3$. This is true for the $N^0$ and $N^{-2}$ terms in the large-$N$ expansion of $f_N(s)$, which predict … The exact answer is … We see that term-wise integration is reliable at the orders at which it converges, but at higher orders $I_N$ may cease to have a simple $1/N$ expansion. We expect that something similar happens in energy-energy correlators, i.e.
$\langle E(n_1) E(n_2)\rangle_{\epsilon\cdot T} = f^{\rm planar}(n_1, n_2) + f^{\rm rest}_N(n_1, n_2)$, (4.84)
where $f^{\rm planar}(n_1, n_2)$ is computed from the planar part of the four-point function, and
$\lim_{N\to\infty} f^{\rm rest}_N(n_1, n_2) = 0$, (4.85)
but $f^{\rm rest}_N(n_1, n_2)$ does not admit an expansion in integer powers of $1/N^2$. In $\mathcal{N} = 4$ SYM at finite 't Hooft coupling, $J_0$ will be less than 2, and (4.84) may have more $1/N$ terms in it.
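The same phenomenon can be seen numerically in a different toy of our own (hypothetical, not the model (4.81)): $I_N = \int_0^\infty ds\,(1+s)^{-4}\sqrt{1 + s^2/N^2}$. Expanding the square root in $1/N^2$ gives terms $s^{2k}/N^{2k}$, and against the $(1+s)^{-4}$ weight only the $k = 0$ and $k = 1$ terms integrate to finite values ($1/3$ and $1/6$), mirroring the statement that only the $N^0$ and $N^{-2}$ terms converge. Substituting $s = Nx$ shows that the remainder $I_N - 1/3 - 1/(6N^2)$ behaves as $-1/(3N^3)$, an odd power, so $I_N$ is not a series in $1/N^2$ beyond the orders where term-wise integration converges. The sketch below checks this with `scipy.integrate.quad`.

```python
import numpy as np
from scipy.integrate import quad

# Hypothetical toy (not the model (4.81)):
#   I_N = int_0^inf ds (1 + s)^-4 * sqrt(1 + s^2 / N^2).


def I_direct(N):
    """Toy observable at finite N, integrated numerically."""
    val, _ = quad(lambda s: (1.0 + s) ** -4 * np.hypot(1.0, s / N),
                  0.0, np.inf, limit=200)
    return val


def remainder_scaled(N):
    """Return N^3 * (I_N - 1/3 - 1/(6 N^2)), computed without cancellation.

    With s = N x and sqrt(1+x^2) - 1 - x^2/2 = -x^4 / (2 (1 + sqrt(1+x^2))^2),
    the remainder equals N^-3 * int_0^inf F(x) / (x + 1/N)^4 dx.
    """
    eps = 1.0 / N

    def integrand(x):
        # F(x) / (x + eps)^4, written to avoid overflow at large x
        return -(x / (x + eps)) ** 4 / (2.0 * (1.0 + np.hypot(1.0, x)) ** 2)

    left, _ = quad(integrand, 0.0, 1.0, limit=200)
    right, _ = quad(integrand, 1.0, np.inf, limit=200)
    return left + right


for N in (10.0, 100.0, 1000.0, 10000.0):
    print(N, remainder_scaled(N))  # tends toward -1/3 as N grows
```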

Other types of null-integrated operators
The works [24,25] have considered other examples of operators integrated over null rays. For example, the authors of [24] introduced
$$L_n(\vec y) = \int_{-\infty}^{\infty} dv\, v^{n+1}\, T_{vv}(v, \vec y). \qquad (4.86)$$
They studied the algebra of such operators (under the assumption that products can be suitably renormalized) and found that it resembles a separate Virasoro algebra for each transverse point $\vec y$. The work [25] did a similar analysis of other null-integrated operators and found that they generate a BMS algebra. A first comment about the expression (4.86) is that it is generically divergent in any correlation function when $n \ge d$. A non-divergent definition of $L_n(\vec y)$ can be given in terms of descendants of $\mathbf{L}[T]$, as we explain below.
Let us make two additional comments about such operators. Firstly, the additional insertions of $v^{n+1}$ in the integrand make it more difficult to argue that products are well-defined and commutative, even at nonzero transverse separation $\vec y_{12}$. The required analysis is similar to the previous subsections. For example, suppose we would like to establish that $\langle\mathcal{O}_4|[L_n(\vec y_1), L_m(\vec y_2)]|\mathcal{O}_3\rangle = 0$ for nonzero $\vec y_{12}$. The integral of the Wightman function $\langle\Omega|\mathcal{O}_4 T T \mathcal{O}_3|\Omega\rangle$ is absolutely convergent in the Regge limit if $J_0 + n + m + 2 < J_1 + J_2 - 1$, where $J_1 = J_2 = 2$. If $0 < J_0 < 1$ (as expected for the 3d Ising model), we can only prove commutativity at nonzero $\vec y_{12}$ for $n + m \le 0$. If $J_0 = 1$ (as expected in a gauge theory), we can only prove commutativity at nonzero $\vec y_{12}$ for $n + m < 0$. One should also consider constraints coming from the lightcone limit. A full analysis of well-definedness and commutativity of more general light-ray operators is outside the scope of this work. If Wightman function integrals are not absolutely convergent, then it may still be possible to renormalize products $L_n(\vec y_1)L_m(\vec y_2)$, but the renormalized product may not be commutative at nonzero $\vec y_{12}$.
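The counting of allowed $(n, m)$ can be made mechanical. This is a minimal sketch, assuming that the absolute-convergence condition takes the form $J_0 + (n+1) + (m+1) < J_1 + J_2 - 1$, so that each insertion $v^{n+1}$, $v^{m+1}$ adds to the effective Regge growth. That form is our assumption. It reproduces the two cases quoted above, and for $n = m = -1$ (the ANEC operators) it reduces to $J_0 < J_1 + J_2 - 1$. The function names are hypothetical.

```python
def commutativity_provable(n: int, m: int, J0: float, J1: int = 2, J2: int = 2) -> bool:
    """Assumed absolute-convergence test for <O4|[L_n(y1), L_m(y2)]|O3> at nonzero y12.

    Each insertion v^{n+1}, v^{m+1} is taken to raise the effective Regge growth by n+1, m+1.
    """
    return J0 + (n + 1) + (m + 1) < J1 + J2 - 1


def max_allowed_total(J0: float, window: range = range(-10, 11)) -> int:
    """Largest n + m (scanning a finite window of n, m) for which the assumed condition holds."""
    return max(n + m for n in window for m in window if commutativity_provable(n, m, J0))


if __name__ == "__main__":
    print(max_allowed_total(0.5))  # 0: intercept between 0 and 1, as for 3d Ising
    print(max_allowed_total(1.0))  # -1: intercept 1, as in a gauge theory
    print(commutativity_provable(-1, -1, 2.0))  # True: ANEC operators at the planar holographic intercept
```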
This discussion assumes that the Regge limit is always dominated by a fixed Regge intercept $J_0$. More generally, the Regge limit of a four-point function is related to an integral over the leading Regge trajectory $J_0(\nu)$, where $\Delta = \frac{d}{2} + i\nu$ and $\nu$ ranges from $-\infty$ to $\infty$ [17,18]. The $J_0$ we have discussed so far is shorthand for the maximum value of $J_0(\nu)$ along this trajectory. However, it is possible to isolate different values of $\nu$ by performing an integral transform in the transverse positions $\vec y$. Thus, perhaps by passing to $\nu$-space and choosing a $\nu$ such that $J_0(\nu)$ is sufficiently small, one could alleviate the problems with defining products $L_n(\vec y_1)L_m(\vec y_2)$. We briefly discuss this possibility again in section 6.2.2.
Finally, let us explain how operators like $L_n(\vec y)$ and those in [25] can be described using light transforms. The significance of $\mathbf{L}[T](x, z)$ is that it transforms like a conformal primary. By contrast, the operators $L_n(\vec y)$ can be understood as descendants of $\mathbf{L}[T](x, z)$, i.e. derivatives with respect to $x$ and/or $z$. As usual in conformal field theory, correlators of descendants are determined by correlators of primaries.
Let us understand how this works for the case of where is an embedding-space vector. Here, $T(X, Z)$ is the embedding-space lift of $T_{\mu\nu}(x)$, which we describe in more detail in section 5.1.3. We recover $vT_{vv}$ by setting $m = v$ and $Z = (0, 0, z)$. The product $X^m T(X, Z)$ is an example of acting on $T(X, Z)$ with a weight-shifting operator [63]. Weight-shifting operators are conformally-covariant differential operators that shift the conformal weights of the objects they act on, in addition to introducing a free index for a finite-dimensional representation $W$ of the conformal group. In this case, $X^m$ is a zeroth-order differential operator (since it does not involve any derivatives $\partial/\partial X$ or $\partial/\partial Z$). It transforms in the vector representation $W = \square$ of $SO(d, 2)$, and shifts weights by $(\Delta, J) \to (\Delta - 1, J)$. The representation $W = \square$ possesses other weight-shifting operators that will not be important for our discussion. Weight-shifting operators and conformally-invariant integral transforms satisfy a natural algebra.$^{30}$ We can guess the form of this algebra simply by inspecting quantum numbers. For example, because $\mathbf{L}$ maps weights $(\Delta, J) \to (1 - J, 1 - \Delta)$, commuting $X^m$ past $\mathbf{L}$ must produce a weight-shifting operator that shifts $(\Delta, J) \to (\Delta, J + 1)$. The operator $L_0(\vec y)$ can be obtained by appropriately specializing $X$ and $Z$ above; see (4.95).
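The quantum-number argument above can be checked mechanically. The sketch below is our own bookkeeping, not code from [63]. It encodes the action of $\mathbf{L}$ on weights, $(\Delta, J) \to (1 - J, 1 - \Delta)$. It then verifies that a shift $(\delta\Delta, \delta J)$ applied before $\mathbf{L}$ has the same effect as the reflected shift $(-\delta J, -\delta\Delta)$ applied after it. For $X^m$, with shift $(-1, 0)$, this gives $(0, +1)$. The sketch tracks weights only and says nothing about the explicit form of the operators. Function names are hypothetical.

```python
from fractions import Fraction


def light_transform(weights):
    """Weights of L[O] given the weights (Delta, J) of O."""
    delta, J = weights
    return (1 - J, 1 - delta)


def apply_shift(weights, shift):
    """Apply a weight shift (dDelta, dJ); e.g. X^m has shift (-1, 0)."""
    return (weights[0] + shift[0], weights[1] + shift[1])


def reflected_shift(shift):
    """Shift seen on the other side of L: (dDelta, dJ) -> (-dJ, -dDelta)."""
    return (-shift[1], -shift[0])


def check_commutation(weights, shift):
    """True if L[D_w O] and D_{r(w)} L[O] carry the same weights."""
    lhs = light_transform(apply_shift(weights, shift))
    rhs = apply_shift(light_transform(weights), reflected_shift(shift))
    return lhs == rhs


if __name__ == "__main__":
    x_m = (-1, 0)  # X^m: (Delta, J) -> (Delta - 1, J)
    print(reflected_shift(x_m))  # (0, 1): after L, the spin label goes up by one
    print(check_commutation((4, 2), x_m))  # stress tensor in d = 4
    print(check_commutation((Fraction(7, 3), Fraction(1, 2)), (2, -1)))  # generic weights
```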

Computing event shapes using the OPE
In this section, we discuss how event shapes can be computed using the OPE of the boundary CFT. Simple examples of event shapes have been computed before, for example in [14]. Our goal here is to provide tools for the calculation of $n$-point event shapes with general intermediate and external operators. For the purposes of this paper, this will allow us to match the results of section 3 and to express the commutativity of shocks as an exact constraint on the CFT data. We will also compute in closed form all conformal blocks which appear in scalar two-point event shapes, which may be useful in other contexts.

$^{30}$ In general, a conformally-invariant integral transform $\mathcal{I}_r$ is associated to a Weyl reflection $r$ of $SO(d, 2)$ [20,64,65]. When we commute a weight-shifting operator $\mathcal{D}_w$ with weight $w$ past the integral transform, the weight gets reflected, $\mathcal{I}_r\mathcal{D}_w = \mathcal{D}_{r(w)}\mathcal{I}_r$.
Let us give a brief overview of this section. We start by introducing the appropriate conformal blocks in section 5.1. After describing their general structure, we explain in detail how they can be computed in subsections 5.1.2-5.1.4. We then use these results in section 5.2 to match the bulk results of section 3, and in section 5.3 to describe the constraints that shock commutativity implies for the CFT data. Finally, in section 5.4 we give a closed-form expression for a general conformal block appearing in a scalar event shape, and demonstrate how the expansion works in a simple generalized free theory example.

t-channel blocks
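Let us consider a general two-point event shape$^{31}$
$$\langle\mathcal{O}_4(p)|\,\mathbf{L}[\mathcal{O}_1](\infty, z_1)\,\mathbf{L}[\mathcal{O}_2](\infty, z_2)\,|\mathcal{O}_3(p)\rangle. \qquad (5.1)$$
We can rewrite it by inserting a complete set of states between the two light transforms,
$$\sum_{\mathcal{O}}\sum_{\Psi_{\mathcal{O}}}\langle\mathcal{O}_4(p)|\mathbf{L}[\mathcal{O}_1](\infty, z_1)|\Psi_{\mathcal{O}}\rangle\langle\Psi_{\mathcal{O}}|\mathbf{L}[\mathcal{O}_2](\infty, z_2)|\mathcal{O}_3(p)\rangle, \qquad (5.2)$$
where we sum over primary operators $\mathcal{O}$, and $\Psi_{\mathcal{O}}$ runs over an orthonormal basis of descendants of $\mathcal{O}$. Let us focus on the contribution of a single primary operator $\mathcal{O}$,
$$\sum_{\Psi_{\mathcal{O}}}\langle\mathcal{O}_4(p)|\mathbf{L}[\mathcal{O}_1](\infty, z_1)|\Psi_{\mathcal{O}}\rangle\langle\Psi_{\mathcal{O}}|\mathbf{L}[\mathcal{O}_2](\infty, z_2)|\mathcal{O}_3(p)\rangle. \qquad (5.3)$$
To simplify the discussion, assume for the moment that $\mathcal{O}$ is a scalar. Normally, the sum over the descendants $\Psi_{\mathcal{O}}$ is rather complicated, since in order to perform it one needs to find an orthonormal basis in the conformal multiplet of $\mathcal{O}$. A simple, although somewhat unorthodox, way to perform such orthogonalization is to consider the momentum eigenstates $|\mathcal{O}(p)\rangle$, i.e. the Fourier transforms of $\mathcal{O}(x)|\Omega\rangle$. Since we are in Lorentzian signature, these states are perfectly well-defined (as distributions in $p$). They also form an orthogonal set thanks to momentum conservation,
$$\langle\mathcal{O}(p)|\mathcal{O}(q)\rangle = A(\Delta)\,(2\pi)^d\delta^d(p - q)\,\theta(p^0)\,\theta(-p^2)\,(-p^2)^{\Delta - \frac{d}{2}}, \qquad (5.5)$$
where the right-hand side is completely fixed by momentum conservation, Lorentz and scale invariance, and energy positivity, up to an overall factor $A(\Delta)$. Here, $\Delta$ is the scaling dimension of $\mathcal{O}$. Moreover, these states are complete in the conformal multiplet of $\mathcal{O}$. This follows from completeness of the states $\int d^dx\, f(x)\,\mathcal{O}(x)|\Omega\rangle$, where $f$ ranges over Schwartz test functions [66]. Using this basis, we can write$^{32}$
$$\sum_{\Psi_{\mathcal{O}}}|\Psi_{\mathcal{O}}\rangle\langle\Psi_{\mathcal{O}}| = \int\frac{d^dp}{(2\pi)^d}\,\frac{\theta(p^0)\,\theta(-p^2)}{A(\Delta)\,(-p^2)^{\Delta - \frac{d}{2}}}\,|\mathcal{O}(p)\rangle\langle\mathcal{O}(p)|. \qquad (5.7)$$
This expression is morally equivalent to the shadow integral approach to conformal blocks (see, e.g. [68]). An important difference, however, is that (5.7) is a rigorous identity in the Hilbert space, and does not require subtraction of any analogs of shadow contributions. Expression (5.7) turns out to be perfectly suited for our needs. Indeed, in (5.3) we are taking an inner product of $\langle\Psi_{\mathcal{O}}|$ with momentum eigenstates,$^{33}$ and this localizes the $p$ integral in (5.7). Using this observation, we find the following expression for the contribution of $\mathcal{O}$ to the event shape (5.1):
$$\int\frac{d^dq}{(2\pi)^d}\,\frac{\theta(q^0)\,\theta(-q^2)}{A(\Delta)\,(-q^2)^{\Delta - \frac{d}{2}}}\,\langle\mathcal{O}_4(p)|\mathbf{L}[\mathcal{O}_1](\infty, z_1)|\mathcal{O}(q)\rangle\langle\mathcal{O}(q)|\mathbf{L}[\mathcal{O}_2](\infty, z_2)|\mathcal{O}_3(p)\rangle$$
$$= \frac{\langle\mathcal{O}_4(p)|\mathbf{L}[\mathcal{O}_1](\infty, z_1)|\mathcal{O}(p)\rangle\langle\mathcal{O}(p)|\mathbf{L}[\mathcal{O}_2](\infty, z_2)|\mathcal{O}_3(p)\rangle}{A(\Delta)\,(-p^2)^{\Delta - \frac{d}{2}}}. \qquad (5.8)$$
Above, we used momentum conservation to go to the last line. As usual, we abused the notation by implicitly removing the momentum-conserving $\delta$-functions in the final three-point functions.

$^{31}$ We are suppressing possible Lorentz indices for $\mathcal{O}_3$ and $\mathcal{O}_4$.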
Let us immediately note an important feature of this expansion. Since the light transform $\mathbf{L}[\mathcal{O}_2]$ annihilates the vacuum, we can write
$$\langle\mathcal{O}(p)|\mathbf{L}[\mathcal{O}_2](\infty, z_2)|\mathcal{O}_3(p)\rangle = \langle\mathcal{O}(p)|\,[\mathbf{L}[\mathcal{O}_2](\infty, z_2), \mathcal{O}_3(p)]\,|\Omega\rangle. \qquad (5.9)$$
This is analogous to the situation with t-channel conformal blocks in the Lorentzian inversion formula [19]. It implies that the contribution of $\mathcal{O}$ vanishes if $\mathcal{O}$ is a double-trace of $\mathcal{O}_2$ and $\mathcal{O}_3$.$^{34}$

In (5.8) we have essentially computed (in the case of scalar $\mathcal{O}$) what we will call the t-channel conformal blocks for the event shape (5.1). We use the name t-channel block because we would like to reserve s-channel to mean the OPE of $\mathbf{L}[\mathcal{O}_1]\mathbf{L}[\mathcal{O}_2]$, which we discuss in [21].

$^{32}$ A version of (5.7) with spin was recently used in [67]. We will use a slightly different generalization to spin.
$^{33}$ Recall that $\mathbf{L}[\mathcal{O}_i]$ is inserted at infinity and is thus translationally invariant.
$^{34}$ The reason is that in this case the commutator vanishes, as can be checked from explicit expressions for the three-point tensor structures.
More precisely, the conformal block corresponding to (5.8) is given by stripping off the OPE coefficients, where we used a superscript $(a)$ to indicate that we are working not with a physical three-point function, but with a standard conformally-invariant three-point tensor structure with label $a$. We will denote the conformal block for exchange of $\mathcal{O}$ in a general Lorentz representation Again, the indices of operators $\mathcal{O}_3$ and $\mathcal{O}_4$ are implicit in this notation. The analog of (5.10) for these more general blocks is a bit more involved, owing to the spin indices of $\mathcal{O}$, and we defer its discussion to section 5.1.2.
With this notation, the event shape can be written as where $\rho_{\mathcal{O}}$ is the Lorentz representation of $\mathcal{O}$, and $\lambda$ are the OPE coefficients dual to the chosen basis of three-point tensor structures. In principle, the t-channel event shape conformal blocks can be computed from the usual conformal blocks by applying the light and Fourier transforms. However, the kinematics of event shapes are very special, and it is easier to directly use (5.10). In particular, as we will soon see, any t-channel event shape block can be written in terms of simple functions, which is not true for general conformal blocks.
To conclude this section, let us note that the above discussion can be straightforwardly generalized to multi-point event shapes such as We can insert a complete set of states in between each consecutive pair of light transforms.$^{35}$ The conformal block is obtained by restricting the sums over states to conformal multiplets of some primary operators $\mathcal{O}'_i$, Assuming again that all operators $\mathcal{O}'_2, \ldots, \mathcal{O}'_n$ are scalars, and repeating the arguments leading to (5.10), we obtain the expression for the conformal block $G^{t, a_2\cdots a_n}$

$^{35}$ Multi-point conformal blocks of this topology are sometimes called "comb-channel" blocks.
Again, the generalization to $\mathcal{O}'_i$ of non-trivial spin is straightforward.

Convergence of t-channel expansion
In this section, we discuss in more detail the convergence of the expansion (5.2). On general grounds, (5.2) converges absolutely if the states $\mathbf{L}[\mathcal{O}_2](\infty, z_2)|\mathcal{O}_3(p)\rangle$ and $\mathbf{L}[\mathcal{O}_1](\infty, z_1)^\dagger|\mathcal{O}_4(p)\rangle$ have finite norm. This is certainly not the case, since we have, for example,
$$\langle\mathcal{O}_3(p)|\,\mathbf{L}[\mathcal{O}_2](\infty, z_2)^\dagger\,\mathbf{L}[\mathcal{O}_2](\infty, z_2)\,|\mathcal{O}_3(p)\rangle,$$
which is in general divergent. The divergence has two sources: the momentum-conserving delta-function, and the fact that the polarizations of the two detectors are the same. We can try to avoid both problems by considering a ket state smeared in $p$ and in $z_2$. Smearing over $z_2$ is in principle equivalent to smearing of $\mathcal{O}_2$. However, the effective smearing function is not a test function: it only has support on future null infinity, which is codimension 1. Whether this smearing yields a finite-norm state is not obvious. One instance in which it does is given by uniform smearing of $z_2$ over the celestial sphere with $\mathcal{O}_2 = T$. In this case the smeared light transform is (up to normalization) the energy operator $P^0$, and we get $P^0$ acting on a $p$-smeared state, which is finite-norm. There are several other smearing functions which give different components of the momentum generator $P^\mu$.$^{36}$ More generally, we can also smear the coordinate $x_2$ of $\mathbf{L}[\mathcal{O}_2](x_2, z_2)$, which yields a finite-norm state and thus a convergent expansion (5.2), and then take the limit of localized $x_2 = \infty$. If the event shape is well-defined in the first place (cf. the discussion in section 4), then one can expect this limit to commute with the expansion (5.2). This would show that smearing in the polarization vectors is sufficient.$^{37}$

In [21], we will relate the event shape (5.1) to the Lorentzian OPE inversion formula at spin $J_1 + J_2 - 1$, with $\Delta = \frac{d}{2} + i\nu$ on the principal series. In particular, the coefficient function $C(\Delta = \frac{d}{2} + i\nu, J = J_1 + J_2 - 1)$ appearing in the inversion formula is equal to a smearing of the two-point event shape with a particular test function that depends on $\nu$. In $\nu$-space, the question of convergence of the OPE expansion (5.2) is thus equivalent to the question of convergence of the conventional t-channel conformal block expansion, when inserted into the Lorentzian inversion formula. Thus, the t-channel expansion for the event shape converges in $\nu$-space if $J_1 + J_2 - 1 > J_0$ [19,69]. This is equivalent to the condition for the event shape to make sense in the first place.

$^{36}$ For $\mathcal{O}_2 = T$, a general smearing over $z_2$ can be interpreted as a difference of modular Hamiltonians for two particular spatial regions [24]. It is unclear whether $\int d^dp\, f(p)|\mathcal{O}_3(p)\rangle$ is in the domain of this difference. We thank Nima Lashkari for discussion on this point.
$^{37}$ Additional smearing in $p$ should not be important, since the dependence of the event shape on $p$ is essentially fixed by Lorentz invariance.
In what follows we will mostly be interested in event shapes in the space of spherical harmonics, as opposed to $\nu$-space. We will study in section 5.4.4 a simple example in which (5.2) converges after smearing with a test function, provided this test function vanishes sufficiently quickly near the collinear limit $z_1 \propto z_2$. Smearing with spherical harmonics does not have this property, but it can be achieved by taking appropriate finite linear combinations. The number of such "subtractions" in the example of section 5.4.4 depends on the scaling dimensions of $\mathcal{O}_1$ and $\mathcal{O}_2$. We will take it as an assumption that this is the general picture. Furthermore, we will assume that no subtractions are necessary if $\mathcal{O}_1 = \mathcal{O}_2 = T$, and that smearing the polarizations $z_1$ and $z_2$ against spherical harmonics already leads to a convergent expansion (5.2).$^{38}$ It would be interesting to examine this question more rigorously.
Let us finally comment on a related subtlety. In the preceding discussion we showed that the ANEC operators commute in the sense
$$[\mathbf{L}[T](\infty, z_1), \mathbf{L}[T](\infty, z_2)] = 0$$
for non-collinear $z_1$ and $z_2$. In principle, we have not excluded the possibility of contact terms at $z_1 \propto z_2$ on the right-hand side. Since we only study the t-channel expansion after smearing with test functions, we might worry that the smeared commutators do not vanish because of these potential contact terms. It was argued in [25] that under natural assumptions there are no contact terms in this commutator, and we will work under this assumption. Even if there are contact terms, one can still perform the same subtractions as above to avoid them.
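The statement that uniform smearing of $\mathbf{L}[T]$ over the celestial sphere gives the energy can be made concrete with a small sketch. The same Ward identity is used when matching the bulk event shapes. We assume a normalization in which $\int_{S^{d-2}} d^{d-2}n\,\mathcal{E}(n) = P^0$. Then, for a rotation-invariant state at rest with $p = (1, 0)$, the one-point event shape per unit norm is $p^0/\mathrm{vol}(S^{d-2})$. Only the standard formula $\mathrm{vol}(S^k) = 2\pi^{(k+1)/2}/\Gamma(\tfrac{k+1}{2})$ is used. The function names are hypothetical.

```python
import math


def sphere_volume(k: int) -> float:
    """Volume of the unit sphere S^k embedded in R^{k+1}."""
    return 2.0 * math.pi ** ((k + 1) / 2) / math.gamma((k + 1) / 2)


def uniform_energy_density(p0: float, d: int) -> float:
    """<E(n)>/<state|state> for a state at rest, assuming int_{S^{d-2}} E(n) d^{d-2}n = P^0."""
    return p0 / sphere_volume(d - 2)


def integrate_over_sphere_2d(f, n_theta: int = 200, n_phi: int = 200) -> float:
    """Midpoint-rule integral of f(n) over S^2 (the celestial sphere for d = 4)."""
    total = 0.0
    for i in range(n_theta):
        theta = (i + 0.5) * math.pi / n_theta
        for j in range(n_phi):
            phi = (j + 0.5) * 2.0 * math.pi / n_phi
            n = (math.sin(theta) * math.cos(phi), math.sin(theta) * math.sin(phi), math.cos(theta))
            total += f(n) * math.sin(theta)
    return total * (math.pi / n_theta) * (2.0 * math.pi / n_phi)


if __name__ == "__main__":
    rho = uniform_energy_density(1.0, 4)  # 1/(4 pi) for d = 4
    # integrating the uniform density back over the sphere should recover p^0 = 1 up to quadrature error
    print(rho, integrate_over_sphere_2d(lambda n: rho))
```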

Fourier transform of Wightman two-point function
In this section we discuss the generalization of (5.7) to $\mathcal{O}$ with non-trivial spin, and compute the coefficients $A(\Delta)$ and their generalizations in the case of traceless-symmetric $\mathcal{O}$.
The identity (5.7) is essentially dual to the two-point function in momentum space (5.5). Thus, in order to find its generalization to $\mathcal{O}$ with spin, we should study the general Wightman two-point function in momentum space.
The two-point function is constrained by scale, translation, Lorentz, and special conformal invariance. Let us set special conformal invariance aside for the moment and consider the implications of the other symmetries. First of all, scale and translation invariance imply for some function $F$. Lorentz invariance further constrains the form of $F$. Suppose we have defined
$$F^{\alpha\beta}(e_0), \qquad (5.21)$$
where $e_0$ is the unit vector along the time direction. Then Lorentz invariance allows us to determine $F^{\alpha\beta}(v)$ for any unit-normalized timelike $v$, and thus also the complete two-point function. The value of (5.21) is only constrained by invariance under $SO(d-1)$ rotations. In other words, the allowed values of (5.21) are in one-to-one correspondence with where $\lambda$ is an $SO(d-1)$ irrep which appears in the decomposition of $\rho_{\mathcal{O}}$. These invariants are defined, up to a constant multiple, by the following property: $\Pi^{\alpha\beta}_\lambda(e_0)$ is the $SO(d-1)$ invariant which has non-zero components only along the irrep $\lambda$ in index $\beta$ and $\lambda^*$ in index $\alpha$. We will see explicit examples of such invariants below. Using this basis we can write for some coefficients $A_\lambda$, and thus [66] Invariance under special conformal transformations now fixes the relation between coefficients $A_\lambda$ with different $\lambda$ [66], yielding a unique solution for the momentum-space two-point function. We will determine these coefficients for traceless-symmetric $\mathcal{O}$ below. To proceed, we will need the dual invariants $\Pi^{\alpha\beta,\lambda}(v)$, defined by the completeness relation It is an easy exercise to establish the existence of $\Pi^{\alpha\beta,\lambda}(v)$ from basic representation-theoretic arguments. Using these invariants, we can write the general form of (5.7) as$^{39}$

$^{39}$ Here and below we abuse the notation by writing $\Pi(p)$ instead of $\Pi(p/|p|)$. In other words, we assume that $\Pi(p) = \Pi(p/|p|)$, i.e. $\Pi$ is a scale-invariant function. This is consistent because we have only defined $\Pi(v)$ for $v^2 = -1$.
The general t-channel conformal block is then given by (5.28). The simplest ingredients which enter into (5.28) are the coefficients $A_\lambda(\Delta, \rho)$ and the invariants $\Pi^{\alpha\beta,\lambda}(p)$. In the case when $\rho_{\mathcal{O}}$ is a traceless-symmetric tensor, $\lambda$ is a traceless-symmetric tensor of spin $s = 0, 1, \ldots, J$. The invariant $\Pi_s(p)$ has two sets of traceless-symmetric indices. We can view $\Pi_s(p)$ as a linear operator on traceless-symmetric spin-$J$ tensors, and define $\Pi_s(p)$ as the orthogonal projectors onto the spin-$s$ $SO(d-1)$ irrep inside the spin-$J$ traceless-symmetric irrep of $SO(d-1, 1)$. Note that $\Pi_s(p)$ have to be proportional to these projectors; requiring them to be equal to the projectors gives a convenient normalization with which $\Pi^{\alpha\beta,\lambda}$ and $\Pi^{\alpha\beta}_\lambda$ are equal. In particular, equation (5.26) follows from, in operator notation, where we have used the standard properties of projectors. It is convenient to contract the indices with null polarization vectors $z_1$ and $z_2$ to define
$$\Pi_{J,s}(p; z_1, z_2) \equiv z_{1,\mu_1}\cdots z_{1,\mu_J}\,\Pi_s^{\mu_1\ldots\mu_J;\nu_1\ldots\nu_J}(p)\,z_{2,\nu_1}\cdots z_{2,\nu_J}. \qquad (5.31)$$
We have included an explicit $J$ label to keep track of the Lorentz irrep when working in index-free formalism. By Lorentz invariance and the homogeneity properties of $\Pi_s(p)$, we must have where $\Pi_{J,s}(\eta)$ is a polynomial of degree at most $J$ and
$$\eta \equiv 1 - \frac{p^2\,(z_1\cdot z_2)}{(p\cdot z_1)(p\cdot z_2)}.$$
In particular, if we set $p = (1, 0, \ldots, 0)$ and $z_i = (1, n_i)$, where $n_i$ are unit vectors in $\mathbb{R}^{d-1}$, then $\eta = n_1\cdot n_2$. Since $\Pi_{J,s}$ projects the spin-$J$ $SO(d-1,1)$ irrep onto the spin-$s$ $SO(d-1)$ irrep, we should have $\Pi_{J,s}(\eta) \propto C^{(\frac{d-3}{2})}_s(\eta)$, where $C^{(\frac{d-3}{2})}_s$ is a Gegenbauer polynomial.$^{40}$ We can fix the coefficients by requiring, as a linear operator,
$$\sum_{s=0}^{J}\Pi_{J,s}(p) = 1, \qquad (5.35)$$
or in other words
$$\sum_{s=0}^{J}\Pi_{J,s}(p; z_1, z_2) = (z_1\cdot z_2)^J. \qquad (5.36)$$
This leads to (5.37). We will normalize the time-ordered two-point function for spacelike separation by where With this normalization, one can compute [70]
$$\langle\mathcal{O}(p, z_1)|\mathcal{O}(p, z_2)\rangle$$

(5.40)
We reproduce the calculation in appendix C. We then read off the coefficients $A_s(\Delta, J)$. An interesting application of (5.40) is that it proves sufficiency of the usual unitarity bounds. One can check that, when these bounds are satisfied, the combination $(-1)^{J+s}\Pi_{J,s}(p; z_1, z_2)$ corresponds to a positive-definite bilinear form, $(-1)^{J+s}A_s(\Delta, J)$ is positive, and (5.40) is locally integrable. This is sufficient to show that the state $\int d^dx\, f(x)\,\mathcal{O}(x)|\Omega\rangle$ has non-negative norm for any test function $f$. Let us look at some simple cases which are relevant for the examples that we discuss below. First of all, if $J = 0$, we can only have $s = 0$.

The coefficient $A_0(\Delta, 0)$ is positive for $\Delta > \frac{d-2}{2}$, in accord with the unitarity bound. As $\Delta \to \frac{d-2}{2}$, $A_0(\Delta, 0)$ goes to 0. Combined with the factor $(-p^2)^{\Delta - \frac{d}{2}}$, we find that for $\Delta = \frac{d-2}{2}$ the two-point function is proportional to $\delta(p^2)$, as is expected for the theory of free scalars.$^{41}$ Suppose now $J = 1$.

In this case $A_0(\Delta, 1)$ vanishes at $\Delta = d - 1$. This is consistent with the fact that spin-1 operators with $\Delta = d - 1$ are conserved currents, i.e. they transform in a short multiplet. The condition $A_0(\Delta, 1) = 0$ simply says that the scalar $s = 0$ component (the component along $p$) vanishes. In position space this is just the conservation equation $\partial_\mu J^\mu = 0$.
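The Gegenbauer normalization of $\Pi_{J,s}(\eta)$ discussed in this subsection can be sketched as follows. We assume the mostly-plus metric, so that in the frame $p = (1, 0, \ldots, 0)$, $z_i = (1, n_i)$ one has $z_1\cdot z_2 = \eta - 1$. We also assume that the identity operator contracted with $z_1, z_2$ is $(z_1\cdot z_2)^J = (\eta - 1)^J$. Expanding $(\eta - 1)^J$ in Gegenbauer polynomials $C_s^{((d-3)/2)}(\eta)$ then identifies the $s$-th term with $\Pi_{J,s}(\eta)$ in this frame. The helpers use exact rational arithmetic and the standard three-term recurrence. Their names are hypothetical, and $d \ge 4$ is assumed because the Gegenbauer index $(d-3)/2$ vanishes at $d = 3$.

```python
from fractions import Fraction
from math import comb


def poly_add(p, q):
    """Add coefficient lists (constant term first)."""
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)]


def poly_scale(p, c):
    return [c * x for x in p]


def gegenbauer(s, lam):
    """Coefficients of C_s^{(lam)}(x) from n C_n = 2x(n+lam-1) C_{n-1} - (n+2lam-2) C_{n-2}."""
    prev, cur = [Fraction(1)], [Fraction(0), 2 * lam]
    if s == 0:
        return prev
    for n in range(2, s + 1):
        x_cur = [Fraction(0)] + cur  # multiply C_{n-1} by x
        nxt = poly_add(poly_scale(x_cur, 2 * (n + lam - 1)), poly_scale(prev, -(n + 2 * lam - 2)))
        prev, cur = cur, poly_scale(nxt, Fraction(1, n))
    return cur


def projector_coefficients(J, d):
    """c_s with (eta - 1)^J = sum_s c_s C_s^{((d-3)/2)}(eta), so Pi_{J,s}(eta) = c_s C_s(eta) in this frame."""
    lam = Fraction(d - 3, 2)
    if lam <= 0:
        raise ValueError("this sketch assumes d >= 4")
    residual = [Fraction(comb(J, k) * (-1) ** (J - k)) for k in range(J + 1)]
    coeffs = [Fraction(0)] * (J + 1)
    for s in range(J, -1, -1):  # triangular solve, highest degree first
        C = gegenbauer(s, lam)
        coeffs[s] = residual[s] / C[s]
        residual = poly_add(residual, poly_scale(C, -coeffs[s]))
    assert all(x == 0 for x in residual)
    return coeffs


if __name__ == "__main__":
    print([str(c) for c in projector_coefficients(2, 4)])  # ['4/3', '-2', '2/3'] (Legendre case)
```

The coefficients alternate in sign as $(-1)^{J+s}$. This matches the $(-1)^{J+s}$ factor that appears in the positivity discussion of (5.40).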

Light transform of a general three-point function
We now turn to the calculation of the three-point functions which enter (5.28). We will first apply the light-transform and then the Fourier transform.
In this section we heavily utilize the embedding formalism [72,73]. Let us briefly review its basic features. Points in $\mathbb{R}^{d-1,1}$ are put in one-to-one correspondence with null rays in $\mathbb{R}^{d,2}$, on which the conformal group $SO(d, 2)$ acts linearly. Points in $\mathbb{R}^{d,2}$ are denoted by $X$, and null rays can be described by $X$ subject to $X^2 = 0$ and the identification $X \sim \lambda X$ for $\lambda > 0$. If we introduce the components $X^\pm, X^\mu$ (where $\mu$ runs over indices of $\mathbb{R}^{d-1,1}$) such that
$$X\cdot Y = -\tfrac{1}{2}\left(X^+Y^- + X^-Y^+\right) + X^\mu Y_\mu,$$
then $x^\mu \in \mathbb{R}^{d-1,1}$ can be embedded as
$$(X^+, X^-, X^\mu) = (1, x^2, x^\mu).$$
Here we used $X \sim \lambda X$ to set $X^+ = 1$.$^{42}$ Local operators can be described by functions $\mathcal{O}(X)$, defined for $X^2 = 0$, which are homogeneous, $\mathcal{O}(\lambda X) = \lambda^{-\Delta}\mathcal{O}(X)$. Traceless-symmetric spin-$J$ representations are described by adding dependence on a polarization vector $Z$, subject to $Z^2 = X\cdot Z = 0$, and $\mathcal{O}(X, \lambda Z + \alpha X) = \lambda^J\mathcal{O}(X, Z)$. In terms of the $\mathbb{R}^{d-1,1}$ polarization vector $z$ and coordinate $x$ we can identify
$$(Z^+, Z^-, Z^\mu) = (0, 2(x\cdot z), z^\mu). \qquad (5.54)$$
Here, we used the equivalence $Z \sim Z + \alpha X$ to set $Z^+ = 0$. We will use the embedding formalism to compute the action of the light transform (3.3) on correlation functions of local operators. For this, we need its form in embedding space [20],
$$\mathbf{L}[\mathcal{O}](X, Z) = \int_{-\infty}^{\infty} d\alpha\,\mathcal{O}(Z - \alpha X, -X). \qquad (5.55)$$
Note that the arguments $X$ and $Z$ are effectively swapped on the right-hand side compared to the Minkowski coordinates $x$ and $z$ entering in (3.3). As shown in [72], a general parity-even three-point function of traceless-symmetric primary operators can be built out of two basic objects $V_{i,jk}$ and $H_{ij}$ (see [72] for their explicit form). In these objects, $Z_i$ appears only through the antisymmetric combination $X_i^{[A}Z_i^{B]}$. Moreover, this is also the combination in which $X_i$ enters into those invariants above which contain $Z_i$. This fact greatly simplifies the computation of the light transform (5.55) of three-point structures. Indeed, the definition instructs us to replace $X \to Z - \alpha X$, $Z \to -X$, and integrate over $\alpha$. The combination $X_i^{[A}Z_i^{B]}$ is invariant under this replacement and thus factors out of the integral.
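The embedding conventions can be checked numerically. The sketch verifies that $X = (1, x^2, x^\mu)$, $Z = (0, 2x\cdot z, z^\mu)$ satisfy $X^2 = Z^2 = X\cdot Z = 0$ for null $z$. It also checks that the light-transform substitution $(X, Z) \to (Z - \alpha X, -X)$ preserves these constraints and leaves $X^A Z^B - X^B Z^A$ unchanged. We assume the inner product $X\cdot Y = -\tfrac12(X^+Y^- + X^-Y^+) + X^\mu Y_\mu$ with a mostly-plus Minkowski metric. This is our assumption, chosen to be consistent with (5.54). Function names are hypothetical.

```python
import math
import random


def minkowski(x, y):
    """Mostly-plus Minkowski product on R^{d-1,1}."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))


def emb_dot(A, B):
    """Embedding-space product for vectors stored as [X+, X-, X^0, ..., X^{d-1}]."""
    return -0.5 * (A[0] * B[1] + A[1] * B[0]) + minkowski(A[2:], B[2:])


def embed(x, z):
    """Lift a point x and polarization z to (X, Z) with X+ = 1 and Z+ = 0."""
    X = [1.0, minkowski(x, x)] + list(x)
    Z = [0.0, 2.0 * minkowski(x, z)] + list(z)
    return X, Z


def light_substitution(X, Z, alpha):
    """The replacement (X, Z) -> (Z - alpha X, -X) used inside the light transform."""
    return [zc - alpha * xc for xc, zc in zip(X, Z)], [-xc for xc in X]


def wedge(X, Z):
    """Antisymmetric combination X^A Z^B - X^B Z^A."""
    return [[X[a] * Z[b] - X[b] * Z[a] for b in range(len(X))] for a in range(len(X))]


def max_violation(d=4, alpha=0.7, seed=1):
    """Largest deviation from the constraints and from wedge invariance for random x, z."""
    rng = random.Random(seed)
    x = [rng.uniform(-1, 1) for _ in range(d)]
    n = [rng.gauss(0, 1) for _ in range(d - 1)]
    norm = math.sqrt(sum(c * c for c in n))
    z = [1.0] + [c / norm for c in n]  # future-directed null polarization z = (1, n)
    X, Z = embed(x, z)
    X2, Z2 = light_substitution(X, Z, alpha)
    residues = [emb_dot(X, X), emb_dot(Z, Z), emb_dot(X, Z),
                emb_dot(X2, X2), emb_dot(Z2, Z2), emb_dot(X2, Z2)]
    W, W2 = wedge(X, Z), wedge(X2, Z2)
    residues += [W[a][b] - W2[a][b] for a in range(len(X)) for b in range(len(X))]
    return max(abs(r) for r in residues)


if __name__ == "__main__":
    print(max_violation())  # expected to be at the level of floating-point round-off
```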
For example, this implies (5.57), where the notation $\mathbf{L}_i$ means that the light transform is applied to point $i$. Similarly, we can factor out all $H_{jk}$ from under $\mathbf{L}_i$.$^{43}$ Therefore, if we start with a three-point tensor structure To further constrain the form of the function $f$, it is useful to step back and discuss some general properties of the light transform. The light transform in general acts on continuous-spin operators and yields new continuous-spin operators. Here "continuous-spin" doesn't necessarily mean $J \notin \mathbb{Z}_{\ge 0}$, but rather that the operator is not polynomial in its polarization vector $z$ (or $Z$ in embedding-space notation). In this sense, the $J = 0$ operator $\phi_1$ in (5.60) is special in that it is polynomial in $Z_1$.$^{44}$ We will refer only to the operators which satisfy this requirement as "integer-spin." The structure (5.60) is the only three-point tensor structure that is free of $(z_2\cdot z_3)$ and also consistent with all three operators being of integer spin. Similarly, the structure (5.61) can be singled out as the only structure which is free of $(z_2\cdot z_3)$ and corresponds to two integer-spin operators and the light transform of an integer-spin operator $\phi_1$.
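The fact that $\phi_1$ is an integer-spin operator can be expressed as $\mathcal{D}_m\phi_1 = 0$, where $\mathcal{D}_m$ is the Todorov/Thomas operator [72].$^{45}$ This should be thought of as a shortening condition for $\phi_1$. It is natural to expect that there exists a differential operator $\mathcal{D}^{\mathbf{L}}_m$ which provides a dual shortening condition for $\mathbf{L}[\phi_1]$, i.e.
$$\mathcal{D}^{\mathbf{L}}_m\,\mathbf{L}[\phi_1] = 0. \qquad (5.63)$$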
It is easy to guess the quantum numbers of $\mathcal{D}^{\mathbf{L}}_m\mathbf{L}[\phi_1]$ by assuming that they are just those of $\mathbf{L}[\mathcal{D}_m\phi_1]$. A simple exercise shows that it has $\Delta = 2$, $J = 1 - \Delta_\phi$, and that the index $m$ should be thought of as being in the second row of the Young diagram (the first row is accounted for by $Z$). This allows us to write an ansatz for $\mathcal{D}^{\mathbf{L}}_m$ and fix the coefficients by requiring consistency with various embedding-space constraints. We find an operator $\mathcal{D}^{\mathbf{L}}_m$ that satisfies all the required properties, including (5.63). Here $W$ is a polarization vector for the second-row indices, and satisfies $W^2 = W\cdot X = W\cdot Z = 0$ and $W \sim W + \alpha Z + \beta X$.$^{46}$ We can therefore constrain the function $f$ by requiring that $\mathcal{D}^{\mathbf{L}}_m$ annihilates the light-transformed three-point structure. By expanding this equation into appropriate conformally-invariant tensor structures, we find the following constraints for $f$,

$^{44}$ In this case it simply means that it is independent of $Z_1$.
$^{45}$ This rather involved form of $\mathcal{D}_m$ is required to make sure it is consistent with the fact that $\phi$ is only defined for $Z^2 = X^2 = Z\cdot X = 0$. Because of this, a single derivative $\partial/\partial Z$ isn't good enough.
$^{46}$ We discuss/review the embedding formalism for general Young diagrams in [21].
where $C_J$ is defined as In this section, however, it is more convenient to treat the structures that are multiplied by $a$, $b$, $c$, $h$, $k$ independently. Let us focus on the structure with coefficient $a$ in (5.70). Using the results above we find that where the light transform is applied to the correlation function with We can now use equations (5.69) and (5.67) to write (for (3 > 2) ≈ 1), Therefore, in this case Calculation of the light transforms for the other structures, corresponding to coefficients $b$, $c$, $h$, $k$, is completely analogous. The complete result is Note that the algorithm for computing the light transform is much simpler than in the case of the shadow transform [74].

Fourier transform of three-point functions
Above we have described how to compute the light transform for a general three-point structure. We now need to set $x_1 = \infty$ and Fourier-transform $\mathcal{O}_2$ and $\mathcal{O}_3$. Since after setting $x_1 = \infty$ the three-point function becomes translation-invariant in $x_2$ and $x_3$, it suffices to Fourier-transform only $\mathcal{O}_3$. Therefore, we want to compute the Fourier transforms The configuration in the integrand corresponds to Under this substitution we have Using these identities, the three-point function under the integral in (5.83) can be reduced to a linear combination of terms of the form where $J_3 = n_{23} + n_{31} + m_3$, $J_2 = n_{23} + n_{12} + m_2$, and $m_2, m_3, n_{12}, n_{23}, n_{31}$ are non-negative integers.
In the simplest case when $J_2 = J_3 = 0$, there is only one structure It is straightforward to compute the Fourier transform where the $i\epsilon$ prescription $x^0 \to x^0 + i\epsilon$ has to be used, and the coefficient $F_{\Delta,J}$ is given by We can reuse this result for general $J_2$ and $J_3$. Note that exactly the same calculation as above works for structures with $m_2 = m_3 = 0$. To obtain the result for non-zero $m_2$ and $m_3$ we can introduce the following auxiliary basis, where $m_2$ and $m_3$ are given by (5.87), $m_1 = J_1 - n_{12} - n_{31}$, and $D_z$ is the Thomas/Todorov operator [72]. Acting with, for example, $(z_2\cdot D_{z_1})$ on $(-x\cdot z_1)^\alpha$ produces terms of two types. One contains $(x\cdot z_2)$, which is the desired term. If we had only this term, then (5.91) would be proportional to (5.86). However, there is a second term, proportional to $(z_1\cdot z_2)$. Nevertheless, it is clear that this term leads to contributions to (5.91) which have fewer powers of $(x\cdot z_i)$ than (5.86). This means that the relationship between the structures (5.91) and (5.86) is given by a triangular matrix, and thus can be straightforwardly inverted. In particular, (5.86) and (5.91) span the same space of structures. The advantage of using (5.91) is that, obviously, which corresponds to $J_2 = J_3 = 1$. We have to look at five structures, For example, After computing the other structures, it is straightforward to plug (5.85) into (5.80) to find the coefficients $a$, $b$, $c$, $h$, $k$ and then use (5.92). These intermediate steps get somewhat messy, and we do not reproduce them explicitly here.
To write down the final result, it is convenient to apply a Lorentz transformation and a dilatation so that $p = (1, 0)$. We can then choose $z_i = (1, n_i)$, where $n_i$ are unit vectors. In this frame the Fourier transform becomes where $C_J$ and $a_2$ define $b$ by The other constants $a$, $c$, $h$, $k$ are determined by $b$ and $C_J$ through equations (5.71) and (5.72).
Notice that this method of computing Fourier transforms quickly gets out of hand if, say, J 3 is large, or if we want to keep it as a free parameter. The latter is important if we want to compute all the conformal blocks for a particular event shape. In section 5.4.1 we describe the calculation of Fourier transforms relevant for scalar event shapes at generic J 3 . This can be used as a seed for calculation of Fourier transforms for more complicated event shapes at generic J 3 , although we will not pursue this direction.

From the boundary t-channel point of view, this calculation corresponds to keeping only the "comb" t-channel k-point blocks which exchange, respectively, φ, J, or T in all intermediate channels.
We discussed the computation of such t-channel $k$-point blocks in section 5.1; here we would like to see how they reproduce the bulk calculations.
In this section we write all event shapes in the configuration $p = (1, 0)$, $z_i = (1, n_i)$. In the simplest scalar case we have where we again use the notation in which the momentum-conserving delta-functions $(2\pi)^d\delta^d(0)$ are implicitly removed. Combining these expressions together, we find (5.101). The last equality follows from the Ward identity, together with the fact that, because of Lorentz invariance, $\langle\phi(p)|\mathcal{E}(n)|\phi(p)\rangle$ is independent of $n$. Of course, we can also explicitly compute $\langle\phi(p)|\mathcal{E}(n)|\phi(p)\rangle$ using the algorithm described in the previous subsection, with the same result. Clearly, given our choice of $p$, (5.101) is equivalent to (3.28). This straightforwardly generalizes to the event shapes in $\epsilon\cdot J$ and $\epsilon\cdot T$ states. When spinning operators are exchanged, according to (5.28), we need to glue the three-point functions using $SO(d-1)$ projectors while summing over different $SO(d-1)$ components. However, when we are working with $J$ or $T$, the three-point functions only have a single $SO(d-1)$ component (of spin 1 or spin 2, respectively) because of the shortening conditions. Thus, the projectors act trivially. For example, in the case of an $\epsilon\cdot J$ event shape, we have Notice that we only sum over spatial indices above. This is because for $p = (1, 0)$, $|J^0(p)\rangle = 0$. We thus match the bulk result (3.39) if

Structure of the general sum rule
In this section we describe the general properties of the sum rule which expresses the commutativity of shocks,
$$\langle\mathcal{O}_4(p)|\,[\mathbf{L}[\mathcal{O}_1](\infty, z_1), \mathbf{L}[\mathcal{O}_2](\infty, z_2)]\,|\mathcal{O}_3(p)\rangle = 0. \qquad (5.113)$$
In particular, we would like to understand some natural components into which this equation can be decomposed, and how various operators in the t-channel contribute to these components. The sum rule is obtained by writing (5.113) as Therefore, the leading contribution to the sum rule is given by the single-trace operators.$^{47}$ As we have seen in the examples above and in section 3.3, there exist special, minimal couplings of single-trace operators with which they do not contribute to the sum rule. The sum rule is therefore satisfied if all single-trace operators have minimal three-point functions. However, if some single-trace operator has a non-minimal coupling, its contribution must be canceled by non-minimal couplings of some other operators.
In the rest of this section we will study the symmetries of the sum rule, and the constraints that these symmetries impose on the potential cancellation of non-minimal couplings.

Tensor structures
First, let us discuss the symmetries of equation (5.113). It contains the momentum p, which we are free to set to any value. We can choose p = (1, 0). After this, the only symmetry remaining is the SO(d − 1) of spatial rotations transverse to p. It is therefore convenient to decompose the spin degrees of freedom of all four operators under this subgroup.
For the integer-spin operators $O_3$ and $O_4$ the decomposition is simple and is described, for example, in [75,76]. In the simplest case of an SO(1, d − 1) traceless-symmetric tensor of spin J, upon reduction to SO(d − 1) we get traceless-symmetric tensors of spins s = 0, . . . , J. If the operator is conserved, then only s = J survives. In general, let us denote by $\rho_i$ the SO(1, d − 1) irreps of these operators, and by $\lambda_i \in \rho_i$ their SO(d − 1) components. 47 Note that it does not make much sense to go to higher 1/N orders, since, for example, in the case $O_1 = O_2 = T$ the event shapes are ill-defined beyond the planar order, cf. section 4.5.

Decomposition of continuous-spin operators
The only constraint on $L[O_1](\infty, z_1)$ as a function of $z_1$ is that it is homogeneous. This homogeneity allows us to completely encode this function by its values at $z_1 = (1, n_1)$, and these values are completely unconstrained. We therefore conclude that $L[O_1](\infty, z_1)$ is equivalent to a scalar function on $S^{d-2}$ parametrized by $n_1$. As is well known, under the action of SO(d − 1), the space of such functions decomposes into all traceless-symmetric representations, i.e. spins $j_1 = 0, 1, 2, \ldots$, each appearing once.

(5.116)
Such invariants can be conveniently labeled by $\{j_1, j_2|\lambda|\lambda_3, \lambda_4\}_s$, where $\lambda \in j_1 \otimes j_2$ and $\lambda^* \in \lambda_3 \otimes \lambda_4$, and s stands for s-channel. This invariant is obtained by restricting to a particular term in the direct sums above and by selecting a particular irrep in $j_1 \otimes j_2$. 48,49 The left-hand side of the sum rule can then be expanded The sum rule can be written in components as Note that these are scalar equations, i.e. they contain no cross-ratios. 48 It may be the case that $\lambda^*$ appears in $\lambda_3 \otimes \lambda_4$ with multiplicity. In this case, we need to add an extra label. 49 In the case $O_1 = O_2$, the sum rule is explicitly antisymmetric in $n_1$ and $n_2$. This is reflected in a restriction $j_1 \le j_2$ and a selection rule on λ for $j_1 = j_2$. Also, the definition of the invariant for $j_1 < j_2$ must be altered slightly.
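To make these labels concrete, here is a sketch (not from the original analysis) that enumerates the irreps $\lambda \in j_1 \otimes j_2$ of SO(d − 1). It assumes the standard tensor-product rule for traceless-symmetric tensors, $[j_1]\otimes[j_2] = \bigoplus_{k,l} [j_1 + j_2 - 2k - l,\, l]$ with $0 \le k \le \min(j_1, j_2)$ and $0 \le l \le \min(j_1, j_2) - k$, valid when d − 1 ≥ 5 so that every two-row Young diagram labels a single irrep; it further assumes that for $j_1 = j_2$ the component $[a, l]$ has exchange sign $(-1)^l$. Both assumptions are only checked here by Weyl-dimension counting, and the function names are hypothetical. For $j_1 = j_2 = 2$ the odd components come out as (3, 1) and (1, 1), consistent with the two structures $\{2,2|(3,1)|2,2\}_s$ and $\{2,2|(1,1)|2,2\}_s$ that appear in the ⟨T|EE|T⟩ example.

```python
from fractions import Fraction
from itertools import combinations


def so_dim(n: int, diagram) -> int:
    """Weyl dimension of the SO(n) irrep with Young-diagram rows `diagram` (n >= 5 here)."""
    r = n // 2
    lam = list(diagram) + [0] * (r - len(diagram))
    if n % 2:  # B_r: rho_i = r - i - 1/2
        rho = [Fraction(2 * (r - i) - 1, 2) for i in range(r)]
    else:      # D_r: rho_i = r - i - 1
        rho = [Fraction(r - i - 1) for i in range(r)]
    l = [lam[i] + rho[i] for i in range(r)]
    num = den = Fraction(1)
    for i, j in combinations(range(r), 2):
        num *= l[i] ** 2 - l[j] ** 2
        den *= rho[i] ** 2 - rho[j] ** 2
    if n % 2:  # extra short-root factors for odd n
        for i in range(r):
            num *= l[i]
            den *= rho[i]
    return int(num / den)


def tensor_product(j1: int, j2: int):
    """Two-row diagrams (a, b) in [j1] x [j2], assuming the rule quoted above."""
    m = min(j1, j2)
    return [(j1 + j2 - 2 * k - l, l) for k in range(m + 1) for l in range(m - k + 1)]


def exchange_sign(lam) -> int:
    """Assumed sign of [a, l] under swapping the two factors when j1 == j2."""
    return (-1) ** lam[1]


print(tensor_product(2, 2))
print([lam for lam in tensor_product(2, 2) if exchange_sign(lam) < 0])  # [(3, 1), (1, 1)]
```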
We would now like to understand from which t-channel operators these components receive contributions. First, note that the t-channel computes not the commutator, but the individual event shapes (note the arguments in the second event shape), The coefficients in the sum rule are given by 50 so it is sufficient to understand how t-channel operators contribute to $E_{12}$ and $E_{21}$. The structures defined above are natural from the point of view of computing the commutator, but for the t-channel expansion the more natural structures are which are obtained similarly to the $\{\cdots\}_s$ structures, but with The usefulness of these structures comes from the fact that λ here is exactly the same as the one summed over in (5.28). This is already non-trivial, since it tells us that contributions to the sum rule from operators of bounded spin live in a finite-dimensional space. This also implies, for example, that in the sum rule a generic contribution of a spin-6 operator cannot be canceled by a spin-0 operator. It is less obvious whether a spin-J operator can be completely canceled by spin-J operators. In principle, we can have many spin-J operators all contributing to the same finitely many components of the sum rule, so it might seem that there are enough free parameters to cancel all components. However, in a unitary theory, due to the reality properties of OPE coefficients, the contributions of operators have fixed signs, and it might be impossible to satisfy the sum rule with non-minimal couplings of operators of a single spin J. It might even be true that no finite set of spins is sufficient. We leave the investigation of this question to future work. 50 There might be an additional relative coefficient between the two terms, which depends on the convention for Clebsch-Gordan coefficients and on the normalization of the invariants. Note that in the case $j_1 = j_2$ the operation of permuting $j_1$ and $j_2$ has a definite eigenvalue ±1 depending on λ. In the case $O_1 = O_2$ the two terms either cancel or add up, depending on the sign. This corresponds to the selection rule on λ mentioned above.

⟨T|EE|T⟩ example
Here, let us consider two simple contributions to (5.125): from the exchange of the stress tensor itself and from a massive scalar. We will only consider the structures to which the scalar contributes non-trivially. According to the above discussion, before taking the commutator the scalar only contributes to 1). This means that we will only get two equations involving scalar contributions.
In fact, we can compute that under taking the commutator is a non-negative function. The stress-tensor exchange contribution to $j_1 = j_2 = 2$ is given by We therefore get two sum rules in which scalars participate, corresponding to $\{2, 2|(3, 1)|2, 2\}_s$ and $\{2, 2|(1, 1)|2, 2\}_s$, For example, in d = 4 this reduces to which we quoted in (1.6) in the introduction. Here "non-scalar" represents contributions of higher-spin operators, starting from massive spin-2. We see explicitly that there can be no cancellation between massive scalars. Furthermore, there is a component of the scalar contributions that cannot be canceled by the stress-tensor exchange. (The reverse is obvious, since the stress tensor also contributes to components other than $j_1 = j_2 = 2$.) One can also take appropriate linear combinations of these equations to obtain separate sum rules for $(t_4 + 2t_2)^2$ and the scalar contributions.

General t-channel blocks for scalar event shapes
In this section we derive a closed-form expression for all t-channel conformal blocks appearing in where $\phi_i$ are all scalars. The only essential difference from the algorithm of section 5.1 is that we perform the Fourier transform in a slightly different way, and we keep the intermediate spin as a free parameter.

Fourier transform for scalar event shapes
We start with the three-point tensor structure Using the results of section 5.1, we find for 1 ≈ (3 > 2) We now want to find the Fourier transform (5.83), so we have to specialize to the configuration (5.84). Using (5.85) we find .
In (5.139) we have a rather non-trivial function of x, and it is not obvious whether it should have a simple Fourier transform. Let us define functions These functions are homogeneous in x and satisfy This means that for some function Q. Therefore, for the purposes of computing the Fourier transform, we can treat these functions as $(z\cdot x)^{m_1+m_2}$, where z is a null vector. In other words,

(5.143)
We can find the decomposition This yields the following Fourier transform (5.146) Surprisingly, this sum reassembles into another hypergeometric function, It would be interesting to understand whether one can arrive at this expression in a more direct way, which generalizes to more complicated three-point functions.
Note that one can in principle use this result as a "seed" to compute more complicated objects, for example by using weight-shifting operators [63,73]. We can always choose the weight-shifting operator acting on the point at infinity to be powers of $Z_1$, which evaluates to something x-independent. Weight-shifting operators acting on points 2 and 3 can be rewritten as differential operators in momentum space, since any weight-shifting operator is polynomial in both coordinates and derivatives [63]. This suggests that an expression in terms of linear combinations of $_3F_2$ functions can always be found for objects of this type.
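Several objects in this section are (generalized) hypergeometric functions, and they truncate to polynomials whenever a numerator parameter is a non-positive integer (as happens for an argument $-J_3$). A minimal exact evaluator for such terminating $_pF_q$ series is sketched below; it is an illustration with hypothetical names, not the method used to derive (5.147), and it assumes that no lower parameter makes a Pochhammer symbol vanish before the series terminates. The checks use the Chu–Vandermonde and Pfaff–Saalschütz summation formulas.

```python
from fractions import Fraction
from math import factorial


def pochhammer(a, k: int) -> Fraction:
    """Rising factorial (a)_k = a (a+1) ... (a+k-1)."""
    result = Fraction(1)
    for i in range(k):
        result *= a + i
    return result


def terminating_hyp(upper, lower, z) -> Fraction:
    """Exact pFq(upper; lower; z) when some upper parameter is a non-positive integer."""
    degrees = [-a for a in upper if a == int(a) and a <= 0]
    if not degrees:
        raise ValueError("series does not terminate")
    n_max = int(min(degrees))          # the series stops at k = n_max
    total = Fraction(0)
    for k in range(n_max + 1):
        term = Fraction(z) ** k / factorial(k)
        for a in upper:
            term *= pochhammer(a, k)
        for b in lower:
            term /= pochhammer(b, k)   # assumes (b)_k != 0 for k <= n_max
        total += term
    return total


# 2F1(-2, 1; 1; z) = (1 - z)^2, a polynomial of degree 2 in z.
print(terminating_hyp([-2, 1], [1], 3))   # 4
```

Exact rational arithmetic is a deliberate choice: the terms of a terminating series alternate in sign and can cancel strongly, so floating point would obscure the polynomial structure.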

Decomposition into SO(d − 1) components
The last step in the computation of t-channel event-shape conformal blocks is to compute the For this it is convenient to decompose the spin degrees of freedom of $O_\alpha(p)$ in each three-point function into irreducible components under SO(d − 1). Take the scalar structure For the moment we are considering it just as a function of $z_3$. We can write The indices of the traceless-symmetric $f^{\mu_1}\cdots f^{\mu_J}$ have to be provided by p and $z_1$. 51 It may appear that there are several choices of how many indices to fill with p, but in fact all possibilities are exhausted by using no p at all. The reason is that there is only one way to obtain a given SO(d − 1) irrep from a given SO(d − 1, 1) irrep, and in this case we are trying to extract the spin-s irrep from $L[\phi_1]$. We thus find that 51 The contribution of $h^{\mu\nu}$ is fixed by the tracelessness condition and in any case vanishes after contracting with the traceless-symmetric projector.
for some numbers $\langle\phi_2|L[\phi_1]|O_3^{(s)}\rangle$. Another way of arriving at this conclusion is to specialize to the kinematics $p = (1, 0)$ and $z_i = (1, n_i)$. In this kinematics the three-point function (5.150) is necessarily a function of $n_1\cdot n_2$, where $n_i$ live on the unit sphere. The question of decomposing into SO(d − 1) representations is then equivalent to decomposing this function into spherical harmonics of $n_2$, which are proportional to $\Pi_{J_3,s}(n_1\cdot n_2)$ since the latter is essentially a Gegenbauer polynomial.
To perform the decomposition explicitly, we rewrite the result (5.147) in the special kinematics, where $\eta = n_1\cdot n_2$. The hypergeometric function truncates to a polynomial in η thanks to the argument $-J_3$, making the decomposition straightforward for each given $J_3$. Interestingly, we can find a closed-form expression for the coefficients, We thus conclude that where F is given by (5.90).
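Decomposing a function of η into spherical harmonics of $n_2$ at fixed $n_1$ amounts to expanding it in Gegenbauer polynomials $C_s^{((d-3)/2)}(\eta)$, which are orthogonal on [−1, 1] with weight $(1-\eta^2)^{(d-4)/2}$, the measure induced on η by $S^{d-2}$. A sketch of this projection, assuming even d ≥ 4 so that the weight is a polynomial and Gauss–Legendre quadrature is exact for polynomial inputs (the function names are hypothetical):

```python
import numpy as np


def gegenbauer(s: int, alpha: float, x: np.ndarray) -> np.ndarray:
    """C_s^{(alpha)}(x) from the three-term recurrence."""
    c_prev, c = np.ones_like(x), 2 * alpha * x
    if s == 0:
        return c_prev
    for n in range(2, s + 1):
        c_prev, c = c, (2 * x * (n + alpha - 1) * c - (n + 2 * alpha - 2) * c_prev) / n
    return c


def gegenbauer_coefficients(f, d: int, s_max: int, n_nodes: int = 64) -> np.ndarray:
    """Coefficients a_s in f(eta) = sum_s a_s C_s^{((d-3)/2)}(eta), for even d >= 4."""
    if d % 2 or d < 4:
        raise ValueError("this sketch assumes even d >= 4")
    alpha = (d - 3) / 2
    x, w = np.polynomial.legendre.leggauss(n_nodes)
    weight = w * (1 - x**2) ** ((d - 4) // 2)   # measure on S^{d-2} in eta = n1 . n2
    fx = f(x)
    coeffs = []
    for s in range(s_max + 1):
        c = gegenbauer(s, alpha, x)
        coeffs.append(np.sum(weight * fx * c) / np.sum(weight * c * c))
    return np.array(coeffs)


# In d = 4 these are Legendre polynomials: eta^2 = (1/3) P_0 + (2/3) P_2.
print(gegenbauer_coefficients(lambda eta: eta**2, d=4, s_max=2))
```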

Complete scalar event shape blocks
We have found that Using (5.28) we then find the scalar event-shape t-channel conformal block To find the above expression we used the identity $\Pi_{J,s}\Pi_{J,s'} = \delta_{ss'}\Pi_{J,s}$. As we discussed above, in order for the t-channel expansion to converge, we should smear the event shape over the polarizations of detectors It is convenient to use the following smearing functions (5.162) so that the result simply picks out the coefficient of the Gegenbauer polynomial $C_s^{(\frac{d-3}{2})}(\eta)$ in the event shape. We can define the corresponding blocks $G^t_{\Delta,J}(s)$ by the identity, for $p = (1, 0)$, Using (5.160) and (5.37) we find where $p = (p_2, p_3)$ and We thus find It is easy to find the covariant form Now, setting $p = (1, 0)$ and $z_i = (1, n_i)$, we get The delta function forces $n_1$ and $n_2$ to point in opposite directions. This corresponds simply to the fact that $\phi_3 = \rho\sigma$ creates a pair of particles, and by momentum conservation they must fly off in opposite directions, since we have set the spatial component of p to 0. We would like to compute this event shape using the t-channel OPE. We will first expand the four-point function (5.166) in the 14 → 23 channel and then use the resulting expansion to sum the event-shape conformal blocks.
The disconnected piece of (5.166) only contains the contributions of the identity and double-trace operators. The double-trace operators do not contribute to the event shape, as seen in the discussion below (5.9). The identity operator also does not contribute, as its contribution is since light transforms annihilate the vacuum state [20]. The connected contribution can be rewritten as where $\phi'_1$ and $\phi'_2$ are fictitious canonically normalized free scalars of dimension $\Delta/2 = 1$. The prefactor simply plays the role of shifting the external dimensions of conformal blocks, and so we have the identity where on the right-hand side we mean the OPE coefficients that enter the decomposition of the four-point function of the free fields $\phi'_1$, $\phi'_2$. Since this is a four-point function of free fields, only a single family of higher-spin currents O with $\Delta = J + 2$ contributes to its OPE. In our conventions we have [77] The reason only s = J is allowed is that $\Delta = J + 2$ corresponds to conserved higher-spin currents, which only have one SO(d − 1) component. Using (5.163) we find The last equality follows from the completeness relation for Legendre polynomials and $P_J(-1) = (-1)^J$. This result indeed agrees with (5.176). Note that the convergence here is only in a distributional sense, i.e. we have to smear the event shape with some test function in η before computing the sum. A more nontrivial check is to repeat the same calculation in a generalized free theory (GFT). The event shape is the same up to a coefficient, where $\Delta_f$ is the dimension of the fundamental fields ρ and σ ($\Delta_f = 1$ for the free scalar case considered above). We can use the same logic to obtain the relevant OPE coefficients from the known GFT ones [30,78]. The main difference is that each Legendre polynomial $P_s(\eta)$ receives contributions from infinitely many operators $[\rho\sigma]_{n,J}$ with $J \ge s$ and $n \ge 0$. 52 The sum is now much more non-trivial and we cannot tackle it analytically.
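The distributional statement can be checked numerically. The standard completeness relation $\sum_J \frac{2J+1}{2} P_J(x)P_J(y) = \delta(x-y)$ on [−1, 1], evaluated at y = −1 with $P_J(-1) = (-1)^J$, gives $\sum_J \frac{2J+1}{2}(-1)^J P_J(\eta) = \delta(1+\eta)$. The partial sums do not converge pointwise, but after smearing against a smooth test function f(η) they converge to f(−1), the back-to-back configuration. The sketch below (hypothetical helper name) uses only this normalization of the Legendre series; it does not reproduce the free-theory OPE coefficients themselves.

```python
import numpy as np
from numpy.polynomial import legendre as leg


def smeared_back_to_back_sum(f, J_max: int, n_nodes: int = 128) -> float:
    """Partial sum over J <= J_max of (2J+1)/2 (-1)^J * int_{-1}^{1} P_J(eta) f(eta) d eta."""
    x, w = leg.leggauss(n_nodes)
    fx = f(x)
    total = 0.0
    for J in range(J_max + 1):
        P_J = leg.Legendre.basis(J)(x)
        total += (2 * J + 1) / 2 * (-1) ** J * np.sum(w * P_J * fx)
    return float(total)


# Smearing against f(eta) = exp(eta) should recover f(-1) = exp(-1).
print(smeared_back_to_back_sum(np.exp, J_max=30), np.exp(-1))
```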
We have focused on the coefficients in front of $P_0(\eta)$ and $P_1(\eta)$, and found numerically that the sum over n converges rather quickly. However, the sum over J appears to behave as and so it diverges for $\Delta_f > 3/2$ and converges for $\Delta_f < 3/2$. In the latter case, convergence can be improved by an Euler transform, 53 which allows us to check for a few sample values of $\Delta_f$ with $1 < \Delta_f < 3/2$ that the t-channel sum agrees to high precision with which is the expansion of (5.183). 54 Based on intuition from ν-space described in [21], we expect that the divergence for $\Delta_f \ge 3/2$ is due to the behavior of our test functions near the collinear limit η = 1. Recall that if To moderate the contribution of η = 1, we can smear with linear combinations of functions $f_s$ which vanish at η = 1 as $(\eta - 1)^k$ for sufficiently high k. Suppose that $\sum_s \alpha_s f_s(\eta)$ is one such combination for a fixed k. We are then led to consider the expansion Numerically, we find that the sum over primary operators with different spin behaves as where $w_k$ is a monotonically non-increasing function of k. Specifically, for any choice of $\Delta_f$, we find that $w_k$ starts at $w_0 = 2\Delta_f - 3$ as described above, and then decreases monotonically with k until it saturates at some value $w_* < 0$. 55 This means that for any value of $\Delta_f$ the expansion converges for test functions which vanish sufficiently quickly at η = 1. 53 For a series $\sum_{n=0}^{\infty}(-1)^n a_n$ the Euler transform is $\sum_{n=0}^{\infty}(-1)^n a_n = \sum_{n=0}^{\infty} b_n/2^{n+1}$, where $b_n = \sum_{k=0}^{n}(-1)^k\binom{n}{k}a_k$. It generally tends to improve the rate of convergence of slowly converging series. 54 In fact, the Euler transform makes the sum over J convergent for all values of $\Delta_f$. 55 Typically $w_k \approx w_{k-1} - 2$, but we have not studied this in sufficient detail either numerically or analytically.
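The Euler transform of footnote 53 is easy to try on a toy alternating series. A minimal sketch, assuming exact rational terms $a_k$ so that the alternating binomial sums $b_n$, which cancel strongly for large n, can be computed without round-off; the example is the alternating harmonic series, whose sum is ln 2, and not the GFT sum discussed above.

```python
from fractions import Fraction
from math import comb, log


def euler_transform_sum(a, n_terms: int) -> Fraction:
    """Approximate sum_{n>=0} (-1)^n a[n] by sum_{n<n_terms} b_n / 2^(n+1).

    b_n = sum_{k=0}^{n} (-1)^k C(n, k) a[k] is computed exactly with Fractions,
    since in floating point these alternating binomial sums cancel badly.
    """
    total = Fraction(0)
    for n in range(n_terms):
        b_n = sum((-1) ** k * comb(n, k) * Fraction(a[k]) for k in range(n + 1))
        total += b_n / 2 ** (n + 1)
    return total


a = [Fraction(1, k + 1) for k in range(40)]       # alternating harmonic series, sum = ln 2
naive = sum((-1) ** n * a[n] for n in range(40))
print(float(naive) - log(2))                       # plain partial sum: error about -1.2e-2
print(float(euler_transform_sum(a, 40)) - log(2))  # Euler-transformed: error of order 1e-14
```

With the same 40 input terms, the plain partial sum is accurate only to about two digits, while the transformed sum converges geometrically (here $b_n = 1/(n+1)$, so the tail is bounded by $2^{-40}$).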

Bounds on heavy contributions to non-minimal couplings
A quantitative understanding of the superconvergence sum rule requires some extra analysis, which we postpone to future work. Here, we sketch its qualitative implications. For simplicity, we consider a toy model for a gravitational scattering amplitude, but the argument for the CFT correlator is essentially the same. In our discussion, the rough correspondence between amplitudes and CFT correlators is The basic idea was explained in [19] and goes as follows. Let us imagine a theory with a large gap $\Delta_{\rm gap} \gg 1$ in the spectrum of particles (or operators). We would like to bound the contribution of heavy, or stringy, modes to the superconvergence sum rule and therefore determine the maximal allowed value of non-minimal three-point graviton couplings. To do so, recall that given a polynomially bounded "amplitude" A(s, t) with "partial wave" expansion we can write a "Froissart-Gribov inversion formula" which takes the following form where $\mathrm{Disc}_u = -\mathrm{Disc}_t$. Convergence properties of the integral (6.2) depend on the behavior of the amplitude at large t and fixed s. In a consistent theory, the integral converges for J > 1.
In particular, we can evaluate (6.2) at J = 2, for which the integral (6.2) must reproduce the graviton pole at s = 0, More generally, away from the graviton pole, the integral over the discontinuity should correctly reproduce the Pomeron pole where we expect $a_J(s)$ to be suppressed by $1/c_T$. This suppression does not follow from our knowledge of the three-point coefficients that control the residue of the pole. Indeed, the value of the residue in (6.4) solely reflects the asymptotic behavior of the discontinuity in (6.2), $\mathrm{Disc}_t A(s, \nu) \sim C^2_{\phi\phi P}(s)\,\nu^{J(s)}$ at large ν. In particular, we can imagine an isolated "outlier state" at some intermediate scale $\nu_*$ that contributes $\mathrm{Disc}_t A(s, \nu) \sim C^2_{\phi\phi P_*}(s)\,\delta(\nu - \nu_*)$ with a large coefficient $C^2_{\phi\phi P_*}(s)$. Through (6.2), such an outlier state would invalidate the estimate $a^+_J(s) \sim 1/c_T$ away from the pole. In what follows we assume that there are no outliers. The same assumption was made in [19].
In a large-gap theory we expect J(s) = 2 + s Correctly reproducing the $1/c_T$ residue of the graviton pole (6.3) from (6.4) thus requires that C 2 φφP (s) ∼ 1 . Note also that by plugging $\mathrm{Disc}_t A(s, \nu) \sim C^2_{\phi\phi P}(s)\,\nu^{J(s)}$ for $\nu > \Delta_{\rm gap}^2$ into (6.2) (and thus assuming no outliers) we get The superconvergence sum rule is the statement that $a^-_{J=3}(s) = 0$. Note that only the square of the non-minimal three-point coupling contributes to $a^-_{J=3}(s)$; see e.g. (2.60). We can write it as follows where we separated the contribution from the graviton pole $\alpha_{\rm GB}(s)$ from the rest ("GB" stands for Gauss-Bonnet, a particular type of non-minimal coupling that contributes to $\alpha_{\rm GB}$). Next we would like to bound the contribution of heavy states in (6.6). The estimate goes as follows where we used the positivity of $\mathrm{Disc}_t A(s, \nu)$ and $\mathrm{Disc}_u A(s, \nu)$, which follows from unitarity in an appropriate kinematical region. We thus arrive at the same qualitative conclusion as was obtained in [3]. This time, however, we have a precise sum rule that must be satisfied. We will show in [21] that the superconvergence sum rule in CFT can be written as where $C^\pm(\Delta, J)$ is the quantity computed by the Lorentzian inversion formula, so the argument in the CFT case proceeds analogously to the one here.

Conclusions and future directions
In this work, we found connections between commutativity of coincident shocks, superconvergence sum rules, and boundedness in the Regge and lightcone limits. These connections hold both in flat space and in AdS.
In flat space, we defined "shock amplitudes" as amplitudes with special external wavefunctions. We showed that boundedness of amplitudes in the Regge limit is a sufficient condition for commutativity of coincident shocks. Furthermore, when coincident shocks commute, one obtains superconvergence sum rules that constrain the matter content and three-point couplings of the theory. It was argued in [3] that causal theories of gravity should have Regge intercept $J_0 \le 2$. Assuming this, it follows that coincident gravitational shocks commute. The associated superconvergence sum rules relate non-minimal gravitational couplings to three-point couplings of stringy states.
In AdS, commutativity of coincident shocks is dual to the question of commutativity of certain null-integrated operators (e.g. ANEC operators) in a CFT. This question can be studied on its own using CFT techniques. In particular, we show using CFT methods that ANEC operators on the same null plane commute. (This result holds both nonperturbatively and in the planar limit, but it can be violated at fixed loop order in bulk perturbation theory.) This establishes commutativity of coincident gravitational shocks in AdS. We conjecture that coincident gravitational shocks commute in UV-complete gravitational theories in flat space, AdS, dS, and possibly beyond, dubbing this a "stringy equivalence principle." The CFT version of superconvergence sum rules can be obtained by inserting complete sets of states between the null-integrated operators. In large-N theories, such sum rules relate non-minimal bulk couplings to the massive single-trace spectrum. However, the resulting sum rules are completely general, independent of holography, and interesting on their own.
Let us discuss some open questions and future directions.

Constraints on UV-complete gravitational theories
Higher derivative gravitational couplings are inconsistent with commutativity of coincident shocks, unless their effects are cancelled in the superconvergence sum rule. This cancellation can occur in different ways. In weakly-coupled (tree-level) gravity theories, the cancellation must involve massive (stringy) states. More generally, the cancellation could involve loop effects. An important problem is to compute independent bounds on non-graviton contributions to the superconvergence sum rules. This would give quantitative bounds on the size of non-minimal gravitational couplings. A toy version of this argument was presented in the previous section. However, it will be important to make it precise. Another interesting question is the extent to which the low-energy matter content of the Standard Model is consistent with commutativity of coincident gravitational shocks. If one could compute the Standard Model contribution to superconvergence sum rules (including loop effects) and find that it is nonzero, that would establish the necessity of additional massive states, and possibly hint at their properties.
To incorporate loop effects (e.g. loops in the Standard Model or 1/N effects in CFT), it may be necessary to use eikonal techniques to re-sum gravitational exchanges. The reason is that n-graviton exchange on its own leads to Regge growth with spin $J_0 = 1 + n$, which would invalidate superconvergence sum rules.

Bootstrapping amplitudes and four-point functions
In the context of flat-space amplitudes, superconvergence sum rules have been used to bootstrap the Veneziano amplitude [36]. This result relies on assuming linear Regge trajectories.
The assumption of linear trajectories has two nice effects. First, it removes the need to bootstrap masses: one can focus only on three-point couplings. Second, because Regge trajectories behave as $J(s) = \text{const.} + \alpha' s$, one can make J(s) arbitrarily negative by making s arbitrarily negative. When $J(s) \le -k$ for integer k, one obtains a new superconvergence sum rule by inserting $t^k$ into a dispersion relation.
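The mechanism behind "inserting $t^k$" can be illustrated with a toy function rather than a genuine amplitude. If A(t) has simple poles at $t_i$ and decays like $t^{-m}$ at large |t|, the contour integral of $t^k A(t)$ around a large circle vanishes whenever $t^k A(t)$ falls off faster than 1/t, so $\sum_i t_i^k \,\mathrm{Res}_{t_i} A = 0$ for $k = 0, \ldots, m-2$. The sketch below assumes $A(t) = 1/\prod_i (t - t_i)$ with hypothetical pole positions; how the decay exponent maps onto J(s) depends on how the dispersion relation is normalized, and we do not try to reproduce that offset here.

```python
import numpy as np


def residue_moments(poles, k_max: int) -> list:
    """sum_i t_i^k Res_{t=t_i} A(t) for A(t) = 1/prod_i (t - t_i), k = 0..k_max."""
    poles = np.asarray(poles, dtype=float)
    # Residue at a simple pole t_i: 1 / prod_{j != i} (t_i - t_j).
    residues = np.array([1.0 / np.prod(p - np.delete(poles, i)) for i, p in enumerate(poles)])
    return [float(np.sum(poles**k * residues)) for k in range(k_max + 1)]


poles = [0.3, 1.1, 2.7, -1.9]   # A(t) ~ t^(-4) at large |t|, so m = 4
moments = residue_moments(poles, k_max=4)
# k = 0, 1, 2: t^k A(t) decays faster than 1/t, so these moments vanish (sum rules).
# k = 3: t^3 A(t) ~ 1/t, the residue at infinity is nonzero and the moment equals 1.
print(moments)
```

Each additional power of decay of A(t) buys one more vanishing moment, which is the toy analogue of obtaining a new sum rule each time J(s) drops by one unit.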
If so, one should be able to obtain additional superconvergence sum rules beyond the ones we have discussed at sufficiently large $\nu^2$. We expect such sum rules to come from commutativity of the operators $L_n(\vec y)$ defined in [24] and other descendants of $L[T](x, z)$, such as those studied in [25]. Note that these operators only need to commute when transformed to ν-space and placed at sufficiently large ν. To transform to ν-space, one must smear the operators in the transverse positions $\vec y$ against an appropriate (d − 2)-dimensional three-point structure; see e.g. [21].
It would be interesting to understand how our flat-space analysis is embedded into the corresponding limit of the AdS/CFT duality. Indeed, in theories with sub-AdS locality there is no problem in localizing both the shocks and the probes in a region of spacetime much smaller than $L_{\rm AdS}$ [82]. Therefore the flat-space analysis should apply in this limit. Shockwaves that are well localized in the AdS interior were analyzed, for example, in [52]. The same idea should apply to high-energy scattering in nontrivial backgrounds, e.g. in the vicinity of a black hole horizon [83]. Presumably, the commutativity of shocks in this case is related to the consistency conditions on the spinning scramblon couplings [84]. More generally, the statement that gravitational shocks commute locally at every point in AdS should constrain spinning couplings to the "modulon" [85], a mode that saturates the modular bound on chaos and captures local high-energy gravitational scattering in the bulk. It seems plausible that saturation of the modular chaos bound, together with the corresponding superconvergence conditions, uniquely selects Einstein gravity in AdS as the dual theory. 57 56 We thank David Meltzer for discussions on this point.

Generalizations
Although we have focused mostly on superconvergence sum rules coming from commutativity of ANEC operators, one additionally gets sum rules from studying $[L[O_1], L[O_2]]$ for any pair of operators $O_1, O_2$ with sufficiently large $J_1 + J_2$. A further generalization could come from studying commutativity of the more general continuous-spin light-ray operators defined in [20]. Can one show that such general light-ray operators commute on the same null plane as well, $[\mathbb{O}_1, \mathbb{O}_2] = 0$? One way to obtain such light-ray operators is from OPEs of more traditional null integrals [21]. In this case, commutativity of $\mathbb{O}_k$ would follow from commutativity of null-integrated operators. However, the general construction of continuous-spin light-ray operators is more complicated.
In addition to introducing continuous-spin light-ray operators in the CFT context, it is interesting to ask whether they can be introduced in the amplitudes context as well. In CFT, null-integrated operators $L[O_{i,J}]$ can be analytically continued in spin to obtain more general light-ray operators $\mathbb{O}^{\pm}_{i,J}$, where i labels Regge trajectories. It is natural to guess that the shock amplitudes defined in section 2.1 can be analytically continued in the spin of the shocks. This would provide a vast generalization of the amplitudes usually considered. It would be interesting to investigate such analytically continued amplitudes in string theory.

Further applications
In the main text we discussed a set of extra consistency conditions (superconvergence relations) on the CFT spectrum which follow from commutativity of average null energy operators. Together with ANEC, commutativity also implies that products of multiple average null energy operators are positive-semidefinite. Therefore, we can further require that $\langle\Psi|L[T]\cdots L[T]|\Psi\rangle \ge 0$. (6.10) Using the results of section 5, this leads to extra conditions on the OPE data of the theory. In numerical bootstrap calculations, boundedness of the Regge limit is manifest, and therefore superconvergence relations are automatically satisfied. On the other hand, from the point of view of four-point functions, the positivity conditions (6.10) are extra nontrivial conditions. For example, positivity of two-point energy correlators follows from unitarity of six-point functions, and thus is not manifest from the conformal block expansion of a four-point function. 57 We thank Tom Faulkner for discussions on this point.
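The positivity statement rests on a finite-dimensional fact: commuting positive-semidefinite operators share an eigenbasis, so their product is again positive-semidefinite. The sketch below uses random matrices as stand-ins for the light-ray operators (an illustration of the linear-algebra step only), and also shows that without commutativity the expectation value of a product of two positive operators can be negative.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 6
V, _ = np.linalg.qr(rng.normal(size=(dim, dim)))   # orthogonal matrix: a shared eigenbasis
A = V @ np.diag(rng.uniform(0.0, 1.0, dim)) @ V.T  # positive-semidefinite
B = V @ np.diag(rng.uniform(0.0, 1.0, dim)) @ V.T  # positive-semidefinite, commutes with A
AB = A @ B
print(np.allclose(A @ B, B @ A), np.linalg.eigvalsh(AB).min() >= -1e-12)  # True True

# Without commutativity: A2, B2 are positive-semidefinite but <psi|A2 B2|psi> < 0.
A2 = np.array([[1.0, 0.0], [0.0, 0.0]])
B2 = np.array([[1.0, 1.0], [1.0, 1.0]])
psi = np.array([1.0, -2.0])
print(psi @ A2 @ B2 @ psi)  # -1.0
```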
It would be interesting to include these positivity constraints in the numerical bootstrap. In particular, it will be interesting to see how they affect the stress-tensor bootstrap results [86]. It will also be interesting to apply the t-channel OPE formulas to QCD-like theories, say $\mathcal{N} = 4$ SYM. An appealing feature of the t-channel OPE is that in the planar limit the contribution of double-trace operators is suppressed by an extra power of $1/N^2$. Therefore, to leading order only the single-trace data is needed. We hope that one day these will be computed at finite λ using integrability techniques; see e.g. [80,87,88]. It will also be interesting to use the t-channel OPE to reproduce the known weak-coupling results [57,89], and to try to extend the t-channel analysis to actual QCD, where the state of the art is the NLO analytic result [90]. Note that in the case of $\mathcal{N} = 4$ SYM, due to superconformal symmetry, the energy-energy correlation can be computed using the four-point function of scalars, to which (5.164) can be directly applied.
Let us go back to the non-commutativity introduced by the Gauss-Bonnet coupling $\alpha_2$. Adding a spin-4 particle with $\alpha^J_2 = \alpha^J_4 = 0$ leads to $\Delta Q^{(1)}_4(\eta) \sim -(\alpha^4_0)^2$, ∆Q where we omitted irrelevant positive-definite D-dependent coefficients. We see that by adding to the Gauss-Bonnet theory a single spin-4 particle and a non-minimally coupled scalar we can satisfy the superconvergence relations in flat space. With one spin-4 particle in the spectrum, the theory would still have pathological Regge behavior, which should become visible in a slightly different kinematics; see e.g. [3].

B Noncommutativity of light-transformed scalars
Let us consider a simple model that illustrates some of the subtleties involved in computing light transforms at coincident points. Imagine four free complex scalar fields of different masses in AdS. The dual theory is a version of generalized free field theory with scalar operators $\phi_k$ of dimensions $\Delta_k$. We consider the following correlator 58 To do the light transforms, it is convenient to specialize to simple kinematics in the lightcone coordinates $(u, v, \vec y)$, for which we get where we explicitly wrote the $i\epsilon$ prescription dictated by the ordering of operators and introduced a small separation in the u-direction between the detector operators. Let us first analyze the integral (4.3), which guarantees both the existence and commutativity of the coincident limit. To do that we set δu = 0 in (B.5) and perform the integral over $v_1$ and $v_2$ This integral is zero for $\Delta_a, \Delta_d > 1$ and diverges otherwise. Let us check that it agrees with the sufficient conditions for the existence of the integral derived in the main text. For the local operators at hand we have $J_1 = J_2 = 0$, $\Delta_1 = \Delta_a + \Delta_b$, and $\Delta_2 = \Delta_b + \Delta_d$. The Euclidean OPE, the light-cone OPE, and the Regge limit are all controlled by the leading operator that appears in the OPE of $O_1$ and $O_2$, which has dimension $\Delta_a + \Delta_d$ and spin 0. The strongest constraint comes from the light-cone OPE (4.11), which for the case at hand becomes We see that the result is zero for $\Delta_a + \Delta_d > 2$, finite and non-trivial for $\Delta_a + \Delta_d = 2$, and divergent for $\Delta_a + \Delta_d < 2$. Thus, we observe that the coincident limit exists beyond the range found by the sufficient conditions described in the text. Apart from variations of the correlator above, where subtleties related to the coincident limit of the light-ray operators can be demonstrated, one can also consider exchange Witten diagrams in AdS, where similar subtleties occur.
This can be easily seen using the Mellin space approach used to efficiently compute light transforms in [15].