Causality Constraints on Corrections to the Graviton Three-Point Coupling

We consider higher derivative corrections to the graviton three-point coupling within a weakly coupled theory of gravity. Lorentz invariance allows further structures beyond the one present in the Einstein theory. We argue that these are constrained by causality. We devise a thought experiment involving a high energy scattering process which leads to causality violation if the graviton three-point vertex contains the additional structures. This violation cannot be fixed by adding conventional particles with spins $J \leq 2$. However, it can be fixed by adding an infinite tower of extra massive particles with higher spins, $J>2$. In AdS theories this implies a constraint on the conformal anomaly coefficients $\left|{a - c \over c} \right| \lesssim {1 \over \Delta_{gap}^2}$ in terms of $\Delta_{gap}$, the dimension of the lightest single particle operator with spin $J>2$. For inflation, or de Sitter-like solutions, it indicates the existence of massive higher spin particles if the gravity wave non-gaussianity deviates significantly from the one computed in the Einstein theory.


Introduction/Motivation
In this paper we consider weakly coupled gravity theories in the tree-level approximation. It is well known that at long distances such theories should reduce to the Einstein gravity theory. However, at intermediate energies we can have higher derivative corrections. By intermediate energies we mean those that are low enough that the theory is still weakly coupled but high enough that we are sensitive to possible higher derivative corrections. An example of such a theory is weakly coupled string theory, where the corrections appear at a length scale $\sqrt{\alpha'}$, which is much larger than the Planck length. The theory at energies comparable to $1/\sqrt{\alpha'}$ is still weakly coupled. In this case, the higher derivative corrections are accompanied by extra massive higher spin particles which appear at the same scale. For higher energies the description is via a string theory which departs significantly from ordinary local quantum field theory. It is reasonable to expect that this is a generic feature, namely, that higher derivative corrections only arise due to the presence of extra states with masses comparable to the scales where the higher derivative corrections become important.
The objective of this paper is to sharpen this link for the simplest possible correction, that of the graviton three-point coupling. Due to the fact that the graviton has spin, the flat space on-shell three-point function is not uniquely specified. In general, it has three different possible structures. The most familiar is the one we get in the Einstein theory.
The others can be viewed as arising from higher derivative terms in the gravitational action. The first new structure has two more derivatives. Relative to the size of the Einstein-Hilbert term, it scales like $\alpha p^2$, where $\alpha$ is a new quantity with dimensions of length squared which characterizes the relative importance of the new term. Here $p$ is the typical scale of the momenta; it is not a Mandelstam invariant, since these all vanish for on-shell three-point functions. We work in a regime where both of the three-point vertices are small, so that gravity is weakly coupled.
We find that the new three-point vertices lead to a potential causality violation unless we get contributions from extra particles. This causality violation occurs when the theory is still weakly coupled. It occurs in a high-energy, fixed impact parameter scattering process at a center-of-mass energy squared, $s$, which is large compared to $1/\alpha$ but still small enough for the coupling to be weak. In general relativity this scattering process leads to the well-known Shapiro time delay [1], which is one of the classical tests of general relativity [2]; see also [3]. When the graviton three-point vertex is corrected, the new terms can lead to a time advance, depending on the spin of the scattered graviton. At short enough impact parameter this time advance can overwhelm Shapiro's time delay and lead to a causality problem. This troublesome feature arises at an impact parameter of order $b^2 \sim \alpha$. At tree level, this problem can only be fixed by introducing an infinite number of new massive particles with spin $J > 2$ (footnote 1) and $m^2$ comparable to $\alpha^{-1}$. In other words, it cannot be fixed by adding particles with spins $J \leq 2$, or by considering the existence of extra dimensions.
These causality constraints are similar in spirit to those considered in [4,5] but they differ in two ways. First of all, here the problem will be shown to arise for small t/s, but large s. Second, the fact that the graviton has spin is crucial. On the other hand, in both cases we have locally Lorentz-invariant Lagrangians that nevertheless can lead to causality violations in non-trivial backgrounds.
An example of a theory that is constrained by these considerations is given by the action (1.1), where the second term is the Lanczos-Gauss-Bonnet term (footnote 2). The constant $\alpha$ has dimensions of length squared. For $\alpha \gg l_p^2$, we will show that the theory is not causal. Furthermore, there is no way to make it causal by adding local higher curvature terms. In fact, our discussion refers to on-shell data, namely the three-point function, which reflects the real physical information and does not depend on the particular way that we write the Lagrangian. In other words, the discussion is invariant under field redefinitions.
Note that if we view gravity as a low-energy effective theory with a UV cutoff of order $M_{pl}$ and we add higher derivative terms with dimensionless coefficients of order one, then we have nothing to say. The remarks in this paper only apply to theories where the coefficients of the higher derivative terms are much larger. The "natural" value for the coefficient $\alpha$, if we view (1.1) as an effective gravity theory, is $\alpha \sim l_p^2$. We will only constrain larger values of $\alpha$. Of course, we are discussing this problem because it is indeed possible to have theories with $\alpha \gg l_p^2$, for example a weakly coupled string theory.
Footnote 1: In $D > 4$ dimensions, by spin $J > 2$ particles we mean particles in representations of the little group $SO(D-1)$ with both of the following two properties (see also appendix H): a) their maximal spin projection $J_{+-} \geq 2$; b) their representations are labeled by Young tableaux with three or more boxes.
Footnote 2: It is the dimensional continuation of the four-dimensional Euler density. In four dimensions it is a topological term, while in higher dimensions it is not topological; in fact, it contributes to the three-point coupling of the graviton.
Another example where this discussion is relevant is the following. Imagine that we consider a large N gauge theory. Such a gauge theory is expected to have a weakly coupled string dual. This theory will have a weakly coupled graviton corresponding to the stress tensor operator [6,7,8]. However, we are not guaranteed that the dual will be an ordinary Einstein gravity theory. It might even be a Vasiliev-like gravity theory [9,10].
The field theory three-point functions of the stress tensor [11,12] determine the three-point functions of the gravity theory [13,14]. One can imagine a theory where the only light single trace operator is the stress tensor. It is natural to expect that this theory will have an Einstein gravity dual. It would be nice to prove that. For scalar interactions, [15] argued that the solutions of the crossing equation correspond to local vertices in the bulk; see also [16]. However, one can worry that the size of the vertices with higher derivatives might be comparable to the one in the Einstein theory. As a case in point, we can think about the graviton three-point coupling. This is constrained to be a linear combination of three structures, only one of which is the Einstein one. These extra structures necessarily lead to a causality problem unless we introduce new higher spin particles at the scale that appears in these new three-point functions. Thus we link the three-point function of the stress tensor to the operator spectrum of the theory. These three-point functions were constrained by causality in [14]. Here we get stronger constraints because we are making further assumptions about the operator spectrum of the theory.
As another application we consider the possibility that gravity waves during inflation were generated by a theory that indeed had these higher derivative corrections with a size comparable to the Hubble scale. This is a possibility which is allowed by conformal invariance and would be realized if the dual description to inflation (in the spirit of dS/CFT [17,18,19]) was a weakly coupled theory or if inflation occurs in a string theory where the string scale is close to the Hubble scale. The theory is still weakly coupled, so that scalar and tensor fluctuations are small. In this case the gravity wave non-gaussianities would be different from the ones in the Einstein theory [20,21]. The observation of such gravity wave signals, combined with the arguments in this paper, would imply the existence of extra particles with spin J > 2 during inflation.
Finally, as a further motivation we should mention the grand dream of deriving the most general weakly coupled consistent theory of gravity. It is quite likely that the only such theory is a string-like theory, broadly defined. We are certainly very far away from this dream, but hopefully our simple observation about three-point functions could be useful.
In particular, this observation highlights the importance of spin. Spin is likely to grow in importance as we consider constraints on the four-point function, given what we found for the much simpler three-point function. In [22,23,24] various interesting constraints were derived by using crossing symmetry and the correct factorization on the pole singularities. The constraints discussed in this paper are additional constraints, not covered by those analyses.
This paper is organized as follows. In section 2 we discuss the notion of asymptotic causality, both for asymptotically flat and asymptotically AdS spacetimes. We also discuss the propagation of particles and fields through a shock wave. The purpose of this discussion is to set the stage for the more general argument in the following section, so that it becomes intuitively clearer. In section 3 we present the main thought experiment, which involves only on-shell amplitudes and does not refer explicitly to shock waves. In section 4 we discuss the effects of adding extra massive particles. We show that particles with spins two or less cannot solve the problem. We also discuss the appearance of these massive particles among the final states of the scattering process. In order to argue that the problem persists we give an alternative presentation of the problem, where we consider the analyticity properties of the S-matrix in the impact parameter representation. In section 5 we discuss various aspects of the AdS version of the thought experiment and its implications for properties of the dual theory. In section 6 we briefly mention the implications of a possible time advance in the context of wormholes. In section 7 we discuss a cosmological application, where we link the possible existence of new structures for the gravity wave non-gaussianity to the presence of higher spin massive particles during inflation.

Flat Space Causality and Shock Waves
In this section we consider the problem in asymptotically flat space. We start by discussing the causal structure in asymptotically flat space. As a motivation for our later discussion we analyze the scattering of a probe graviton from the shock wave. We then present the argument for causality violation in purely on-shell terms.

Statement of Flat Space Causality
When we consider a gravity theory in asymptotically flat space we expect to be able to define scattering amplitudes. In particular, this presupposes that one can fix the asymptotic structure of the spacetime so that we can compare times between the past and the future asymptotic regions. In dimensions $D > 4$ we expect to be able to do this. Let us give a simple argument. Imagine we have a series of observers that sit at a large distance $L$ from each other, and also from the center. For simplicity imagine them at the vertices of a large spatial hypercube and moving along the time direction. We would like to argue that they can synchronize their clocks. If they were in flat space, they could just send signals to each other. On the other hand, if there is an object of mass $m$ in the interior, then the metric components decay as $1/r^{D-3}$. This gives rise to a redshift of order $Gm/L^{D-3}$ at the position of the detectors. More importantly, as a signal travels from one detector to the other, staying at a distance of order $L$ from the center, it will get delayed by an amount $\delta t = \delta L \sim Gm/L^{D-4}$ (see appendix A). In $D \geq 5$, we can make this as small as we want by moving out to large enough $L$. For $D = 4$ this delay does not go to zero and we have a problem. This problem has been discussed in [25], and it seems closely related to the soft graviton issues that arise in perturbative attempts to define the gravitational S-matrix.
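The scaling of the synchronization delay quoted above can be sketched with a rough dimensional estimate. The potential $\Phi$ is a symbol introduced here for illustration only; the sketch assumes a static weak-field source and drops all order-one factors.

```latex
% Rough estimate of the signal delay between two distant observers.
% Assumption: weak-field perturbation of a mass m in D dimensions,
% with all order-one numerical factors dropped.
\begin{align}
  \Phi(r) &\sim \frac{G m}{r^{D-3}} , \\
  \delta t &\sim \int_{\text{path}} \Phi \, d\ell
            \sim L \cdot \frac{G m}{L^{D-3}}
            = \frac{G m}{L^{D-4}} .
\end{align}
% For D >= 5 this goes to zero as L -> infinity.
% For D = 4 it stays of order G m, however far out the observers sit.
```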
This issue is not present in AdS$_4$. In order to deal with the $D = 4$ case, Gao and Wald [26] introduced another notion of causality: we cannot send signals faster than what is allowed by the asymptotic causal structure of the spacetime. In general relativity, with the null energy condition, they argued that this notion of causality is respected; see [26] for a precise statement of the theorem. This holds in any number of dimensions. For $D > 4$ this becomes the more naive notion of causality introduced above. Note that this is an asymptotic notion and we are not relying on the locally defined lightcone as in [27].
We expect that this notion of causality is actually a requirement for any theory of quantum gravity. For asymptotically AdS theories we expect that we should not be able to send signals through the bulk faster than through the boundary. For theories of gravity that are dual to a quantum field theory on the boundary, this is implied by causality in the boundary theory. Also, since we expect that quantum gravity in asymptotically flat space is Lorentz invariant, time advances can lead to acausality or closed time-like curves (see appendix G).
Another reason to require this notion is to ensure that Lorentzian wormholes, such as the one obtained from the eternal Schwarzschild black hole and discussed in [28], do not lead to causality violations in the ambient spacetime. This is required by the ER=EPR interpretation of such geometries [29,30,31].
In the rest of the paper we will assume this notion of causality and derive constraints on some higher derivative corrections.

Scattering Through a Plane Wave in General Relativity
A well-known general relativistic effect is that light going near a massive body (e.g. the Sun) would suffer a time delay relative to the same propagation in flat space. This is known as the Shapiro time delay [1] and constitutes one of the classical tests of general relativity (see appendix A).
We will recall here the derivation of this time delay in the shock wave approximation, which will be all we need for our purposes. First we review the basic properties of shock wave solutions in gravitational theories [32]. This solution describes the gravitational field of an ultrarelativistic particle in a generic theory of gravity [33,34] and is directly relevant for high-energy scattering.
Fig. 1: A particle creates a shock wave localized at $u = 0$. A second probe particle propagates on the geometry and experiences a time delay, $\Delta v$. The two particles are separated along the transverse directions, which are suppressed in this diagram.
A generic shock wave solution in flat space can be written in the form (2.1). This geometry admits a covariantly constant null Killing vector $l^\mu \partial_\mu = \partial_v$. See fig. 1.
We will be interested in the case when this geometry is sourced by a particle that moves very fast in the $v$ direction. Classically, we can model such a particle via the stress tensor (2.2) [33], where $r^2 = \sum_i x_i^2$ and $P_u < 0$ is the momentum of the particle. The Einstein equations then take the form (2.3), with solution (2.4). Now we consider a probe particle that moves in the other light cone direction with momentum $p_v$ and crosses the shock with impact parameter $b$. In other words, the displacement in the transverse dimensions is $r = b$. The metric (2.1) is a bit peculiar because of the delta function $\delta(u)$ in $h(u, x_i)$. We can remove this delta function at the transverse position $b$ by defining a new coordinate $v_{new}$ (2.5). This cancels the $\delta(u)$ term in the metric at $r = b$. Thus, the geodesic that goes through this point is continuous in the $v_{new}$ coordinate. This means that in the original coordinates it suffers a shift $\Delta v$ (2.6). This represents the Shapiro time delay, see fig. 1.
Fig. 2: Left: we consider a particle propagating through the superposition of two left moving shock waves localized at $u = 0$. The particle trajectory is given by the arrows. Right: in the transverse plane we separate the shock waves by a distance $2b$ and send the particle between them so that the net deflection angle is zero. The time delay is the sum of the time delays due to each shock wave.
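For orientation, the chain of steps in this paragraph can be written out explicitly. What follows is a reconstruction under stated assumptions rather than a quotation of the numbered equations: it assumes light-cone coordinates in which the flat metric is $-du\,dv + \sum_i dx_i^2$ (chosen to agree with the relations $s = 4P_u p_v$ and $p_u = q^2/4p_v$ used elsewhere in the text), a point source at $u = 0$, $x_i = 0$, and $D > 4$.

```latex
% Shock wave sourced by an ultrarelativistic particle (sketch, D > 4).
% Assumed conventions: ds^2 = -du dv + ... and P_u < 0.
\begin{align}
  ds^2 &= -du\,dv + h(u,x_i)\,du^2 + \sum_{i=1}^{D-2} (dx_i)^2 , \\
  T_{uu} &= -P_u\,\delta(u)\,\delta^{D-2}(x_i) , \\
  % For this metric R_{uu} = -(1/2) \partial_i^2 h and R = 0, so
  % R_{uu} = 8 \pi G T_{uu} is a Poisson equation in the transverse space:
  \partial_i^2 h &= -16\pi G\,|P_u|\,\delta(u)\,\delta^{D-2}(x_i) , \\
  h(u,x_i) &= \frac{4\,\Gamma\!\left(\frac{D-4}{2}\right)}{\pi^{\frac{D-4}{2}}}\,
              \frac{G\,|P_u|}{r^{D-4}}\,\delta(u) , \\
  % A step-function shift of v removes the delta function at r = b:
  v &= v_{new} + \frac{4\,\Gamma\!\left(\frac{D-4}{2}\right)}{\pi^{\frac{D-4}{2}}}\,
       \frac{G\,|P_u|}{b^{D-4}}\,\theta(u) , \\
  \Delta v &= \frac{4\,\Gamma\!\left(\frac{D-4}{2}\right)}{\pi^{\frac{D-4}{2}}}\,
              \frac{G\,|P_u|}{b^{D-4}} > 0 .
\end{align}
```

The coefficient follows from the Green's function of the Laplacian in $D-2$ transverse dimensions. The positivity of $\Delta v$ is what makes this a delay rather than an advance.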
In addition to this time delay, the trajectory is also subject to a deflection angle. We might worry that the deflection angle would hamper our ability to see a possible time delay or time advance from far away. In fact, we could consider two shocks in succession, but separated in the transverse direction. Considering the probe particle coming at r = 0, we set the shocks at r = b opposite to each other as shown in fig. 2. In this case, the probe particle does not get a net deflection, but the time delays add.
It is instructive to reproduce this formula for the time delay when we treat the probe as a quantum mechanical particle. The wave equation for a scalar field takes the form (2.7). Let us now consider the change in the value of $\phi$ from $u = 0^-$ to $u = 0^+$. Since the variation of the $h$ term is much faster than the variation in the other variables, we neglect the $\partial_i$ derivatives and write (2.8), where $\Delta v$ is given in (2.6). Thus we see that we reproduce the answer we got through the geodesic analysis. Note that $p_v = -i\partial_v$ is the generator of translations in $v$.
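The step from the wave equation to the shift in $v$ can be sketched as follows. This is an illustrative derivation, assuming the metric $ds^2 = -du\,dv + h\,du^2 + \sum_i dx_i^2$ (so that $g^{uv} = -2$ and $g^{vv} = -4h$) and a field that falls off in $v$, so that one $\partial_v$ can be stripped from the equation.

```latex
% Scalar probe crossing the shock (sketch under the assumed metric).
\begin{align}
  \nabla^2\phi &= -4\,\partial_u\partial_v\phi - 4\,h\,\partial_v^2\phi
                  + \partial_i^2\phi = 0 , \\
  % Near u = 0 the h term dominates over \partial_i^2; integrate once in v:
  \partial_u\phi &\simeq -h(u,x_i)\,\partial_v\phi , \\
  \phi(u=0^+,v,x_i) &= \exp\!\Big(-\!\int\! du\,h\;\partial_v\Big)\,\phi(u=0^-,v,x_i)
     = e^{-i\,\Delta v\,p_v}\,\phi(u=0^-,v,x_i) \\
    &= \phi(u=0^-,\,v-\Delta v,\,x_i) , \qquad \Delta v = \int\! du\,h\,\Big|_{r=b} .
\end{align}
% A wave packet centered at v_0 before the shock is centered at v_0 + \Delta v after it.
```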
In quantum field theory we would end the discussion here. Any time advance in $v$, relative to the background Minkowski metric, would be a problem. In gravity, the situation is more subtle: in principle, we need to make observations from asymptotically far away. However, in that case, the $p_u$ energy also has a $p_v$ dependence (footnote 3) and it contributes positively to the time delay. Since $p_u = q^2/4p_v$, this is a small effect for large $p_v$. However, if we multiply by the total $u$-time elapsed, it can add up to a very big time delay. Here we will assume that we can sit far enough from the shock to be able to neglect the dynamics of spacetime, but close enough that we can neglect the $v$-time delay produced by the $p_u$ energy. This seems possible for small $G$.

Connection with the Scattering Amplitude Computation
Let us reproduce the computation above using scattering amplitudes [3]. It is well known that the shock wave computation can be reproduced using the so-called eikonal approximation [35]. Consider the scattering amplitude for gravitating scalar particles. It is given by (2.9). The eikonal approximation resums a particular set of diagrams (horizontal ladders) in the deflectionless limit $t/s \to 0$. Under favorable circumstances (footnote 4), the amplitude exponentiates in impact parameter space (see e.g. [36,37]) as in (2.10), where the phase is given by (2.11). This result matches the shock wave computation, where we used $s = 4P_u p_v$. As we will see below, this picture can be naturally generalized to the scattering of particles with spin.
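The match with the shock wave result can be checked with a short computation. The sketch below assumes the standard leading small-$t$ form of the tree-level gravitational amplitude between scalars and a mostly-plus metric, so that $t = -\vec q^{\,2}$; sign and normalization conventions are assumptions and may differ from those of the numbered equations.

```latex
% Eikonal phase from single graviton exchange (sketch).
\begin{align}
  A_{\text{tree}}(s,t) &\simeq -\frac{8\pi G\,s^2}{t} , \qquad t = -\vec q^{\,2} , \\
  \delta(b,s) &= \frac{1}{2s}\int\!\frac{d^{D-2}\vec q}{(2\pi)^{D-2}}\,
                 e^{i\vec q\cdot\vec b}\,A_{\text{tree}}(s,-\vec q^{\,2})
               = \frac{\Gamma\!\left(\frac{D-4}{2}\right)}{\pi^{\frac{D-4}{2}}}\,
                 \frac{G\,s}{b^{D-4}} , \\
  % With s = 4 |P_u| |p_v| this is the phase e^{-i \Delta v p_v} picked up at the shock:
  \delta(b,s) &= |p_v|\,\Delta v , \qquad
  \Delta v = \frac{4\,\Gamma\!\left(\frac{D-4}{2}\right)}{\pi^{\frac{D-4}{2}}}\,
             \frac{G\,|P_u|}{b^{D-4}} .
\end{align}
% Used: \int d^n q/(2\pi)^n  e^{i q.b}/q^2 = \Gamma(n/2-1) / (4 \pi^{n/2} b^{n-2}),  n = D-2.
```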

The Effect of Higher Derivative Interactions on Particles with Spin
If we had a photon propagating through the plane wave, it would have the same time delay we computed in (2.6). However, if the Lagrangian contains certain higher order interactions, such as (2.13), then the second term gives rise to a time delay that depends on the polarization of the electromagnetic wave. The equations of motion take the form (2.14). In the shock wave background we have (2.15), where we made use of the Bianchi identity in that limit, $\partial_u F_{vi} = \partial_v F_{ui}$. The total time delay $\Delta v$ then has the form (2.16), where $\epsilon$ is the (real) transverse linear polarization direction of the electromagnetic wave. We also introduced $\vec n \equiv \vec b / b$. The derivatives in (2.16) come from the derivatives present in the Riemann tensor. The second term in (2.13) can be viewed as a spin dependent gravitational force.
We see that as $b$ becomes small, $b^2 < \hat\alpha_2$, the second term in (2.16) can overwhelm the first and, depending on the sign of $\hat\alpha_2$ and the polarization, may lead to time advance instead of time delay. If the polarization is along $\vec b$, it has one sign, while if the polarization vector is orthogonal to $\vec b$, $\epsilon \cdot n = 0$, then it has the other sign. If the polarization is a linear combination of these two, we first decompose the wave in linear polarizations along and perpendicular to $\vec b$, then exponentiate (2.16) for each of these two cases separately, and then add the two results. In other words, the expression for the time delay is now a matrix that can be diagonalized by choosing the polarizations to be along $\vec b$ or perpendicular to $\vec b$. In conclusion, for either choice of the sign of $\hat\alpha_2$, there is a choice of polarization that can lead to time advance.
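The opposite signs for the two polarizations can be made explicit with a short computation. As an assumption for illustration, take the correction to act on the Einstein delay as $\hat\alpha_2\,\epsilon^i\epsilon^j\partial_{b_i}\partial_{b_j}$ with $\epsilon\cdot\epsilon = 1$; the overall sign and normalization of $\hat\alpha_2$ depend on conventions that are not fixed here.

```latex
% Polarization dependence of the delay (sketch; normalization of \hat\alpha_2 assumed).
\begin{align}
  \partial_{b_i}\partial_{b_j}\frac{1}{b^{D-4}}
    &= \frac{(D-4)(D-2)}{b^{D-2}}\Big(n_i n_j - \frac{\delta_{ij}}{D-2}\Big) ,
       \qquad n_i \equiv \frac{b_i}{b} , \\
  \Big(1+\hat\alpha_2\,\epsilon^i\epsilon^j\partial_{b_i}\partial_{b_j}\Big)\frac{1}{b^{D-4}}
    &= \frac{1}{b^{D-4}}\Big[\,1+\frac{\hat\alpha_2\,(D-4)(D-2)}{b^2}
       \Big((\epsilon\cdot n)^2-\frac{1}{D-2}\Big)\Big] , \\
  \epsilon \parallel n : &\quad 1+\frac{\hat\alpha_2\,(D-4)(D-3)}{b^2} , \\
  \epsilon \perp n : &\quad 1-\frac{\hat\alpha_2\,(D-4)}{b^2} .
\end{align}
% The two corrections have opposite signs, so for either sign of \hat\alpha_2
% one of the two brackets turns negative at small enough b.
```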
Thus, if we require the theory to be causal, we see that $\hat\alpha_2$ should be set to zero. More precisely, it should be small enough so that the computation we did above breaks down for some reason. An example of a theory where the coupling in (2.13) arises at tree level is bosonic string theory [38,39]. We will see later that in string theory the potential causality problem is fixed by the presence of extra massive states.
As another example, let us consider the Gauss-Bonnet theory. This consists of the usual gravity action plus a specific $R^2$ interaction, of the form (2.17). The term in brackets in (2.17) is a topological invariant in $D = 4$, but it is not topological for $D > 4$. This theory has been extensively studied because it has the nice feature that the equations for small fluctuations around any background are second order [40].
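For reference, a Gauss-Bonnet action of the kind described here can be written as below. The overall normalization $1/16\pi G$ and the use of $\lambda_{GB}$ for the coefficient (the name is taken from its use later in the text) are assumptions about conventions; the quadratic combination in brackets is the standard Lanczos-Gauss-Bonnet density.

```latex
% Einstein gravity plus the Lanczos-Gauss-Bonnet term (normalization assumed).
\begin{equation}
  S = \frac{1}{16\pi G}\int d^{D}x\,\sqrt{-g}\,
      \Big( R + \lambda_{GB}\big[\, R_{\mu\nu\rho\sigma}R^{\mu\nu\rho\sigma}
      - 4\,R_{\mu\nu}R^{\mu\nu} + R^2 \,\big] \Big) .
\end{equation}
% \lambda_{GB} has dimensions of length^2, consistent with the scale b^2 ~ |\lambda_{GB}|.
```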
As explained in [34], the shock wave solution (2.1) is also an exact solution in the Gauss-Bonnet theory. We can consider propagation of a gravitational perturbation through the shock wave background. Before and after the shock the graviton moves as in flat space. All we need to know is what happens when it crosses the shock.
We consider a high-energy graviton $\delta h_{ij}$ that propagates in the $v$ direction with momentum $p_v$ and traceless polarization in the transverse plane. Near the shock we approximate the equations as (2.18). Using (2.18) we can find the time delay, which takes the form (2.19). Again, by choosing different polarizations we can get time advance for $b^2 \sim |\lambda_{GB}|$ for any sign of $\lambda_{GB}$. Notice that the formula for the time delay is very similar to the ones discussed in the context of energy correlators in AdS/CFT, with the parameters depending on the impact parameter of scattering $b$ (see e.g. [41]). We will see below that in the case of AdS the causality constraint interpolates between the usual energy correlator bounds and the flat space bounds obtained above.
So we see that imposing positivity in all channels excludes $\lambda_{GB}$ completely unless new physics kicks in. In other words, the pure Gauss-Bonnet theory (2.17) is acausal. As an aside, the Gauss-Bonnet theory was found in [42] to violate the second law of black hole thermodynamics in certain processes (footnote 5).
We could also consider fermions coupled to gravity. At the level of the on-shell scattering amplitudes there is an additional spin-flipping structure that leads to time advance. This additional structure does not come from any known local Lagrangian and is known to be ruled out by consistency conditions imposed on the four-point amplitude [24], at least in four dimensions. If the same is true in all dimensions, then the effects that we are discussing could not be observed for fermions. We leave exploration of this point for the future. Note that having an explicit local Lagrangian which leads to second order equations of motion guarantees that all consistency conditions for the four-point amplitude imposed in [24] are obeyed. Thus, none of the problems discussed in [24] arise in the case of photons or gravitons with the modified three-point functions that we are discussing.

General Constraints on the On-Shell Three-Point Functions
The examples we have discussed so far have shown that there are causality issues with specific theories. In this section we will isolate the crucial elements that produce the problem. It turns out that these causality problems are produced purely and exclusively by the form of the on-shell three-point functions of the theory. Therefore they are insensitive to higher order contact terms. First we will explain why the three-point functions determine the time delay. Then we will recall some of the properties of three-point functions. Finally, we will present a thought experiment that makes the causality violation more manifest.
Footnote 5: More precisely, [42] considered the compactification of the Gauss-Bonnet theory to four dimensions on a circle. Then the Gauss-Bonnet term becomes topological. This leads to a constant contribution to the black hole entropy. The sign of this constant contribution depends on the coefficient $\lambda_{GB}$. When this contribution is negative, a small enough black hole can have negative entropy. When the constant is positive, a merger of two black holes can violate the second law. The constraints arise when the Schwarzschild radius satisfies $r_s^2 \sim \lambda_{GB}$. In this sense they are similar to the ones we have. In both cases one needs $\lambda_{GB} \gg l_{Planck}^2$ in order to have a meaningful statement. [42] has the nice feature of also constraining purely topological terms in four dimensions. On the other hand, these constraints might be modified by higher derivative corrections to the action, which could also modify black hole thermodynamics.

The Phase Shift in Impact Parameter Representation From Three-Point Functions
Let us consider the tree-level four-point amplitude $A_4$. It depends on the kinematic invariants that we can produce with the four on-shell conserved momenta and polarization tensors of the external particles. Since it is a tree-level amplitude, its only singularities are poles in the $s$, $t$ and $u$ Mandelstam variables. We can now consider external momenta such that $s \gg t$, but with $s$ small enough that the theory is still weakly coupled. We can take the first incoming particle to have very large momentum $p_u$ and the second to have very large momentum $p_v$.
Fig. 3: Kinematics of the two-to-two scattering that we are interested in. Particle 1 has a very large momentum $p_u$ and particle 2 has a very large momentum $p_v$.
One can show that, in impact parameter space, the amplitude is given by (3.1) (for a review see appendix C), where $A_4(\vec q)$ is really a shorthand for the four-point amplitude evaluated in the momentum configuration (3.2), where in all cases we indicated only the leading order term in the $t/s$ expansion, assuming $t/s \ll 1$.
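A concrete version of this setup can be sketched as follows. Both the normalization of the transform (modeled on the scalar eikonal phase) and the particular momentum parametrization are assumptions: the latter is one choice consistent with the stated limits, with light-cone components ordered as $(p_u, p_v, \vec p)$ and $p^2 = -4 p_u p_v + \vec p^{\,2}$.

```latex
% Impact parameter transform and a consistent on-shell momentum configuration (sketch).
\begin{align}
  \delta(\vec b,s) &= \frac{1}{2s}\int\!\frac{d^{D-2}\vec q}{(2\pi)^{D-2}}\,
                      e^{i\vec q\cdot\vec b}\,A_4(\vec q) , \\
  p_1 &= \Big(p_u,\ \frac{\vec q^{\,2}}{16\,p_u},\ \frac{\vec q}{2}\Big) , &
  p_3 &= -\Big(p_u,\ \frac{\vec q^{\,2}}{16\,p_u},\ -\frac{\vec q}{2}\Big) , \\
  p_2 &= \Big(\frac{\vec q^{\,2}}{16\,p_v},\ p_v,\ -\frac{\vec q}{2}\Big) , &
  p_4 &= -\Big(\frac{\vec q^{\,2}}{16\,p_v},\ p_v,\ \frac{\vec q}{2}\Big) .
\end{align}
% Checks: p_1^2 = -4 p_u (q^2 / 16 p_u) + q^2/4 = 0, and likewise for the others;
% p_1 + p_2 + p_3 + p_4 = 0;  p_1 + p_3 = (0, 0, q)  =>  t = -(p_1 + p_3)^2 = -q^2;
% s = -(p_1 + p_2)^2 = 4 p_u p_v at leading order in t/s.
```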
If we had scalar particles, the amplitude would depend only on the Mandelstam invariants and we could write $A_4(\vec q) = A_4(s, t = -\vec q^{\,2})$. However, in the case of particles with spin, the amplitude depends also on the polarization vectors contracted with the various momenta.
Let us assume that $\vec b$ in (3.1) is chosen along $\vec b = (b, 0, 0, \cdots)$, $b = |\vec b|$. Then let us consider the integral over the first component of $\vec q$, call it $q_1$. Due to the exponential factor in (3.1) this integrand is suppressed if we give $q_1$ a positive imaginary part. Here we assumed that the amplitude does not grow exponentially. This is true if we consider particles with polynomial interactions.
Setting $q_1 = i\kappa + \text{real}$, we see that the exponential in (3.1) is suppressed in this region as $e^{-\kappa b}$, $\kappa > 0$. Thus we would get a vanishing result (in the large $\kappa$ limit) unless we cross poles under this contour shift. In fact, we do cross poles. For example, the pole at $t = m^2$ coming from the exchange of a particle of mass $m$ in the t-channel leads to a pole at $q_1 = i\kappa_*$, with $\kappa_* = \sqrt{m^2 + \vec q_{rest}^{\,2}}$ (3.3), where $\vec q_{rest}$ are the rest of the components of $\vec q$ except the first one. These are still real. The  [43,23] spacetimes has a direct physical meaning in the context of computing scattering amplitudes in the impact parameter representation.
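The contour argument can be sketched for the simplest case, with the polarization-dependent numerator replaced by 1. The symbol $\kappa_*$ for the location of the pole follows its use in the next paragraph; treating the propagator as $1/(\vec q^{\,2} + m^2)$ up to an overall sign is an assumption of this sketch.

```latex
% Pole crossed when shifting the q_1 contour into the upper half plane (sketch).
\begin{align}
  t = -\vec q^{\,2} = m^2
    \ &\Longrightarrow\ q_1^2 = -\big(m^2 + \vec q_{\text{rest}}^{\,2}\big)
    \ \Longrightarrow\ q_1 = \pm i\kappa_* , \qquad
    \kappa_* \equiv \sqrt{m^2+\vec q_{\text{rest}}^{\,2}} , \\
  \int_{-\infty}^{\infty}\!\frac{dq_1}{2\pi}\,\frac{e^{i q_1 b}}{q_1^2+\kappa_*^2}
    &= \frac{e^{-\kappa_* b}}{2\kappa_*} \qquad (b>0) .
\end{align}
% Only the pole at q_1 = +i \kappa_* is crossed; its residue gives the e^{-\kappa_* b} falloff.
```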
The pole (3.3) gives a contribution to the amplitude of the form $e^{-\kappa_* b}$. For a massive particle this gives something going like $e^{-mb}$ for large $mb$, a Yukawa-like potential (footnote 6). For a massless particle, the integral over the rest of the components of $\vec q$ produces an inverse power of $b$. In addition, factors of $\vec q$ which are contracted with the polarization tensors give derivatives with respect to $\vec b$. In a theory which only contains a massless graviton we just get the massless graviton pole.
Footnote 6: More precisely, after we integrate over the rest of the components of $\vec q$ we get the following expression: $(2\pi)$ …
It should be noted that in impact parameter representation, for non-zero $b$, we only get a contribution from the diagram that contains an on-shell particle in the t-channel. In particular, s-channel exchanges do not contribute. The reason is that the two incoming particles have to actually overlap in order to give rise to the intermediate particle in the s-channel. Similarly, a four-point contact interaction does not contribute, for the same reason. In both of these remarks we used that we are looking at the tree-level diagrams to leading order in the weak coupling expansion. At higher orders there can be other contributions. However, since we are at weak coupling, we can ignore them. In this paragraph, we have used that the interactions are local. Any non-local effect has to come from the propagation of a physical particle in the t-channel.
The polarization tensors of the external particles can be written as products of vectors, $\epsilon_{\mu\nu} = \epsilon_\mu \epsilon_\nu$, where for particles one and three we can write (3.4), where $e_{1,3}$ are vectors in the purely transverse directions. The on-shell three-point functions will contain factors of the form (3.5). The conclusion is that we can think of the polarizations of the external particles as contained effectively in the transverse space. In addition, when we contract them with the external states we get factors of $\vec q$. These translate into derivatives $\partial_{\vec b}$.
The final result from a massless pole is (3.6).
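The transverse integrals behind these statements are standard and can be sketched as follows. The labels $A_3^{13I}$ and $A_3^{I24}$ for the on-shell three-point amplitudes, and the sum over intermediate states $I$, are notation assumed here (modeled on the $A_{I24}$ that appears later in the text); the overall normalization follows the scalar eikonal formula and should be read as an assumption.

```latex
% Transverse Fourier integrals for t-channel exchange, and the resulting structure (sketch).
\begin{align}
  % massive exchange, D-2 transverse dimensions:
  \int\!\frac{d^{D-2}\vec q}{(2\pi)^{D-2}}\,\frac{e^{i\vec q\cdot\vec b}}{\vec q^{\,2}+m^2}
    &= \frac{1}{(2\pi)^{\frac{D-2}{2}}}\Big(\frac{m}{b}\Big)^{\frac{D-4}{2}}
       K_{\frac{D-4}{2}}(mb)
     \ \propto\ e^{-mb} \quad (mb \gg 1) , \\
  % massless limit:
  \int\!\frac{d^{D-2}\vec q}{(2\pi)^{D-2}}\,\frac{e^{i\vec q\cdot\vec b}}{\vec q^{\,2}}
    &= \frac{\Gamma\!\left(\frac{D-4}{2}\right)}{4\,\pi^{\frac{D-2}{2}}}\,\frac{1}{b^{D-4}} , \\
  % factors of q in the three-point functions become -i \partial_b:
  \delta(\vec b,s) &\sim \frac{1}{2s}\sum_I
     A_3^{13I}(-i\partial_{\vec b})\,A_3^{I24}(-i\partial_{\vec b})\;
     \frac{\Gamma\!\left(\frac{D-4}{2}\right)}{4\,\pi^{\frac{D-2}{2}}}\,\frac{1}{b^{D-4}} .
\end{align}
```

The second line is the small-$mb$ limit of the first, obtained from $K_\nu(z) \simeq \tfrac12 \Gamma(\nu)(2/z)^\nu$.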

The Possible Forms of Three-Point Functions in Various Theories
Given that the answer for the time delay depends on the on-shell three-point functions, it is useful to recall their possible structures. In four dimensions we can use the usual helicity basis. The Einstein-Hilbert gravity action gives rise to the $++-$ and $--+$ three-point functions. We can also have $+++$ and $---$ structures, which could come from (Riemann)$^3$ terms. There are two combinations, one of them being parity violating.
In higher dimensions, $D > 4$, we have three possible structures for the graviton three-point functions, all parity preserving. Writing $\epsilon_{\mu\nu} = \epsilon_\mu \epsilon_\nu$, they have the schematic structure (3.7). The first is the usual one coming from Einstein gravity, while the second can arise from the Lanczos-Gauss-Bonnet term (2.17). The third one can arise from a (Riemann)$^3$ term (footnote 8). They can be viewed as products of the two possible structures that we can have for on-shell spin one particles. In a given theory, the total three-point function is given by a linear combination of these three answers (3.8), where $\alpha_2$ and $\alpha_4$ are two parameters with dimensions of (length)$^2$ and (length)$^4$ respectively. Notice that we have an overall coupling, given by $G$, which we take to be parametrically smaller than the other two parameters.
In the high energy limit the three-point functions appearing in (3.6) simplify further and take the form (3.9) for the three terms in (3.7). We used (3.4). The amplitude $A_{I24}$ has a similar expression with $p_u \to p_v$ and $1, 3 \to 2, 4$. The reader can express these in terms of transverse traceless two index tensors by using the replacement rule $e_i e_j \to e_{ij}$ in (3.9), where i, j are indices in the D − 2 transverse directions.

8 In four dimensions we can replace one of the curvatures by $R_{\mu\nu\sigma\delta} \to \tilde R_{\mu\nu\sigma\delta} = \epsilon_{\mu\nu\rho\gamma} R^{\rho\gamma}{}_{\sigma\delta}$ to obtain the parity violating term. This parity violating term gives rise to the three-point vertex ε µνδσ p δ which, despite appearances, is properly symmetric under exchange of any of the three particles. When this term is present the coefficient of the + + + amplitude is complex and the coefficient of the − − − amplitude is the complex conjugate.
Note that the third structure in (3.7) or (3.9) is not allowed in a supersymmetric theory. This can be seen as follows. In D = 4 this structure gives rise to + + + and − − − amplitudes. However, by the methods described in [44] it is possible to show that supersymmetry implies that this structure should be set to zero. This is actually true in all dimensions. The reason is that we can start with a theory in D dimensions and consider external graviton three-point functions with four-dimensional kinematics. If we had a non-zero contribution from the third structure in D dimensions, then it would lead to a contribution to the + + + or − − − amplitudes in four-dimensional kinematics. Since the arguments in [44] are purely kinematical, based on the symmetries of the theory, then they also force the amplitudes to vanish. In conclusion, supersymmetry implies that α 4 = 0.
In the heterotic string we have a non-zero α 2 . And this is also true for compactifications of the string to D dimensions. Thus α 2 is compatible with half maximal supersymmetry.
With maximal supersymmetry, e.g. N = 8 in D = 4, we should also have α 2 = 0. This can be understood from the fact that the four point amplitude for the full supergravity multiplet is determined up to a unique function. Thus, there is no freedom to introduce the polarization dependent terms that would arise if we had the freedom to switch on α 2 .
Of course, in standard maximal supergravity $\alpha_2$ is not present, therefore the only value consistent with maximal supersymmetry is $\alpha_2 = 0$. As an aside, notice that this is also related to the fact that in four-dimensional N = 4 superconformal theories, the three-point functions of the stress tensor are completely fixed by supersymmetry in terms of the two-point functions.
Between a graviton and two photons there are two possible three-point functions: one arising from the usual electrodynamics and the other from the second term in (2.13). Again this second structure is forbidden in a supersymmetric theory. As we did in (3.9), in the high energy limit we can write them as in (3.11). We emphasize that we work here with the on-shell three-point functions, independently of the precise way we write the Lagrangian. This discussion depends only on on-shell three-point functions and not on other contact terms. A four-point contact interaction does not give rise (at tree level) to a long range force at a non-zero value of the impact parameter.

Problems with Higher Derivative Corrections to the Three-Point Functions
We discussed above how the three-point functions give rise to the leading order expression for the phase shift $\delta(b, s) = s\,F(b)$. If this result were exponentiated, as $e^{i\delta}$, then we could get a time advance problem similar to what we found for the shock waves. Here we would like to explain how to get a time advance problem without using the particular non-linear structure of shock waves. The goal is to present the problem in a way that depends only on very general principles.

Fig. 5: The intrinsic quantum uncertainty in v is $\Delta_q v$. We have drawn a situation where there is a final time advance after going through all the shocks that is larger than the quantum uncertainty. In this figure we have neglected the delay of the u-localized particles.
First note that in order for the time delay to be a problem we would like the time delay $\Delta v = \partial_{p_{2,v}} \delta$ to be larger than the quantum mechanical uncertainty that is implicit in the definition of the wave packet for a particle of momentum $p_{2,v}$. This uncertainty is of the order of $\Delta_q v \sim 1/p_{2,v}$. Thus the figure of merit is $\Delta v/\Delta_q v \sim p_{2,v}\,\partial_{p_{2,v}}\delta = \delta$ (3.12), where we used that δ is linear in s, and therefore linear in $p_{2,v}$ ($s \propto p_{1,u}\, p_{2,v}$). Thus in order to see a problem, we expect that δ should be greater than one. On the other hand, the validity of perturbation theory suggests that δ should be much less than one.
In order to amplify the effect, we imagine that particle number two scatters off N successive instances of particle number one, see fig. 5. Through each instance it gets a small phase shift which leads to a factor in the out state of the form (1 + iδ) with small δ.
If we repeat this N times we get $(1 + i\delta)^N \sim e^{iN\delta}$ (3.13). This is the total phase shift, and as explained in (3.12), we want Nδ to be of order one. This can be achieved by taking N ∼ 1/δ. In addition, we would like to make sure that the approximations that we used remain valid. In particular, we have said that particle 2 remains localized at some distance b through the whole process. We can choose light-cone coordinates for the evolution of particle two so that u is time; then we have a non-relativistic problem with mass $m \sim p_v$. The spreading of the wavefunction during the time U that the whole process takes is $(\Delta b)^2 \sim U/p_v$. The time U that the process takes is determined as follows. We want to separate the N instances of the scattering process from each other so that we can view them as independent and we can use (3.13). The best we can localize each of the N particles of type 1 is by an amount $\Delta_q u \sim 1/p_u$. Thus $U = N/p_u$ and then $(\Delta b)^2 \sim N/s$. We want $\Delta b \ll b$. This translates into the condition $\delta \gg 1/(s b^2)$ (3.14), where we used that N ∼ 1/δ in order to have a problem. We see that the simultaneous validity of (3.14) and (3.13) can be achieved if $s b^2 \gg 1$ and if s can also be increased so as to achieve $1 \gg \delta \gg 1/(s b^2)$ (3.15), which can be done if δ grows with s. An additional issue is that we wanted to neglect the deflection angle. For this purpose we can replace each of the type 1 particles by a pair of particles localized in the transverse dimensions at a distance b from particle number 2, on opposite sides of its trajectory, as shown in fig. 2.
For spin zero particles, notice that if we had a $g\phi^3$ vertex, then the scalar exchange gives $\delta \sim g^2/(s\, b^{D-4})$. In this case we cannot obey the conditions (3.15) to obtain a possible causality problem. In fact, (3.14) implies that $g^2/b^{D-6} > 1$, which means that we are at strong coupling. Thus we see that a crucial feature that we used is that the phase shift increases as a function of s. In the case that we exchange a spin one field, δ is independent of s and there is also no causality issue. The reason is simple: the spin two field is effectively changing the metric and causal structure, while the spin zero or one fields are not.
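The comparison between exchanged spins can be summarized in a short worked estimate. This is a sketch under stated assumptions: it takes the schematic scalings quoted in the text for the exchange of a spin 0, 1 or 2 particle (the label $g_1$ for the spin one coupling is hypothetical) and inserts them into the requirement $1 \gg \delta \gg 1/(s b^2)$, which follows from demanding small wavefunction spreading with N ∼ 1/δ.

```latex
% Sketch: which exchanged spins allow 1 >> delta >> 1/(s b^2).
\begin{align}
J=0:&\quad \delta \sim \frac{g^{2}}{s\, b^{D-4}}
   \;\Rightarrow\; \delta\, s\, b^{2} \sim \frac{g^{2}}{b^{D-6}} \gg 1
   \quad \text{(only at strong coupling)}, \\
J=1:&\quad \delta \sim \frac{g_{1}^{2}}{b^{D-4}}
   \;\Rightarrow\; \Delta v = \partial_{p_{2,v}} \delta = 0
   \quad \text{(no time delay or advance)}, \\
J=2:&\quad \delta \sim \frac{G s}{b^{D-4}}
   \;\Rightarrow\;
   \frac{b^{(D-6)/2}}{\sqrt{G}} \ll s \ll \frac{b^{D-4}}{G},
   \quad \text{non-empty if } G \ll b^{D-2} .
\end{align}
```

In the last line the lower bound on s comes from $\delta\, s\, b^2 = G s^2/b^{D-6} \gg 1$ and the upper bound from $\delta \ll 1$, so the window exists whenever b is much larger than the Planck length.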
The conclusion is that we have justified the exponentiation (3.13) and thus the derivation of the time delay problem in a way that does not depend on the details of the shock wave solution. Let us emphasize that the preceding shock wave discussion was motivational, but the thought experiment that we have set up in this section does not require us to rely on the non-linear structure of the shock wave. It was all derived from the on-shell three-point functions plus certain assumptions about the locality of the theory that allowed us to view each shock as an independent event.
In fact, there are some cases where the shock wave computation does not give the right answer. For example, consider the pure gravity case, where $\delta = Gs/b^{D-4}$. If the energy is large enough to form a black hole, then the time delay computed from the shock wave is not the correct description of the physics. We expect to form a black hole when the center-of-mass energy is such that the associated Schwarzschild radius, $r_s^{D-3} = \sqrt{s}\,G$, is larger than b [45,46,47]. It is easy to check that this never happens if (3.15) is obeyed.
The black hole condition is not met even if we consider the total energy of the N particles that we used for the argument (with N ∼ 1/δ).
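The check referred to above can be written out in two lines. This is a sketch assuming $\delta = Gs/b^{D-4}$, the window $1 \gg \delta \gg 1/(s b^2)$, and that the N copies of particle 1 raise the invariant energy to $s_N = N s$ (an assumption made here; the superscript $(N)$ is a label introduced for illustration).

```latex
% Sketch: no black hole forms inside the window 1 >> delta >> 1/(s b^2).
\begin{align}
\left(\frac{r_s}{b}\right)^{D-3}
  &= \frac{G\sqrt{s}}{b^{D-3}}
   = \frac{G s}{b^{D-4}} \cdot \frac{1}{\sqrt{s}\, b}
   = \frac{\delta}{\sqrt{s b^{2}}} \ll 1 , \\
% With N ~ 1/delta particles of type 1, taking s_N = N s:
\left(\frac{r_s^{(N)}}{b}\right)^{D-3}
  &= \frac{G\sqrt{N s}}{b^{D-3}}
   = \frac{\sqrt{N}\,\delta}{\sqrt{s b^{2}}}
   \sim \frac{\sqrt{\delta}}{\sqrt{s b^{2}}} \ll 1 .
\end{align}
```

Both ratios are small because $\delta \ll 1$ and $s b^2 \gg 1$, so the Schwarzschild radius stays well inside the impact parameter.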
There is still one more complication that we need to deal with in order to make the argument clearer. In the shock wave discussion of section two, the spin of the particle creating the shock did not matter. However, with the modified three-point functions, the spins of the scattered particles can change. The full interaction is a spin dependent force which acts on both spins, that of the left and that of the right moving particles (particles one and two, see fig. 3). It will be necessary for us to be able to fix the polarizations of particles one and three, see fig. 3. This can be achieved by replacing particle one by a coherent state of particles. In this case, due to the usual Bose enhancement factor, particle three will have a larger probability of remaining with the same polarization. In this setup we can set the spins, or polarization vectors, of particles one and three to be the same. Or, more precisely, $\epsilon_3 = \epsilon_1^*$. Since we are at weak coupling we can consider a coherent state with a mean occupation number which is large enough for us to be able to neglect the spin flips but small enough that the total scattering amplitude is still small. In other words, the use of coherent states allows us to effectively select the final state for particle 3. More explicitly, say that we form a coherent state for the oscillator mode created by $a^\dagger$, $e^{\lambda a^\dagger}|0\rangle$.
We could then have terms in the interaction Hamiltonian that leave this oscillator the same, as well as terms that change it, with coefficient operators $h_1$ and $h_2$ respectively. We see that for large λ the first term is enhanced. We have not indicated explicitly the initial and final states on which $h_1$ and $h_2$ act, since they involve other oscillators. Since we are at weak coupling, we can choose λ large enough so that the first term dominates relative to the second term, while the whole process is still in the weakly coupled approximation, that is, the total effect of the interaction Hamiltonian is small. Of course, an alternative way to say this is that we are creating a classical background with the particles of type 1 in fig. 3; the terms that have a non-zero expectation value in this classical background dominate over the others. We want a classical background with small enough amplitude that we can still trust the leading order perturbative expansion of the interaction Hamiltonian.
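The Bose enhancement can be illustrated with a minimal sketch. The form of $H_{\rm int}$ below is hypothetical and chosen only for illustration: $c^\dagger$ stands for some other mode (for instance the flipped polarization), and $h_1$, $h_2$ act on the remaining oscillators. It is not claimed to be the interaction Hamiltonian used in the paper.

```latex
% Sketch: relative size of the terms that keep or change the coherent mode.
\begin{align}
|\lambda\rangle &\equiv e^{-|\lambda|^{2}/2}\, e^{\lambda a^{\dagger}} |0\rangle ,
  \qquad a\,|\lambda\rangle = \lambda\, |\lambda\rangle , \\
% Hypothetical interaction (an assumption for illustration):
H_{\rm int} &= h_{1}\, a^{\dagger} a
  + \left( h_{2}\, c^{\dagger} a + \text{h.c.} \right) , \\
\langle \lambda |\, a^{\dagger} a \,| \lambda \rangle &= |\lambda|^{2} ,
  \qquad
\langle \lambda ; 1_{c} |\, c^{\dagger} a \,| \lambda ; 0_{c} \rangle = \lambda .
\end{align}
```

The term that leaves the coherent mode unchanged is larger by a factor of order $|\lambda|$ than the one that moves a quantum into another mode, so at large λ and fixed small couplings the first term dominates.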
The use of coherent states also allows us to select final states for particle 3 in fig. 3 with a small momentum $p_v$. Namely, we form a coherent state out of a superposition of particles with large momenta $p_{1,\mu}$. We need a superposition since we need to localize these particles within the transverse plane to a region smaller than b. Thus we have some dispersion in the transverse momenta q. With a large $p_u$ component of the momentum, we then get a small momentum along the v direction, $p_v = q^2/(4 p_u)$. Since we have a coherent state, particle 3 is also taken out of this superposition and has the same range of values for the momentum. Therefore the total momentum transfer in the t-channel along the v direction is very small. Then the kinematics chosen in (3.2) is representative for the process in question. This still allows a possibly large amount of momentum transfer along the u direction. We will discuss this in subsection 4.3.
Another minor point is that the phase shift δ represents a time delay for both particles: it affects both particle 1 and particle 2. So far we have been focusing only on the effects on particle 2. The coherent states also allow us to effectively select a final state for particle 3, so that we can focus more clearly on the time delay with which particle four emerges, see fig. 3.

Scattering of Gravitons in D > 4 Dimensions
It is also easy to argue that the $\alpha_2$ and $\alpha_4$ structures lead to a causality problem in D > 4.
To show this we consider a probe graviton that scatters off the coherent state. The phase shift in this case takes the form derived in appendix B. In order to find problems we will be choosing various polarizations for the particles.
For example, let us first consider particle 1 with polarization $e^1_{xx} = -e^1_{yy}$ and the other components equal to zero. Here x, y represent two directions in the transverse plane. We call this the ⊕ polarization. We also choose $e^3 = e^1$. We enforce this by sending a coherent state with this polarization. Now for particle 2 we can choose the same polarization or the crossed polarization, called ⊗, given by $e^2_{xy} = e^2_{yx} = 1/\sqrt{2}$ and all other components equal to zero. Then we find the following. If b is along the x direction, then these two different polarizations for particle two do not mix as they go through the shock. They diagonalize the phase shift matrix. For small enough b the $\alpha_4^2$ terms dominate since they are the most singular in the small b expansion. These terms have the form (3.18), where we take the derivatives first and then set $b = (b_x, 0, \cdots, 0)$. The terms in parentheses there are polynomials in D which are positive or negative definite. 9 Notice that the positivity of the first case is due to the following argument. If the polarizations of particles 2 and 4 are the same as those of 1 and 3, then the configuration is constrained by unitarity along the t-channel. 10 Therefore in this case we should get a strictly positive answer for the time delay for the contribution of any particle with a non-zero coupling. Since we obtained a negative time delay for the second case in (3.18), we conclude that $\alpha_4$ should be set to zero unless new particles are present.
Once $\alpha_4$ has been set to zero, we can discuss $\alpha_2$. In that case we can still choose the $\otimes_{xy}$ polarization for particles 1 and 3 and the $\otimes_{yz}$ polarization for particles 2 and 4.
We then focus on the terms proportional to $\alpha_2^2$ since they are the dominant terms at small b (once we have set $\alpha_4 = 0$). The resulting contribution to the time delay is negative, and we conclude that $\alpha_2$ should also be set to zero unless new massive particles appear.

Scattering of Gravitons in Four Dimensions
Let us now discuss in more detail the four-dimensional case, D = 4, which is a bit special. First, we need to take into account that the Einstein term produces a log(L/b) time delay, where L is an IR cutoff. Second, we need to take into account the parity violating structure. The logarithm can be taken into account by modifying the causality criterion in the form suggested by Gao and Wald, who define it by comparing to the behavior of the same metric far away. In this way the log L term is eliminated and it is easy for a power law behavior produced by $\alpha_4$, which goes as $1/b^4$, to overwhelm the logarithm. Also, we will later repeat the computations for $AdS_4$ space and we will see that $L \to R_{AdS_4}$. In conclusion, this is not a real issue.
Note that in four dimensions the α 2 structure is identically zero. This is related to the fact that the Gauss-Bonnet term becomes topological in four dimensions. Thus, we have only the Einstein term structure and the α 4 one. With four-dimensional kinematics, we can consider the situation with coherent states of particles of type 1. Let us choose the spin of these particles to be plus. Due to the coherent state considerations, the spin of particle 3 also needs to be plus (in the outgoing notation, or minus in the incoming notation). In other words the amplitude does not have a spin flip. In fact, without a spin flip the α 4 structure does not contribute in four dimensions. Thus in the vertex involving particles one, three, and the intermediate one, the only structure that contributes is the Einstein one. This contribution is effectively the same as the spin zero one. Then we can run an argument similar to the one above.
Let us now discuss the parity violating structure, together with the parity preserving one. By considering the coherent state for particle one (see fig. 3) with definite helicity (positive or negative), we get that only the Einstein structure contributes to the $A_{13I}$ three-point amplitude (see fig. 4). Then we get the matrix form (3.20) for the phase shift. Here γ is the coefficient of the + + + amplitude and $\gamma^*$ the coefficient of the − − − one. Notice that these physically imply that particle two undergoes a spin flip. We see that δ is a two by two matrix in the space of helicities. In (3.20), 1 represents the identity matrix in this two dimensional space. The matrix in (3.20) can be diagonalized by a suitable choice of polarization directions. We then find that we have a causality problem for small enough b = |β|.

Fixing the Causality Problem by Adding Massive Particles
Let us discuss how to evade the causality problem that we found above. This problem can be evaded by adding new particles at the scale α 2 or α 4 . We will discuss the case of a weakly coupled theory where the problem should be fixed at tree level. This is indeed what happens in string theory, see appendix E. For a case involving loops see appendix C.
Let us first consider the corrections to the time delay due to new particles being exchanged in the t-channel. The new particles have to lead to a phase shift growing like s, or a higher power of s. Thus, we should add massive particles with spin J ≥ 2.
We will now argue that massive spin two particles do not help and that we need particles of higher spin. In particular, this will then rule out a solution involving Kaluza-Klein gravity, which would be a special example of the addition of massive spin two particles.
For this reason we will analyze it in detail.

Massive Spin Two Particles Do Not Fix the Problem in D = 4
Let us first discuss the four-dimensional case. Since the external states are massless spin two particles, the on-shell three-point vertices involve two massless particles and a massive spin two particle.

Fig. 6: Consider the coupling of a massive spin two particle to two massless gravitons. Let us choose the kinematic configuration so that the massive particle decays into two massless gravitons along the ẑ axis. The +− helicity configuration is impossible since the angular momentum along the z axis would be +4. The ++ configuration is allowed.

In four dimensions, we can label the massless particles by their helicities. An important result is that, in all incoming notation, the only non-zero amplitudes involve ++ or −− helicities for the massless particles. In particular the +− combinations are zero. The argument is essentially the same as in the Weinberg-Witten theorem [48], or the statement that gravity does not have a local stress tensor operator. 11 Imagine that we have the massive spin two particle in its rest frame. We let it decay into two massless spin two particles. Let us suppose that the two decay products move in opposite directions along the ẑ axis, see fig. 6. In the +− configuration the total sum of the spins of the decay products along the ẑ axis is +4 or −4. However, the initial massive particle had spin at most ±2. Therefore a +− configuration is impossible. With a ++ or −− configuration there is no problem because the sum of the spins is zero. One can also write down explicitly the corresponding three-point amplitude (4.1), where $\epsilon^1_{\mu\nu} = \epsilon^1_\mu \epsilon^1_\nu$, and we used that the component of $\epsilon_{I\mu\nu}$ that contributes the largest factor of s in the sum over intermediate states is $\epsilon_{Iuu} = 2$. We have denoted the coupling by $\tilde\alpha_4$ since it reduces to the $\alpha_4$ structure in the massless limit. Here 1 and 3 are the massless particles. Of course, $p_1 \cdot p_3$ is fixed by the mass of the massive particle. We see that this result is invariant under $\epsilon_1 \to \epsilon_1 + p_1$, and so on. In the second line of (4.1) we have written the three-point amplitude including the leading terms in the high energy limit. This is written in terms of the purely transverse polarization vectors (or tensors) introduced in (3.4).

11 The matrix elements of the stress tensor operator between two on-shell graviton states are like the coupling to a massive spin two particle, where the square of the momentum of the stress tensor, $q^2$, corresponds to the mass of the massive spin two particle.
In D = 4 there is also a parity violating structure which we will not need to write explicitly.
If we now consider particles 1 and 3 in fig. 3 to be associated to a coherent state with definite spin, then we have no spin flip allowed and this coherent state does not couple to the massive spin two particles. Therefore in four dimensions the massive spin two particles cannot solve the problem, they simply do not couple to the type of source that we are considering. Note that it is important that the massless intermediate gravitons are still coupling to the 1-3 coherent state through the Einstein three-point function, and as discussed in section 3.5, it leads to a causality problem for particle two in fig. 3.
We can further show that the massive spin two particle with a coupling (4.1) by itself also leads to a causality problem and should therefore not be present. In fact, it will be useful for our later argument to understand this in more detail. For simplicity let us set to zero the parity violating massive structure. For the coherent state that involves particles one and three in fig. 3 we choose the ⊕ polarization with $e^1_{xx} = -e^1_{yy}$ and the other components equal to zero, and the same for $e^3$. Here x, y represent the two directions in the transverse plane. Now for particle 2 we can choose the same polarization or the crossed polarization, called ⊗, given by $e^2_{xy} = e^2_{yx} = 1/\sqrt{2}$ and all other components equal to zero. Then we find the following. If b is along the x direction, then these two different polarizations for particle two do not mix as they go through the shock. They diagonalize the phase shift matrix. If the polarizations of particles 2 and 4 are the conjugates of those of 1 and 3, and reflected along b, then the configuration is constrained by unitarity along the t-channel to give a strictly positive answer for the contribution to the time delay of any particle with a non-zero coupling. On the other hand, if we average over all polarizations for particle 2, it is possible to see that the terms involving $\alpha_4$ or $\tilde\alpha_4$ (the massive particle contributions) all vanish. Thus, the contribution from the crossed polarization has to have the opposite sign. In other words, unitarity fixes a plus sign for the time delay for one polarization and this implies a negative sign for the other. Indeed, it is possible to see this explicitly by computing the massive particle contribution to both answers, given in (4.2), where the first subindex of δ is the polarization of particles 1 and 3 and the second that of particles 2 and 4 in fig. 3. By acting with the derivative operator in (4.2) explicitly one can see that it gives a negative answer in the second case.
This is independent of the sign of $\tilde\alpha_4$. In fact, it is also negative for the contribution of the massless case when we have the $\alpha_4$ structure on both sides. The full phase shift also has the general relativity contribution. Once we have a single massive particle, it is possible to go to a small enough b so that we overwhelm the positive contributions from the general relativity vertices.
This shows that in a theory with particles of spin up to two we cannot solve the causality problem that arises when $\alpha_4$ is nonzero. In addition, we see that any massive spin two particles, even if present, should have $\tilde\alpha_4 = 0$ in order not to cause further causality problems.

Massive Spin Two Particles Do Not Fix the Problem in D > 4
We now move on to a higher dimensional gravity theory, D > 4. The three-point amplitudes for two gravitons and a massive spin two particle now have two possible structures. The first is the one in (4.1), which can be multiplied by a coefficient which we will still call $\tilde\alpha_4$. The second is of the form (4.3), where again $\epsilon^I_{\mu\nu}$ is the intermediate state polarization tensor and we used that we only care about its $\epsilon_{Iuu} = 2$ component. We have introduced a new coefficient $\tilde\alpha_2$. In the second line we have indicated the form that it takes in the high energy limit. In the last line the polarization tensors are purely in the transverse directions and q is the momentum transfer.
We can first consider a setup with four-dimensional kinematics. Namely, we can consider particles 1 and 3 to be associated to a coherent state which is uniformly distributed along D − 4 of the original dimensions. In this case the problem is essentially four-dimensional and the three-point amplitudes involving $\alpha_2$ and $\tilde\alpha_2$ (massless and massive, respectively) do not contribute. If we want to avoid causality problems, and without spin > 2 particles, we conclude that both $\alpha_4$ and $\tilde\alpha_4$ should be zero. The argument is the same as the one we presented in the four-dimensional discussion. Note that since we are getting to four dimensions by effectively dimensionally reducing the higher dimensional theory, the parity violating four-dimensional structure does not arise.
We would now like to rule out the possibility of having contributions with non-zero $\tilde\alpha_2$. For this we will assume that $\alpha_4$ and $\tilde\alpha_4$ are zero, as shown by the previous argument.
Let us first consider the case D = 5. Now we have three transverse directions, let us call them x, y, z. We choose the polarizations of 1 and 3 to be of the ⊕ xy type.
Inserting this into (4.3) we see that this produces a factor of $O_{xy} = m^2 - \partial_{b_x}^2 - \partial_{b_y}^2$ acting on the massive propagator, which is simply $\frac{1}{b} e^{-mb}$. For particles 2 and 4 we choose the polarization $\otimes_{yz}$, which also produces a similar operator $O_{yz}$. The final result for $\delta_{\oplus_{xy}, \otimes_{yz}}$ has the form (4.4), where we have set b = (b, 0, 0) after taking the derivatives. We see that for any sign of $\tilde\alpha_2$ this produces a negative result. Furthermore, in the massless case, m = 0, this also gives the part of the graviton contribution proportional to $\alpha_2^2$. The graviton also contains other contributions involving the ordinary Einstein piece on both three-point functions, as well as a mixed term. These contributions behave like $1/b$ and $\alpha_2/b^3$ respectively. Thus, for small b, the graviton contribution involving $\alpha_2^2$ dominates, since it goes as $\alpha_2^2\, b^{-5}$. Therefore, we conclude that if $\alpha_2$ is non-zero, then by going to small enough b we get a causality problem.
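The sign of this contribution can be checked with a short worked computation. It is a sketch under the assumption that $O_{yz}$ has the same form as $O_{xy}$ with $(x, y) \to (y, z)$, and it drops the overall positive prefactor (coupling squared, G and s), which is not reconstructed here. The function $f$ is a label introduced for the propagator.

```latex
% Sketch: O_{yz} O_{xy} acting on f(b) = e^{-mb}/b in three transverse dimensions.
% Away from b = 0, f obeys (\nabla_b^2 - m^2) f = 0.
\begin{align}
O_{xy} f &= \left(m^{2} - \partial_{b_x}^{2} - \partial_{b_y}^{2}\right) f
          = \partial_{b_z}^{2} f ,
 \qquad
 O_{yz}\, O_{xy} f = \partial_{b_x}^{2}\, \partial_{b_z}^{2} f , \\
% For a radial function, the second z-derivative at b_y = b_z = 0 is f'/b_x:
\partial_{b_z}^{2} f \,\Big|_{b_y = b_z = 0}
 &= \frac{f'(b_x)}{b_x}
  = -\frac{(1 + m b_x)\, e^{-m b_x}}{b_x^{3}} , \\
O_{yz}\, O_{xy} f \,\Big|_{\vec b = (b,0,0)}
 &= \frac{d^{2}}{db^{2}} \left[ -\frac{(1 + m b)\, e^{-m b}}{b^{3}} \right]
  = -\frac{e^{-mb}}{b^{5}}
    \left( 12 + 12\, m b + 5\, m^{2} b^{2} + m^{3} b^{3} \right) < 0 .
\end{align}
```

Every term in the bracket is positive for $m \geq 0$, so the result is negative for any mass, and at $m = 0$ it reduces to $-12/b^5$, in line with the $b^{-5}$ behavior quoted for the massless contribution.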
Furthermore, this problem cannot be fixed by adding massive spin two particles.
For D > 5 one can run a similar argument. But of course, we could also set up the problem with five-dimensional kinematics. In other words, we choose a coherent state spread over D − 5 of the dimensions and we get the same as what we discussed above.
The final conclusion is that if we have extra structures in the graviton three-point function (if α 2 or α 4 are nonzero), they lead to a causality problem which cannot be fixed by adding massive particles with spins J ≤ 2.
At the risk of being repetitive, let us summarize the argument that rules out massive spin two particles. First we go to four-dimensional kinematics, where the massless or massive couplings proportional to $\alpha_2$ or $\tilde\alpha_2$ do not contribute, and we rule out both $\alpha_4$ and the similar coupling $\tilde\alpha_4$ to massive intermediate gravitons. Then we go to five (or higher) dimensional kinematics and we rule out both $\alpha_2$ and $\tilde\alpha_2$. In particular, notice that, even in the case of ordinary Einstein gravity, with $\alpha_2 = \alpha_4 = 0$, we have ruled out tree level couplings to massive gravitons (or massive spin two particles).

Fig. 7: When the particles scatter, the graviton can become another massive particle, here labeled by X.

Exciting the Graviton Into New Particles
In the above discussion we ignored the possibility of exciting a graviton when it passes through the shock and transforming it into a new state. 12 In this section we discuss this possibility and conclude that it cannot fix the problem.
Since the available energy is large, compared to 1/b, it is possible to turn the incoming graviton into an outgoing massive particle, let us call it X. If we use coherent states for particles 1 and 3 in fig. 3, then we suppress the processes where particle 3 becomes a new massive particle and we only have to worry about the possibility of particle 2 turning into this new massive particle. This can happen even if the mass of the new particle, call it X, is much larger than b −1 , but smaller than √ s. The reason is that there can be some p u energy transfer from particles 1 and 3 to particle 4 in fig. 3.
Let us view the process of graviton 2 passing through the shock as a signal transmission problem. Focusing on the v dependence of the signal, we can say that the out-signal, given an in-signal, must be causal. Namely, if the in-signal vanishes for v < 0, then the out-signal must vanish for v < 0. In Fourier space these signals are related by $f_{out}(\omega) = S(\omega)\, f_{in}(\omega)$ (4.5). Here we are using that particles 1 and 3 carry a negligible amount of $p_v$, so that we have v-translation invariance. Of course, if we have physical particles, we cannot localize them sharply in v because they only have positive frequencies.
In order to obtain a sharp causality bound we need to invoke the vanishing of the field commutators, $[\phi_{out}(v), \partial_v \phi_{in}(v')] = 0$ for $v < v'$ (we put the derivative to remove possible zero mode issues). 13 As reviewed in appendix D, causality implies that S(ω) is analytic in the upper half plane. In addition, the fact that we can produce other particles can only make the strength of the graviton signal in the future smaller. This, in turn, implies that the S matrix element for graviton going into graviton, call it $S_{gg}(\omega)$, should be smaller than one in the upper half plane; see appendix D for further discussion.

12 In string theory these are called tidal excitations of the string.

13 In a theory of gravity we do not have local field operators. However, we can imagine defining such operators in the asymptotic past and future. More precisely, in order to run into the causality problems we need to put them far enough to the past and future that we can neglect the change in the spacetime metric but close enough so that the quantum mechanically dictated momentum $p_u$ of the signal particles does not wash out the possible time advance. Here we will assume that it is possible to do this.

In situations where we have some time advance for the graviton, we are getting an infinitesimal matrix element of the form $S_{gg} \sim 1 + i\omega \Delta v$ (4.6), with ∆v < 0. Then if we set ω → iγ, with γ > 0, we get $S_{gg} = 1 - \gamma \Delta v$, which is bigger than one in the upper half plane. Note that we do not need to go to very large values of γ to obtain a violation; we only need $p_v$, or γ, to be large enough so that this impact parameter description is good enough.
In this presentation of the argument, it is clear that adding extra particles as possible extra final states does not help. We need to modify $S_{gg}$. In other words, the transformation of the graviton to X is irrelevant for this argument because we are considering the graviton-graviton S matrix element. The transformation to X and then back to the graviton can contribute to this matrix element at higher orders in G. But this cannot fix the problem we ran into in (4.6), which is of first order in G.

Massive Higher Spin Particles Can Solve the Problem
Now we consider the exchange of a massive spin J > 2 particle in the t-channel. Its contribution to the phase shift will rise with energy like $G s^{J-1}$ and at high energies it will dominate over the graviton contribution. This can happen even in the regime where the theory is weakly coupled. If we have a single contribution of this type, we also run into a problem. The problem is the following. We can think of the propagation of particle number 2 as a signal transmission problem where time is v. In other words, if we start with a signal $f_{in}(v)$ which vanishes for v < 0, then the out-signal $f_{out}(v)$ should be zero for v < 0. In addition we want that the total norm of the out wave should not grow. From these two conditions we can deduce that the S(ω) matrix, as a function of the "energy" ω, is analytic and bounded by one in the upper half plane. However, a particle of spin J > 2 leads to a contribution $S \propto 1 + i G s^{J-1} + \cdots$ which becomes bigger than one in some regions of the upper half complex s plane. Notice that the problem arises already at weak coupling, for a small value of $G s^{J-1}$.
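The region where the bound fails can be exhibited with a short estimate. This is a sketch: the constant $c$ is a label introduced here for the coupling and impact parameter dependence, assumed real and positive, and s is continued into the upper half plane at fixed b, working to first order in $c$.

```latex
% Sketch: |S| for a single spin-J exchange, S ~ 1 + i c s^{J-1}, at first order in c.
\begin{align}
S &\simeq 1 + i\, c\, s^{J-1}, \qquad c > 0, \quad
  s = |s|\, e^{i\theta}, \quad 0 < \theta < \pi , \\
|S|^{2} &\simeq 1 - 2\, c\, |s|^{J-1} \sin\!\big( (J-1)\, \theta \big)
  + O(c^{2}) , \\
J = 2:&\quad \sin\theta > 0
  \;\Rightarrow\; |S| < 1 \text{ throughout the upper half plane}, \\
J > 2:&\quad \sin\!\big( (J-1)\, \theta \big) < 0
  \;\text{ for }\; \frac{\pi}{J-1} < \theta <
  \min\!\left( \frac{2\pi}{J-1},\, \pi \right)
  \;\Rightarrow\; |S| > 1 .
\end{align}
```

For J = 3, for example, the bound fails in the whole quadrant $\pi/2 < \theta < \pi$, however small $c\,|s|^{J-1}$ is.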
Thus, a finite number of higher spin particles does not fix the problem. In fact, it generates problems of its own. On the other hand, an infinite number of particles with higher spin can solve the problem. An example is string theory. This problem has been discussed extensively in the classic papers by Amati, Ciafaloni and Veneziano [49,50,51,52,53] (see also [54]). In fact, the amplitude Reggeizes for large s and small t ($s\alpha' \gg 1$, $t\alpha' \ll 1$), as in (4.7). This expression has a cut in the s-channel, due to the creation of physical states along the s-channel. These are simply the massive closed strings that are present along the s-channel. For spacelike t, $t < 0$, we see that this effectively leads to a phase shift that decreases faster than s at large s. Taking (4.7) and transforming to the impact parameter representation, we find that for $b^2 \ll \alpha' \log s$ we get a behavior [51] $\delta \sim \mathrm{Pol}\; G s$, where $\mathrm{Pol} = 1 + \alpha_2\, \epsilon.\partial_b\, \epsilon.\partial_b + \cdots$ is the part coming from the polarization tensors, which includes the new structures in the three-point functions. 14 This is indeed compatible with causality. We also get a large imaginary part that reflects the fact that we are creating strings along the s-channel. Notice that we had argued before that in a local theory we expect that by going to impact parameter space we can suppress tree level s-channel processes. This is not true in string theory, which contains extended objects.
Furthermore, since their size increases with mass logarithmically, we see that at high energies their effects appear at b 2 ∼ α ′ log(sα ′ ) rather than the more naive expectation of b 2 ∼ α ′ . This justifies the small t expansion used in (4.7). Further aspects of the string case are discussed in appendix E.
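The logarithmic growth of the effective size can be made explicit with a schematic estimate. The sketch below assumes a Regge form $A \sim G s^{2+\alpha' t/2}/t$ for the amplitude, writes $t = -q^2$, and drops all numerical factors, signs and polarization dependence; it is meant only to show where the scale $\alpha' \log(s\alpha')$ comes from.

```latex
\begin{align}
\delta(b,s) &\sim \frac{1}{s}\int d^{D-2}q \; e^{i \vec q\cdot\vec b}\, A(s,-q^2)
 \;\sim\; G s \int d^{D-2}q \; \frac{e^{i \vec q\cdot\vec b}}{q^2}\;
 e^{-\frac{\alpha'}{2}\, q^2 \log(s\alpha')} , \\
q^2 &\lesssim \frac{1}{\alpha' \log(s\alpha')}
 \quad\Longrightarrow\quad
 b^2_{\rm eff} \sim \alpha' \log(s\alpha') .
\end{align}
```

The Gaussian factor cuts off momentum transfers above $1/(\alpha' \log(s\alpha'))$, so the $1/b^{D-4}$ behavior of the field theory answer is smoothed out for impact parameters below $b_{\rm eff}$.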
The conclusion is that an infinite number of higher spin particles can solve the problem.
We need a tower of particles with increasing spins and intricate relations between them so that the expansion can be resummed into an amplitude that does not have a problem.
Besides string theory, we do not know if there are other ways of doing this.

Compositeness and the Extra Structures for Graviton Scattering
In this subsection, with the string theory case as an inspiration, we make some remarks about the extra structures for the graviton scattering.
Imagine that the graviton has a composite structure. 15 Let us imagine that the graviton is given by a pair of particles which are in a bound state given by a wavefunction $\psi(r)$, where $r$ is the relative distance. We further assume that these particles scatter through the shock via the usual general relativity three-point functions. However, since the two particles feel slightly different forces, we find that the total scattering amplitude will have, schematically and up to numerical factors, the form $\delta_\epsilon = \delta_0 + \langle \psi | r^i r^j | \psi \rangle_\epsilon\, \partial_{b^i} \partial_{b^j} \delta_0 + \cdots$, where $\delta_0$ is the general relativistic expression (2.11), and the subindex $\epsilon$ indicates the spin of the graviton. Notice that since the Laplacian $\nabla_b^2 \delta_0 = 0$, the only terms that can contribute to the second part are those which are not rotationally invariant in the relative coordinates. These are possible because of the graviton spin or polarization $\epsilon$. Notice that this is a simple argument for the presence of extra structures in the graviton three-point function. Even though we motivated this with a graviton composed of two particles, the same final formula works if the graviton is made out of many more elementary constituents, as happens in string theory, where it is a string. In any case, the size of the new structure, $\alpha_2$, due to compositeness, is of order $\alpha_2 \sim r_s^2$, where $r_s$ is the typical size of the graviton. Given that the bound state has this typical size, we also expect that it can be excited to other states with masses $m^2 \sim 1/r_s^2$. Indeed, by imposing the causality constraint, we found that there should be new particles with masses of this order of magnitude.
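The role of the vanishing Laplacian can be illustrated numerically. The sketch below takes D = 5, so that the transverse space is three-dimensional and the Einstein phase shift has the shape 1/b^(D−4) = 1/|b|, and estimates its second derivatives by finite differences. The helper names, the choice of D and the step size are assumptions made only for this illustration.

```python
import math


def delta0(b, D=5):
    """Shape of the Einstein phase shift, 1/|b|^(D-4), for a transverse vector b with D-2 components."""
    return math.sqrt(sum(x * x for x in b)) ** (-(D - 4))


def second_derivative(f, b, i, h=1e-3):
    """Central finite-difference estimate of d^2 f / d b_i^2 at the point b."""
    up = list(b)
    up[i] += h
    dn = list(b)
    dn[i] -= h
    return (f(up) - 2.0 * f(b) + f(dn)) / (h * h)


b = [1.0, 0.0, 0.0]  # impact parameter along the first transverse axis
diag = [second_derivative(delta0, b, i) for i in range(3)]

print(diag)       # individual components are non-zero (roughly 2, -1, -1)
print(sum(diag))  # the trace, i.e. the Laplacian, is approximately zero
```

A rotationally invariant average of r^i r^j only probes the trace, which vanishes, whereas a polarization dependent average can pick out the individual non-zero components.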

Anti-de Sitter Discussion
The case of asymptotically AdS space is very similar to the asymptotically flat space one. The causal structure is defined by the causal structure of the boundary. We then require that signals that go through the bulk cannot go faster than signals that remain on the boundary. As argued in [55,26], general relativity with the null energy condition implies that this is obeyed.
In terms of the dual CFT, this is just the statement that CFT observers cannot exchange information faster than light. Equivalently, boundary CFT operators commute outside the boundary light-cone.

15 Note that we cannot make the graviton as a zero energy bound state in a local relativistic quantum field theory that contains a stress tensor operator [48].

Motivation: The Emergence of Bulk Locality Should Happen in the Classical Theory
In this subsection we discuss in more detail the AdS considerations that motivated the present paper.
We expect that the dual of a large N gauge theory should be a weakly coupled string theory with coupling g s ∼ 1/N . This should be true both at weak and strong 't Hooft coupling. As we increase the 't Hooft coupling we are supposed to interpolate between a Vasiliev-like theory and an ordinary Einstein-like theory at strong 't Hooft coupling. This whole interpolation happens within the classical string theory in the bulk. Of course, in the ordinary Einstein description we see a local theory in the bulk. Thus the emergence of bulk locality is something that should be contained within classical string theory. It is for this reason that it is interesting to understand the constraints of tree level interactions of gravitons and the link between the masses of the higher spin particles and the size of the corrections to Einstein gravity. Here we attempted to link them via the causality considerations for the simplest gravitational interactions. Given the interest of the AdS case, we will discuss in more detail some of its features.

Statement of AdS Causality
In asymptotically AdS gravitational theories the causal structure given by the Here the function h is only constrained by the Laplace equation in the transverse hyperbolic which can be equivalently written as where f = h z . In appendix F we give an argument that this is a solution, to all orders in the derivative expansion.
We can also consider a plane wave with a delta function source, so that instead of (5.3) we write where the RHS corresponds to an insertion of a delta-function source in the hyperbolic space in (5.4). The Green's function in the hyperbolic space is well-known, so that we get where $y_0$ and $z_0$ are the coordinates of the source in the bulk, and It can be checked that $P_u$ is also the total momentum from the boundary point of view by integrating the boundary stress tensor (read off from the small $z$ expansion of this metric) on the boundary. For example, in AdS$_4$ and AdS$_5$ (5.6) takes the following form Let us understand better the symmetries of the problem. The shock wave is localized around $u = 0$ and is probed by a particle which is localized in $v$. The role of the transverse plane in flat space is played here by the transverse $H_{D-2}$. It is convenient to think of the probe crossing this hyperbolic space at the center. 16 The shock centered at $(y_0, z_0)$ has a number of Killing vectors that depend on $f(u)$. General properties of the AdS shock wave and its different limits are considered in appendix F. In particular, when the center of the shock goes to the boundary, $z_0 \to 0$, the problem becomes very similar to the one arising in the computation of energy correlators [14], whereas in the limit $z_0 \to \infty$ it reduces to the setup used in [60] to study causality.
Our formulas will reduce to the ones considered before in those limits.

The Effect of Higher Derivative Interactions on Particles with Spin
Now we would like to consider different types of probes and compute the time delay for them. We start with a simple example of a scalar probe and then move to the case of particles of spin one and two.
If we consider a minimally coupled scalar in the shock wave background its equation of motion takes the form (5.10).

16 From the CFT point of view we can create such a probe by acting with the operator of given energy and zero momentum in the AdS Poincare coordinates for which $u = 0$ is the future null infinity, as explained in [14]. In the terminology of [14], we are working here in the y-coordinates, while the operator with given momentum is inserted in the x-coordinates. In the pure AdS case, the isometries of $H_{D-2}$ at fixed $u = 0$ correspond to the usual Lorentz symmetry group in the x-coordinates of [14].
In our setup we are interested in corrections to this equation which are second order in derivatives. 17 Considering terms yielding two derivative equations of motion for φ we have to consider terms like where the tensor H is made from the background metric, Riemann and covariant derivatives. However one can check that there is no two index symmetric tensor that is not vanishing on-shell [59]. Of course, this statement is equivalent to the uniqueness of the scalar-scalar-graviton three-point vertex. In the high energy limit we get similar to flat space which produces the time delay reproducing the flat space computation for small ρ. We assumed that the perturbation crosses the shock at z = 1 and y = 0.
For the gauge boson we imagine, at the two-derivative level, the following equation where H is built out of the Riemann tensor and its covariant derivatives. Using the properties of the background discussed above (we defer the details to appendix F) one can check that the only term that we can have is a correction analogous to that appearing in the case of flat space where Ř reduces to the Weyl tensor on-shell (see appendix F for details).
If we compute the time delay using the same action (5.15), considering that each mode corresponds to a different constant polarization, $\epsilon_i$, we obtain the corresponding time delay. The final result is very similar to the one obtained in flat space. The only difference is that the polarization dependent delay is slightly more complicated. The flat space result (2.16) is reproduced by considering the $\rho \to 0$ limit, whereas the energy correlator constraint is recovered in the limit $\rho \to 1$.

17 Since we need derivatives to bring down large factors of momentum.
Similarly, in the case of gravity we are interested in the most general form of the second order equations. We choose to parameterize the equations of motion for perturbations as follows where the parameters $\alpha_i$ are in units of the AdS radius $R_{AdS}$, which we set to one. The time delay for these equations of motion is then given by where $n$ is a vector pointing from the center of the shock to the probe particle, and $\epsilon$ is the polarization of the probe particle. These time delays can become negative for small enough $\rho$ if $\alpha_2$ or $\alpha_4$ are non-zero.
These results can also be reproduced using a slightly different method, namely evaluating the on-shell action in the shock wave background in explicit gravitational theories such as Lovelock or quasi-topological theories (see e.g. [63,64]).
In the limit ρ → 0 these constraints reproduce the flat space analysis of section B.1 with ρ = b 2 . In the ρ → 1 limit the above result reproduces constraints discussed in the past. If we take the limit ρ → 1 by taking the shock center to the boundary, $z_0 \to 0$, we recover the energy correlator computation [14]. If, on the other hand, we consider ρ ∼ 1 by taking the shock center to the horizon, $z_0 \to \infty$, we recover the shock wave discussed by Hofman [60].

18 There is also the possibility of considering more than two derivatives acting on the perturbation, as in $\nabla^n \delta R$. These contributions in general change the number of degrees of freedom of the theory and will not be considered here.

Implications for a, c in Theories with Large Operator Dimensions
Imagine an abstract CFT$_4$ with large $N \gg 1$ and large gap $\Delta_{gap} \gg 1$, where $\Delta_{gap}$ is the dimension of the lightest higher spin single trace operator. This theory is described by a gravitational theory in the bulk with potentially some higher derivative corrections. String theory inspired intuition suggests that higher derivative corrections should be suppressed by powers of $1/\Delta_{gap}$. Our argument shows that this is indeed the case for the simplest corrections, which are the ones affecting the three-point function of the stress tensor. Indeed, as we showed in (5.18), we run into potential problems with causality at impact parameters $\rho_c \sim (\alpha_2)^{1/2}, (\alpha_4)^{1/4}$. Since this can only be fixed by higher spin particles, we conclude that $\Delta_{gap}$ has to be small enough so that it can start correcting the amplitude before we run into this problem.
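The scaling behind this conclusion can be sketched in a few lines. The estimate below works in units where $R_{AdS} = 1$ and assumes that a higher spin particle of large dimension $\Delta_{gap}$ has bulk mass $m_{HS} \sim \Delta_{gap}$ and therefore modifies the amplitude at impact parameters of order $1/m_{HS}$; the symbols $m_{HS}$ and $\rho_{HS}$ are introduced here only for illustration, and all numerical coefficients are dropped.

```latex
\begin{align}
\rho_c &\sim (\alpha_2)^{1/2}, \; (\alpha_4)^{1/4} , \qquad
\rho_{HS} \sim \frac{1}{m_{HS}} \sim \frac{1}{\Delta_{gap}} , \\
\rho_{HS} \gtrsim \rho_c
 &\quad\Longrightarrow\quad
 \alpha_2 \lesssim \frac{1}{\Delta_{gap}^2} , \qquad
 \alpha_4 \lesssim \frac{1}{\Delta_{gap}^4} .
\end{align}
```

In words, the higher spin exchange must become important at impact parameters no smaller than the ones where the time delay would otherwise change sign.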
For $\Delta_{gap} \gg 1$ the relevant impact parameters are such that we can approximate the formulas by the flat space limit. Thus, we get bounds of the type (5.19) and (5.20), where $\lesssim$ stands for some numerical coefficient that we cannot fix using our simple analysis. It would be very interesting to find an independent field theoretic argument that leads to bounds of the type (5.19) or (5.20). It would also be nice to find the precise numerical coefficients in (5.19) and (5.20). In the supersymmetric case, $a - c$ was argued to control the asymptotic density of BPS operators in [65]. It would be very interesting to understand if there is any relationship between their work and our analysis.

Implications for Dimensions of Double Trace Operators
Let us briefly review the results in [56,61,62]. The basic idea is the following: the state created by $O (\partial^2)^n \partial_{\mu_1} \cdots \partial_{\mu_j} O$ for large $n$ and large $j$ can be thought of as two highly energetic particles that follow null geodesics in AdS, see fig. 8.
For two geodesics that are characterized by total energy E and spin J the minimal separation is achieved in global coordinates at where we matched energy and spin of the pair of particles to the ones of the double trace operator, $J = j$, $E = 2\Delta + 2n + j$, and used that $n, j \gg 1$, with $\Delta$ also of order one.
Thus, we see that probing distances much smaller than AdS radius (ρ ≪ 1) corresponds to considering operators with n ≫ j. On the other hand n ≪ j corresponds to scattering at very large impact parameters. The Mandelstam variable s is given by and the relation to the anomalous dimensions is that Let us consider different limits of this formula. First, consider very large impact parameters ρ ∼ 1 or j ≫ n. In this limit we get which is in agreement with the general results derived using the crossing equation [66,67,68].
In the opposite limit of small impact parameter scattering, $n \gg j$ and $j/n > 1/\Delta_{gap}$, we have We have several comments to add to this story. First, these results should be universal and applicable to generic CFTs with large N and large gap.
To write the answer in an abstract form we have to use the relation of G to the two-point function of stress tensors, which is well-known and is roughly $G \sim 1/c_T$. It means that it should be possible to derive them using the crossing equation, which should still be dominated by the stress tensor exchange.
Probably the relevant limit is $z \to 0$, $\bar z \to 1$ with $z/(1-\bar z)$ being fixed. It would be nice to reproduce the formulas above using crossing equations.
Second, we see that causality, or positivity of the time delay, implies the constraint $\gamma(n, j) < 0$, which generalizes the ones that were previously known [69,67,68] for asymptotically large spin, $j \gg n$. Again, it would be very interesting to understand how to prove these constraints purely from the field theory point of view.
Third, we see that considering double trace operators of the type $T_{\mu\nu} (\partial^2)^n \partial_{\mu_1} \cdots \partial_{\mu_j} T_{\rho\sigma}$ we get new structures due to the dependence on polarization, which potentially lead to causality violations and the bounds (5.19), (5.20). Of course, the same is true for the double trace operators that involve the conserved current $J_\mu$. It would be very interesting to understand them from the purely CFT viewpoint. Note that in the scattering process the polarizations of particles 3 and 4 in fig. 3 can change relative to those of particles 1 and 2, so that the phase shift is an operator that acts on this space of polarization tensors. Both t-channel unitarity and our considerations constrain only some matrix elements of this general matrix. While we leave the general case for the future, we note that, in some cases, we can ensure that the polarizations of 3 and 4 do not change by using conservation of angular momentum along the directions orthogonal to the impact parameter direction.
In these cases our considerations apply and we can say that the anomalous dimensions of the corresponding double trace operators should be negative. The positivity statement applies to the part of the phase shift that grows with the Mandelstam invariant s, which translates into growth with n via (5.23).
However, in our case, this positivity requirement is not obvious from the CFT point of view. It would be nice to see whether this is a general requirement or is one that is present only in theories with a local bulk dual.

Wormholes and Time Advances
General relativity has Lorentzian wormhole solutions that join far away points by short Einstein-Rosen bridges. The simplest configuration is the maximally extended Schwarzschild solution interpreted as an approximation to the metric of two distant black holes which share a single interior. As discussed in [28], these solutions do not lead to a violation of causality in the ambient space because it is not possible to send signals through the wormhole [70].

Fig. 9: We consider a Lorentzian wormhole configuration that is described, near each wormhole, by the maximally extended Schwarzschild solution. We send a (green) particle from the left very close to the past horizon. We then send a (purple) particle from the right. (a) If this particle gets a time delay, it will fall into the singularity. (b) If the particle gets a time advance, then it can make it out of the other black hole and we would have a way of sending signals through the wormhole. Here the blue lines represent the average position of the horizon of the black hole, defined by null lines that are very far away from the two particles we send in. We assume that the impact parameter is much smaller than the Schwarzschild radius.
The inability to send signals through the wormhole depends crucially on the fact that we have a Shapiro time delay as opposed to a time advance. For example, if one sends a fast moving particle from the left side, then a particle sent from the right will suffer a time delay that will make it go into the singularity, see e.g. [33,71]. However, if that particle were to suffer a time advance, as opposed to a time delay, then it would be able to go through the wormhole and we would have a violation of causality, see fig. 9. Note that we can make the wormhole big enough that we can neglect the higher derivative corrections in the description of the background metric. It can also be big enough that we can neglect the backreaction of the two particles. So we are considering a situation where the impact parameter b ≪ (Schwarzschild radius). In the D = 4 case, the Schwarzschild radius acts as the IR cutoff of the logarithm.
This impossibility of sending signals is crucial for interpreting the wormhole as an EPR state of two disconnected systems [29,30,31].

Cosmological Applications
The gravity wave non-gaussianities produced by inflation are a direct measure of the graviton three-point vertex during inflation [17,20]. To leading order in the slow roll approximation we can do the computation in de Sitter space. The symmetries of de Sitter imply that only two different parity preserving structures are possible. These correspond to the two parity preserving structures that we have in four-dimensional flat space. One is the one produced by the Einstein action and the other can be produced by a term in the action of the form $M_{pl}^2 \alpha_4 R^3$, where R is a Riemann tensor (not the Ricci tensor). The relative size between the two types of gravity wave non-gaussianity is proportional to $\alpha_4 H^4$ (7.1), where H is the Hubble scale during inflation. Of course, both are small compared to the two point function, $\langle \text{3point} \rangle / \langle \text{2point} \rangle^{3/2} \sim H/M_{pl}$. See [20] for the explicit expressions. Thus, if the gravity wave three-point function was measured and it was found that this exotic new structure is present at a level comparable to the Einstein one, then one concludes that $\alpha_4$ is set by the Hubble scale, $\alpha_4 H^4 \sim 1$. The considerations in this paper imply that there should also be new particles with spins $J > 2$ with masses comparable to the Hubble scale during inflation. Thus, this would be indirect evidence for string theory during inflation.
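The step from a large exotic non-gaussianity to light higher spin particles can be summarized as a short estimate. The sketch below assumes that the ratio of the two structures scales as $\alpha_4 H^4$ and that the new particles appear at the length scale $(\alpha_4)^{1/4}$ where the causality problem sets in; the labels on the correlators and the symbol $m_{J>2}$ are introduced here only for illustration, and numerical factors are ignored.

```latex
\begin{align}
\frac{\langle\gamma\gamma\gamma\rangle_{R^3}}{\langle\gamma\gamma\gamma\rangle_{\rm Einstein}}
 \sim \alpha_4 H^4 \sim 1
 \quad &\Longrightarrow\quad \alpha_4 \sim H^{-4} , \\
m_{J>2} \sim (\alpha_4)^{-1/4}
 \quad &\Longrightarrow\quad m_{J>2} \sim H .
\end{align}
```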
Note that $\alpha_4 H^4 \sim 1$ implies that supersymmetry had to be broken at the Planck scale and not at a lower scale, since the + + + and − − − structures are forbidden by supersymmetry. 19 Let us be a bit more explicit about this point. If the short distance theory is supersymmetric, then the field theory Lagrangian does not contain the couplings giving rise to $\alpha_4$. Now, since supersymmetry is broken, this three-point function could arise from integrating out massive particles. These are expected to contribute to $\alpha_4$ as in (7.2), which is very small. Here, to maximize the effect, we assumed that the masses of the bosons and fermions, as well as their differences, are of order H. We then see from (7.1) and (7.2) that in this supersymmetric scenario the contributions are very small. Thus, in order for the right hand side of (7.1) to be of order unity (or say a few percent), supersymmetry should be broken at the Planck scale (during inflation) so that the three-point vertex is present in the original classical theory. Notice that most of the string inflation models do not predict a large $\alpha_4$ since they are based on compactifications of the ten dimensional superstring. It should be a model where the string length is comparable to the Hubble radius and with a very weak coupling to account for the small experimental upper bound for $H/M_{pl}$ [72].

19 We thank Nima Arkani-Hamed for emphasizing this to us.
Note that if one imagines that inflation is given a dual description in the spirit of dS/CFT and the dual field theory is weakly coupled, then one expects that This is what happens in the Vasiliev theory [73]. Of course, this theory also contains massless higher spin particles. It is also not suitable for building an inflationary model because the scalar does not appear to obey the slow roll conditions.
If the gravity waves produced by inflation are as large as to explain the signal seen by BICEP2 [74], then probing the gravity wave three-point functions might be possible (with a lot of optimism!). 20

Conclusions
In this paper, we studied causality constraints on higher derivative corrections to the graviton three-point function. We considered a weakly coupled theory and studied higher derivative corrections which are important before the theory becomes strongly coupled.
These are higher derivative corrections that arise in the classical regime of the theory.
The constraints arise from a thought experiment where we scatter two gravitons at relatively high energy and fixed impact parameter. The energy is high compared to the inverse of the impact parameter but low compared to the scale where the theory becomes strongly coupled. More explicitly, we have the very small overall coupling G and we consider corrections to the three-point functions which scale as powers of $\alpha p^2$ relative to the Einstein one. The three-point amplitudes are very small because G is very small. But $\alpha$ is a fixed quantity and we look at impact parameters of the order of $b^2 \sim \alpha$. We found that when the impact parameter is $b^2 \sim \alpha$, we see a causality violation. In this impact parameter representation, and in the field theory regime (without higher spin particles), the time delay comes from the singularities in the t-channel, which is simply a pole at $t = 0$ for the massless theory. More precisely, it comes from the part of its residue that is quadratic in s. In other words, terms going like $s^2/t$ at $t = 0$. It is important that while $t = 0$, the momentum transfer itself is non-zero. 21 The overall sign of the time delay then depends on the contractions of the polarization tensors of the external particles with the momentum transfer in the t-channel.

20 At this time, there are alternative explanations for this signal [75,76], so we might have to wait till the dust settles.
21 It is a null, non-zero momentum.
We have argued that this type of tree level causality violation can only be fixed, at tree level, by higher spin particles at a mass scale $m^2 \sim 1/\alpha$. In string theory, this issue is fixed because the amplitude Reggeizes. Namely, it has a behavior $s^{2 + \alpha' t/2}$ for large s and fixed t. This is due to extended strings being exchanged in the s-channel. If the amplitude Reggeizes, then corrections appear at a scale $b^2 \sim \alpha' \log(s\alpha')$. Due to the presence of the logarithm we did not find a sharp bound between the corrections to the graviton three-point amplitude, $\alpha$, and the Regge slope $\alpha'$.
We should stress that in this discussion we have assumed that we have a weakly coupled gravitational theory. We have also assumed the notion of asymptotic causality, which says that the causal structure determined by the far away regions of spacetime cannot be violated by its interior regions.
The analysis in this paper was also motivated by trying to understand better the AdS/CFT correspondence. In particular, if we consider large N gauge theories we know that we have a weakly coupled theory in the bulk. However, we do not know under what conditions that weakly coupled theory is local in the bulk. It is clear that the absence of light massive higher spin states is a requirement. Here we have tried to address the question of whether it is sufficient. We have only studied the simplest possible correction to gravity, its three-point function. We have argued that, as long as higher spin particles are very massive, there cannot be higher derivative corrections to the three-point functions.
Previous discussions argued against such corrections by saying that they would make the theory strongly coupled at energies that are lower than the Planck scale [15,16], but still parametrically larger than the scale of the corrections. 22 Here we have strengthened the bound by linking the scale of corrections to the appearance of new particles at the same scale.
As a more concrete statement, we are linking the values of the constants appearing in the stress tensor three-point functions to the dimensions of the lightest particles with higher spins, $J > 2$. In other words, $\left|{a - c \over c}\right| \lesssim {1 \over \Delta_{gap}^2}$. Unfortunately, we could not determine the precise numerical constant in this inequality.
Using the results in [56,61,62], we can link the time delay for a high energy scattering process in the bulk to the anomalous dimensions of certain double trace operators. These double trace operators have the rough form $T\, \partial_+^j (\partial^2)^n\, T$. They have both relative spin and relative radial excitations. The anomalous dimensions are $\gamma(n, j) \sim -\delta(s, b)/\pi$, where the values of b and s are given in terms of j, n in (5.22), (5.23). The requirement that the time delay is positive leads to the statement that the anomalous dimensions for some of these operators should be negative.

22 In [15,16], or in talks referring to those papers, they impose the bound α
For the de Sitter case, this analysis has potentially interesting phenomenological applications. If the gravity wave three-point functions were measured, we expect to see the structure predicted by Einstein theory. However, a new structure is also possible. This new structure is the only one allowed by the approximate scale and conformal invariance of the inflationary phase. If such a new structure was found with a strength comparable to the Einstein one, then it would be a direct signal of dramatically new physics at the Hubble scale: a tower of higher spin particles, which is a rather drastic departure from ordinary field theory at the inflationary scales. It is not clear how likely this inflationary scenario is in the space of possible inflationary theories.

Open Problems
It would be nice to derive these constraints in a more direct way. If one understood directly the constraints of unitarity and causality at the level of the four point function, then one would not need to resort to the exponentiation argument we discussed in section 3.3. Furthermore, it might lead to sharper bounds that include numerical factors.
Our discussion of massive intermediate particles in mixed representations was not complete. 23 These are representations that have maximal spin two in the uv-plane, but have additional indices in the other directions (see appendix H). We suspect that a finite number of these cannot solve the causality problem, but we did not prove it.
In the AdS case, it would be nice to derive the constraints from the conformal bootstrap point of view. This is in the spirit of [15], but it would involve the stress tensor as an external state. One of the main messages from this paper is the importance of spin in order to derive constraints. The structure constants (or three-point functions) of operators with spin are very numerous but, as we have shown, there can be powerful constraints on them. These constraints are not so easily seen when we scatter external operators with no spin. Notice that even the simplest bounds for a and c discussed in [14], which should be valid for arbitrary CFTs, have not been derived from the conformal bootstrap approach.

23 These are not present in D = 4, so that the existence of an infinite tower of higher spin states is clear in D = 4, or if $\alpha_4 \neq 0$ in higher dimensions. In higher dimensional theories with $\alpha_4 = 0$ but non-zero $\alpha_2$, in order to establish that the tower is really infinite we need to rule out the possibility that the causality problem is fixed with a finite number of mixed representations. We leave this to the future.
It would be nice to further constrain the interactions of all the higher spin particles and derive the general structure of the tree level theory. This is the program that was pursued in the sixties and that led to string theory. However, we would like to know how unique string theory is. Methods developed to tackle this problem might also be useful for analyzing large N gauge theories such as large N QCD.
It would also be nice to see whether in the de Sitter context there is a sharp bound for the gravity wave three-point correlators analogous to the one in [14] for AdS correlators.
Our de Sitter discussion assumed the local gravity description and a locally flat space discussion in the bulk, so it applies most clearly when α 4 H 4 is somewhat smaller than one, but still of order one compared with H/M pl .

Appendix A. Shapiro Time Delay
The physical effect discussed in the paper is known in general relativity as the Shapiro time delay [1]. The usual setup to discuss the Shapiro time delay is to consider propagation of light near a massive body (a star or a planet). Here we would like to consider a slight variation of it by considering propagation of light between two massive bodies. We consider the masses to be equal and consider the geodesic that is equally separated from each of them, such that the deflection is absent (each of the masses bends the trajectory in the opposite direction such that the net deflection is zero). The time delay, on the other hand, accumulates. Recall that the Schwarzschild metric takes the form $ds^2 = -\left(1 - (r_S/r)^{D-3}\right) dt^2 + {dr^2 \over 1 - (r_S/r)^{D-3}} + r^2 d\Omega_{D-2}^2$.

Fig. 10: We consider the metric produced by two massive sources at distance 2b. We then send a particle between them and measure the time delay. There is no deflection angle, but there is a non-zero time delay. We do the computation to leading order in the $r_S/b$ expansion, by simply considering a linear superposition of the two metrics.
We are interested in the metric produced by the superposition of two equal masses.
In the Schwarzschild coordinates, to first order in the mass, or $r_S$, we get the linearized metric. Let us now consider two masses separated along the direction $x_1$, one at $x_1 = b$ and the other at $x_1 = -b$. Then we consider a probe particle moving along the direction $x_{D-1} = z$. By symmetry, it will stay at $x_1 = \cdots = x_{D-2} = 0$ if it starts there with velocity along the z direction. We have that $r = \sqrt{b^2 + z^2}$ for both masses. We also find that $dr = z\, dz/r$. Inserting this into the linearized metric gives the line element along the trajectory, from which the time delay (A.4) follows. We now do the same for a particle moving at a velocity v. The result of doing the integral over z is the same as in (A.4) multiplied by a velocity dependent factor. We see that for slow velocities, we indeed find a negative time delay (time advance) proportional to $1/v^3$. This is a time advance relative to a particle moving with the same velocity v in flat space (not relative to a particle moving at the speed of light!).
We could have repeated the computation in a different coordinate system, for example in the so-called isotropic coordinates. One can easily check that the time delay for the geodesic in these coordinates coincides with the one computed in Schwarzschild coordinates to leading order in $r_S$.
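The coordinate independence of the leading-order time delay can be checked numerically. The sketch below is not taken from the paper: it assumes the linearized Schwarzschild integrand $(r_S/r)^{D-3}(1 + z^2/r^2)$ and a linearized isotropic metric with $g_{tt} = -(1 - (r_S/\rho)^{D-3})$ and spatial factor $1 + (r_S/\rho)^{D-3}/(D-3)$ for each mass, and the function names are hypothetical.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def delay(D, b, r_s, coords):
    """Light-ray time delay between two equal masses, first order in r_s (D >= 5)."""
    if coords == "schwarzschild":
        weight = lambda th: 1 + math.sin(th) ** 2   # 1 + z^2/r^2
    elif coords == "isotropic":
        weight = lambda th: 1 + 1 / (D - 3)         # assumed linearized isotropic form
    else:
        raise ValueError(coords)
    # Substituting z = b*tan(th) turns dz*(r_s/r)^(D-3) into
    # r_s^(D-3)/b^(D-4) * cos(th)^(D-5) dth on (-pi/2, pi/2).
    f = lambda th: abs(math.cos(th)) ** (D - 5) * weight(th)
    return r_s ** (D - 3) / b ** (D - 4) * simpson(f, -math.pi / 2, math.pi / 2)

def closed_form(D, b, r_s):
    """Gamma-function expression for the same integral (valid for D > 4)."""
    g = math.gamma
    return (math.sqrt(math.pi) * g((D - 4) / 2) / g((D - 3) / 2)
            * (D - 2) / (D - 3) * r_s ** (D - 3) / b ** (D - 4))

for D in (5, 6, 7):
    print(D, delay(D, 1.0, 0.1, "schwarzschild"),
          delay(D, 1.0, 0.1, "isotropic"), closed_form(D, 1.0, 0.1))
```

The two integrands differ point by point, but the integrated delays agree, as expected for a quantity measured by asymptotic observers.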

Appendix B. Three-Point Amplitudes and Their Sums
In this appendix we collect the different three-point amplitudes that involve a graviton and two other particles, and present the polarization sums that appear in the time delay. In particular cases we reproduce the shock wave computations, but the results obtained using on-shell amplitudes are much more general and are valid in any theory with the given on-shell three-point amplitudes.
Let us be more explicit on the kinematics we are interested in. As defined in the bulk of the paper we consider the following kinematics We define the polarization tensors as follows

(B.2)
An important point is that all $\epsilon_i \cdot p_j$ are of order 1 and thus are subleading in the high energy limit compared to powers of $s$. Thus, in all our computations we can think of the polarization tensors as being purely transverse, see (3.5).
We focus on the massless scalar, vector and graviton three-point amplitudes, which are parameterized as follows
where we used the $\epsilon_{\mu\nu} \to \epsilon_\mu \epsilon_\nu$ form of the polarization tensor for the graviton. Using these three-point amplitudes we can compute the time delay using (3.1) in the high energy limit. In the high energy limit the relevant part of the sum over graviton polarization tensors takes a very simple form
so that it leads to factors of $s^2$ when we contract with the $p_1$ or $p_3$ momenta on the left side and with the $p_2$ or $p_4$ momenta on the right side, see fig. 4. Note that the large components of the external momenta are transverse to the momentum $q$ of the intermediate particle,
The results are the following
where all of the operators are acting on the propagator $1/b^{D-4}$. To reproduce the usual shock wave computations only the first three formulas are relevant. In the case of electrodynamics the matching with (2.16) is manifest. In the case of Gauss-Bonnet theory we have from (2.19) $\alpha_2 = {\lambda_{GB} \over 4}$, $\alpha_4 = 0$. Consider now the coupling of the graviton to a massive spin two particle, which is a relevant amplitude for the discussion in the bulk of the paper. We get
(B.6)
Notice that instead of the three structures that are present in the massless amplitude we have only two.
For the coupling to a particle of spin four or higher we can have three structures [77]. They take the same form as above, with the extra indices of the spin $J$ particle being contracted with momenta. An additional structure takes the form
Similarly, we can write down the contribution to the time delay of a higher spin particle, the difference being that the sum of three-point amplitudes gives $s^J$ and that we have an extra structure in the three-point amplitude for external gravitons which is analogous to the Einstein one.
The formulas above are valid in D > 4, whereas in D = 4 α 2 and α 2 structures above are absent 24 but instead we have a parity odd structure. In D ≥ 5 parity odd structures are absent in the class of amplitudes considered above.

B.1. Scattering of a Scalar and a Graviton
As an example let us reproduce the computation for the graviton that we did using the shock wave. It corresponds to scattering from the energetic scalar particle (we could have considered graviton-graviton scattering as well if we average over all polarizations of gravitons 1 and 3).$^{25}$
we reproduce the computation from the previous section, (2.19).
Let us now study the constraints that follow from the positivity of the time delay.
Introducing new variables
we can write the phase shift in the form familiar from the study of the energy correlators
which coincides with the formula (3.6) in [78,41].$^{26}$ Here $n = \vec b/|\vec b|$ and $e$ is the graviton polarization of particles two and four. Thus, the positivity constraints from causality are identical to the ones obtained in their analysis with the identification of parameters as above (B.9), namely
So we get bounds on $\alpha_2$ and $\alpha_4$ that depend on how small a $b$ we can trust the computation for. If the computation is trustworthy for arbitrarily small $b$ we are forced to set $\alpha_2$ and $\alpha_4$ to zero.
$^{24}$ They are identically zero.
$^{25}$ More precisely, we set polarizations one and three to be equal and then we sum over all of them. Physically we are considering a sequence of coherent states with the various alternative polarizations. This makes the $\alpha_2$ and $\alpha_4$ contributions vanish in the $A_{13I}$ part of the amplitude.

Appendix C. The QED case
Note that the action (2.13) also arises in QED (quantum electrodynamics) after we integrate out the massive electron [79]. In that case $\hat\alpha_2 \propto {e^2 \over m^2}$. This is a one loop effect, suppressed by the coupling $e^2$. In this paper we have taken the coupling to be very small, so that we would have treated this $\hat\alpha$ as being essentially zero. The discussion of this paper concentrated on the case where the higher curvature corrections are present at tree level, so that the causality problem has to be solved by tree level physics. In this appendix, we consider this loop generated term in QED and show that the potential causality problem is solved by one loop effects.
In QED, when we get to an impact parameter of order $m^{-1}$ we cannot be satisfied with the low energy action (2.13). Fortunately the necessary diagrams were computed in [80]. Using their results it is possible to go to the impact parameter representation (doing the Fourier transform) and check that for $b > 1/m$ we get a result that agrees with the simple Lagrangian (2.13), but for $b < 1/m$ we get a different result which displays no causality problem. In other words, the potential causality problems arising from (2.13) appear at $b \sim \sqrt{|\hat\alpha|} \sim e/m$, but at a larger distance, $b \sim 1/m$, the computation should already be modified and we obtain results consistent with causality.$^{27}$ We can view this modification as arising from the propagation of electron-positron pairs along the t-channel.
$^{26}$ In [41]
$^{27}$ See [81] and references therein for a related discussion of this problem.
Notice that, in addition, when the photon goes through the shock we can have electron-positron pair creation. We can view this as another example of tidal excitations. Indeed, in QED the photon has a non-zero probability of becoming an electron-positron pair.

Appendix D. Causality and Unitarity for a Signal Model
In this appendix we review the constraints from causality and unitarity in the context of a simple signal model. We imagine a signal propagating along one dimension. We have an initial signal which is a function of time, $f_{in}(t)$, and an out-signal $f_{out}(t)$ which, in Fourier space, is given by $f_{out}(\omega) = S(\omega) f_{in}(\omega)$ or, equivalently, by a convolution in time. Causality implies that if $f_{in}(t') = 0$ for $t' < 0$, then $f_{out}(t) = 0$ for $t < 0$. By unitarity we mean that the $L^2$ norm of the out-signal should be no larger than that of the in-signal, $\int dt\,|f_{out}(t)|^2 \leq \int dt\,|f_{in}(t)|^2$. Now it is well known that the Fourier transform of a function which vanishes for $t < 0$ is analytic in the upper half $\omega$ plane. This follows directly from the explicit integral expression for the Fourier transform. Then if $f_{in} = 0$ for $t < 0$ we find that both $f_{in}(\omega)$ and $f_{out}(\omega)$ are analytic in the upper half plane. This also implies that $S(\omega)$, which is given by their ratio, is also analytic. One might worry that $S(\omega)$ could have poles at zeros of $f_{in}(\omega)$. However, we can change the location of the zeros of $f_{in}(\omega)$ by choosing different functions. Therefore $S(\omega)$ is analytic in the upper half plane.
We will now prove that unitarity implies that |S(ω)| ≤ 1 in the upper half plane.
In conclusion, we find that $S(\omega)$ should be analytic and bounded, $|S(\omega)| \leq 1$, in the upper half plane. These are necessary and sufficient conditions.$^{28}$
Fig. 11: We consider a $v$ independent perturbation that is localized in the $u$ direction, given here by the shaded region. We then consider signals propagating along the $u$ direction, which are $v$ dependent, and demand causality. We can consider an S matrix that connects the region before the perturbation to the region after the perturbation. (Figure labels: axes $u$ and $v$, slices $u_{in}$ and $u_{out}$.)
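As a concrete illustration of the bound, consider a pure time shift. The sketch below is an illustration rather than anything from the paper: it assumes the Fourier convention $f(\omega) = \int dt\, e^{i\omega t} f(t)$, so that $S(\omega) = e^{i\omega\,\Delta t}$ shifts the signal by $\Delta t$, and it samples $|S|$ on a grid of points with ${\rm Im}\,\omega \geq 0$. The function names and grid values are hypothetical.

```python
import cmath

def S_delay(omega, dt):
    """Hypothetical S-matrix of a pure time shift: f_out(t) = f_in(t - dt)."""
    return cmath.exp(1j * omega * dt)

def max_modulus_upper_half_plane(S, re_values, im_values):
    """Largest |S| over a grid of sample points with Im(omega) >= 0."""
    return max(abs(S(complex(x, y))) for x in re_values for y in im_values)

re_grid = [-5.0 + 0.5 * k for k in range(21)]   # Re(omega) from -5 to 5
im_grid = [0.25 * k for k in range(9)]          # Im(omega) from 0 to 2

# A time delay (dt > 0) respects |S| <= 1; a time advance (dt < 0) does not.
delay = max_modulus_upper_half_plane(lambda w: S_delay(w, +0.3), re_grid, im_grid)
advance = max_modulus_upper_half_plane(lambda w: S_delay(w, -0.3), re_grid, im_grid)
print(delay, advance)
```

Since $|e^{i\omega \Delta t}| = e^{-\Delta t\, {\rm Im}\,\omega}$, the modulus stays below one in the upper half plane only for $\Delta t \geq 0$, which is the sense in which a time advance conflicts with the conditions above.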
Let us now briefly mention how this is connected to the field theory situation. We consider light cone coordinates $u$ and $v$. We consider a perturbation that is translation invariant in $v$ but is localized in the $u$ coordinate. We call this "the shock". We expand the fields in the $v$ coordinate at some $u_{in}$ and then we expand them again at some $u_{out}$ after the shock. To make contact with the above discussion we set $t = v$ and $p_v = -\omega$.
We can expand the field as
We can do this for $\phi_{in}$ and $\phi_{out}$ in terms of $a_{in}$ and $a_{out}$. These oscillators are then related by $a_{\omega,out} = S(\omega)\, a_{\omega,in}$, $a^\dagger_{\omega,out} = S(\omega)^* a^\dagger_{\omega,in}$. This defines $S(\omega)$ for positive $\omega$. For negative $\omega$ we can define $S(-\omega) = S(\omega)^*$. Alternatively, we can define
The commutation relations for the in and out oscillators require that $|S(\omega)|^2 = 1$.
In a case where there is particle mixing, but no particle creation, the fields carry indices, $\phi^i_{in}$ and $\phi^j_{out}$. Now $S$ is a matrix which obeys $S_{ij}(-\omega) = S_{ij}(\omega)^*$ and $S_{ij}(\omega) S_{kj}(\omega)^* = \delta_{ik}$ due to the commutation relations of the oscillators before and after the shock. In the preceding discussion we have neglected the transverse dimensions.
We can now remedy that by including the momentum in the transverse dimensions as part of the indices we are discussing here.
If we consider a signal that is made out of physical particles, one might correctly worry that the fact that $\omega > 0$ will preclude us from localizing the signal in time. In order to avoid this issue we can consider a coherent state of the form
with a real function $f_{in}$. This is a state that could be produced by adding a hermitian term to the Hamiltonian at some early time $u_{in}$. On this state we have the expectation value
where the functions are related as in the signal model.$^{29}$ Furthermore, we can also consider the expectation values of the normal ordered product $T_{vv} = T_{tt} = {:}\partial_t \phi(t)\, \partial_t \phi(t){:}$. When this is evaluated on the state (D.7), and integrated over $t$, we find that the answer is given by
Thus, the condition that the total light-cone momentum $P_v$ should not increase implies the norm condition $||f_{out}||^2 \leq ||f_{in}||^2$.
$^{29}$ Here we assumed a linear relation between the in- and out-signals.
More precisely, we can consider the signal $f_{in}$ exciting a mode involving a graviton with a given polarization. The signal $f_{out}$ is the same mode of the graviton. In addition, the initial graviton could go into other massive particles. Then the condition that the total $P_v$ in the out-graviton mode should be no bigger than the initial $P_v$, which was all contained in the graviton mode, leads to the norm condition (or unitarity condition) for the signal model. In conclusion, the graviton-graviton matrix element $S_{gg}(\omega)$ obeys all the assumptions of the signal model. Therefore, it should be analytic and obey $|S_{gg}(\omega)| \leq 1$ in the upper half plane.
Note that we have assumed here a perfect $v$-translation symmetry for the perturbation that creates the shock. In our scattering problem, see fig. 3, particles 1 and 3 have small $p_v$ momentum. Thus, in this discussion, we have neglected this small momentum. This is reasonable for $s b^2 \gg 1$.
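The time-domain content of the signal model can be illustrated on a discrete, periodic time grid. This is only a toy sketch under stated assumptions: the signal is sampled at $N$ integer times, the transform uses the sign convention $F(\omega) = \sum_t f(t)\, e^{i\omega t}$, and the hypothetical choice $S(\omega) = \eta\, e^{i\omega \Delta}$ with $|\eta| \leq 1$ models a delay $\Delta$ together with loss into other channels. None of the names below come from the paper.

```python
import cmath
import math

N = 64
omegas = [2 * math.pi * k / N for k in range(N)]

def to_frequency(f):
    # Assumed sign convention: F(w) = sum_t f(t) e^{+i w t}.
    return [sum(f[n] * cmath.exp(1j * w * n) for n in range(N)) for w in omegas]

def to_time(F):
    # Inverse of to_frequency; the input signals are real, so keep the real part.
    return [sum(F[k] * cmath.exp(-1j * omegas[k] * n) for k in range(N)).real / N
            for n in range(N)]

def propagate(f_in, delay, eta):
    # Hypothetical S(w) = eta * e^{i w delay}; |eta| <= 1 models loss to other modes.
    F_in = to_frequency(f_in)
    F_out = [eta * cmath.exp(1j * w * delay) * Fk for w, Fk in zip(omegas, F_in)]
    return to_time(F_out)

def norm2(f):
    return sum(x * x for x in f)

f_in = [0.0] * N
for n in range(10, 15):
    f_in[n] = 1.0                      # pulse switched on at t = 10

f_out = propagate(f_in, delay=3, eta=0.8)
print(norm2(f_in), norm2(f_out))       # the out-norm does not exceed the in-norm
print([n for n in range(N) if abs(f_out[n]) > 1e-9])   # support of the out-signal
```

With a positive delay the out-signal starts three steps after the in-signal and its norm is reduced by $|\eta|^2$. Passing a negative `delay` would make the out-signal start before the in-signal, which is the discrete analogue of the time advance excluded by causality.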

Appendix E. Scattering in String Theory
String theory in flat space is the simplest example of a theory that falls into the category of weakly coupled gravitational theories with higher derivative corrections that are the subject of our analysis. As explained in the introduction, a motivation for this work was actually to argue that string theory is inevitable, at least under certain assumptions.
It is well known that effective gravitational actions in string theory contain higher derivative corrections at the string scale [82,83]. In particular, graviton three-point amplitudes can contain the higher derivative terms that we constrained in this paper using causality. The potential problem is fixed by extra particles with string scale masses. Here we would like to understand how this happens in detail.
Let us first recall the form of the three-point gravity amplitudes in bosonic, heterotic and type II string theories [84],
$A_{ggg} = \sqrt{32\pi G}\,\epsilon \cdots$
and we have $\epsilon_{bos} = \bar\epsilon_{bos} = \epsilon_{het} = 1$ and $\epsilon_{II} = \bar\epsilon_{II} = \bar\epsilon_{het} = 0$. Translating it to our notation we get
$\alpha^{bos} \cdots$
The vanishing of some of these corrections can be understood from supersymmetry, as explained in section 3.2. For the purposes of this paper, the type II case is not interesting since there are no corrections at all to the graviton three-point function.
The high energy scattering problem in string theory was studied in a nice series of papers by Amati, Ciafaloni and Veneziano [49,50,51,52,53], see also e.g. [34,85]. Let us first review their picture. The scattering can be described in terms of a phase shift defined
where [POL] represents a factor that depends on the polarizations and is polynomial in the momenta. We will only need its form in a specific limit. In the high energy limit $C(s, t, u)$ has the celebrated Regge behavior
This Regge form reflects the creation of particles in the s-channel. The infinite sequence of s-channel poles becomes a cut, the cut arising when $s \to s\, e^{2\pi i}$. The creation of the massive s-channel states is also related to the fact that we get an imaginary part from the $(-i)^{t\alpha'/2}$ factor in (E.4), which says that the most likely process is to create a massive closed string, rather than to scatter the gravitons.
For large $s$, only small $q$ will contribute and we can approximate the prefactor in (E.4) by $1/t$. Then the integral becomes
where $Y = \log(-i s \alpha'/4)$. We have two characteristic behaviors, depending on whether $b^2$ is larger or smaller than $\alpha' \log(s\alpha')$. For large $b$ we get the usual $1/b^{D-4}$ behavior. For small $b$ we get the result in brackets in (4.8). Note that since the transition region occurs for a $b^2$ which is larger than $\alpha'$ by a $\log s$ factor, we can, a posteriori, justify the fact that we have approximated the prefactor in (E.4) by $1/t$.
In our field theory discussion we had represented the phase shift as a sum over poles.
We can wonder how this applies to the string theory discussion. Notice that from (E.4) we get a Gaussian integrand factor of the form
Let us assume that $\vec b = (b, 0, \cdots, 0)$, so that it points along the first coordinate. When we do the integral over the first component of $q$, call it $q_1$, we get a saddle point for (E.6) at
Fig. 12: We display the complex $q_1$ plane. We have displayed the poles in the t-channel by black crosses. The saddle point (E.7) of the Gaussian integral has been denoted by a red cross. We have shifted the original integration contour for $q_1$, which was along the real axis, to the complex plane so that it passes through the saddle point (E.7). In the process we have picked up some of the poles in the t-channel.
It is thus convenient to shift the contour to this location, (E.7), where this saddle point contribution gives something of the order of
This is not the whole answer, since by shifting the contour to this location, we can pick up some poles from the prefactor (E.4), see fig. 12. We always pick up the pole at $t = 0$,$^{30}$ which was the center of our gravity discussion, but we can also pick up the poles at $t = {4 \over \alpha'} n$ for $n < -q_s^2$, where $q_s$ is given in (E.7). Notice that at these saddles the $t$ dependent part of the exponent gives us factors of $s^{2n}$, as we expect for the corresponding spins,$^{31}$ once we take into account that [POL] contains a factor of $s^4$. When $b$ is large, $b^2 \gg \alpha' \log(s\alpha')$, we pick up many poles, but when $b$ is small we only pick up the $t = 0$ pole and, even then, the integral is more accurately computed using (E.5). Note that the factor [POL] in the string theory phase simplifies at $t = 0$ and becomes the product of three-point amplitudes we discussed in the body of the paper and that we added as Pol in (4.8). In other words [POL] $\to$ Pol as $t \to 0$.
$^{30}$ The location of this pole in the $q_1$ plane depends on $\vec q_{rest}$, which we take to be real.
In particular, note that the residues of the poles associated to the massive states go as ${1 \over (n!)^2}\, e^{-b\sqrt{4n/\alpha'}}\, s^{2n}$. As a function of $n$, these contributions decrease and then start increasing again, with a transition at a value of $n$ corresponding to the saddle point (E.7).$^{32}$
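The statement that the pole contributions first decrease and then increase with $n$ can be made concrete with a small numerical sketch. It assumes the residue estimate $e^{-b\sqrt{4n/\alpha'}}\, s^{2n}$ with the $1/(n!)^2$ factor dropped, as in the regime where $b^2$ and $\log s$ are both large, and it takes $s$ to be measured in units that make $s^{2n}$ dimensionless. The function names and the sample values of $b$ and $\log s$ are hypothetical.

```python
import math

def log_residue(n, b, log_s, alpha_prime=1.0):
    """log of e^{-b sqrt(4n/alpha')} s^{2n}; the 1/(n!)^2 factor is dropped (an assumption)."""
    return -b * math.sqrt(4 * n / alpha_prime) + 2 * n * log_s

def turning_point(b, log_s, alpha_prime=1.0):
    """Continuous n at which log_residue stops decreasing: d/dn = 0."""
    return b ** 2 / (4 * alpha_prime * log_s ** 2)

b, log_s = 200.0, 10.0                       # illustrative values with b^2 >> alpha' log s
values = [log_residue(n, b, log_s) for n in range(1, 301)]
n_min = 1 + values.index(min(values))        # integer n with the smallest contribution
print(n_min, turning_point(b, log_s))
```

Setting the derivative of the exponent to zero gives $n_* = b^2/(4\alpha' \log^2 s)$ in this simplified model, so the transition moves to larger $n$ as $b$ grows at fixed $s$.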
We can ask: why don't we include all poles in the t-channel? If we were to include all poles in the t-channel, we would obtain the wrong answer. The reason is that in string theory the argument that we can shift the contour is not correct because of contributions from large values of $q$. Such large values of $q$ were never meant to be included in the integral, since the kinematics of the process we consider restricts the real values of $q^2$ to be much smaller than $s$. In deriving the physical picture, we certainly assumed that $q^2 \ll s$.$^{33}$ Indeed, if we look at (E.4) we get a very small contribution from large real values of $q^2$. On the other hand, if we were to keep $s$ fixed and formally look at large real values of $q^2$ in the original expression, (E.3), then we would encounter the u-channel poles. The conclusion is that shifting the contour for the $q_1$ integral, while it can be done formally, does not represent the real physical computation we want to do.
Approximating the integrand using (E.4), and then integrating gives the physically correct answer.
One can qualitatively say that, for large $b^2/(\alpha' \log s\alpha')$, we get a contribution from some of the t-channel poles, as in fig. 12, and then the rest of the poles are completely resummed via the saddle point integral in fig. 12. Their contribution should be better thought of as coming from the creation of extended objects in the s-channel.
$^{31}$ For a closed string with $N_L = N_R = n$, the maximum spin is $J = 2 + 2n$.
$^{32}$ Here we are assuming that both $b^2$ and $\log s\alpha'$ are large, with a ratio larger than one but fixed, so that we can neglect the $1/(n!)^2$ in this discussion. This last factor makes the sum converge for large $n$, but this convergent answer, as discussed below, is not the correct one.
$^{33}$ Momentum conservation and on-shell conditions impose $q^2 < {s \over 4}$.
Another remark we want to make is the following. It was shown in [34] that the plane wave solution is a solution to all orders in the $\alpha'$ expansion. This gravitational plane wave encodes the contribution from the $t = 0$ pole. However, we have seen that sometimes we get a subleading contribution from the other poles, due to massive states along the t-channel. This means that the physical scattering process contains extra contributions not captured by the plane wave.$^{34}$
In string theory we can also take into account the tidal excitations. In this case the phase shift can be viewed as an operator that maps the two initial gravitons to two final generic string states. Amati, Ciafaloni and Veneziano [49,50,51,52,53] have shown that this operator has the remarkably simple expression $\hat\delta \propto \int d\sigma\, d\sigma'\, \delta_{grav}(X_L(\sigma) - X_R(\sigma'))$, where $X_L$ and $X_R$ are the transverse space positions of the string on the worldsheet and $\delta_{grav}(b)$ is the ordinary gravity phase shift. This is valid for distances $b^2 \gg \alpha' \log(s\alpha')$. The effects of these tidal excitations do not help in resolving the causality issues discussed here and are unrelated to the appearance of closed strings in the s-channel discussed above. See [49,50,51,52,53] for further discussion.

Appendix F. Properties of the AdS Shock Wave
Let us first examine the problem of higher derivative corrections for the AdS shock wave. As in the case of flat space, the shock wave at hand continues to be an exact solution when arbitrary higher derivative corrections are included.
The argument for this is identical to the one in section 5 of [59]. The vector $l_\mu = \partial_\mu u = \{1, 0, 0, 0, \ldots\}$ in coordinates $u, v, y^i, z$. We can now compute the vector $V^\mu$ that the argument talks about. We find $V^\mu = \{0, \cdots, 0, -2/z\}$. In other words $V^z = -2/z$ and the rest of the components are zero. This obeys $V^\mu l_\mu = 0$, as required in their argument.
In addition, one can also compute the subtracted Riemann tensor $\check R$
It is indeed of the form stated in [59] with the symmetric tensor $K$ given by
with the rest of the components equal to zero. This $K$ obeys $K_{\mu\nu} l^\nu = 0$, as required by [59]. Notice that this form of $K$ also leads to an equation of the form (5.3) when the Riemann tensor is used in Einstein's equations, namely $K^\mu{}_\mu = 0$. Once the former equation is imposed, $\check R$ reduces to the Weyl tensor.
$^{34}$ This is not in contradiction with [34], since these extra terms can be viewed as a nonperturbative contribution in $\alpha'$.
Using this shock wave we can once again compute the time delay for different theories.
It is convenient to evaluate the Riemann tensor in coordinates that make the rotation symmetry manifest. The result is
where $i = 1, \ldots, D-2$, so that we span the $y$ and $z$ components. In the bulk of the paper we consider the propagation of different perturbations in the shock wave background described above. We first write the most general form of the second order equations of motion and then compute the time delay in the high energy limit.
Let us consider several limits of the shock wave to make contact with previous investigations of a similar type. First, we expect to recover the flat space shock wave for probes that come close enough to the center of the shock or, equivalently, for $\rho \to 0$. Indeed, it is easy to check that this asymptotic form, (5.9), is correctly recovered.
Let us consider a couple of other limits. We can fix $y_0$ and send $z_0 \to 0$, in which case we have a source at the boundary and the shock wave takes the form
which is exactly the shock wave considered in the energy correlator problem [14]. In the opposite limit, $z_0 \to \infty$, we get $h \sim z^{D-3}$, (F.4), and the time delay in this background was computed, for example, in [60].

Appendix G. Time Advances and Time Machines
In this appendix we would like to argue that a negative time delay enables us to build a time machine which leads to closed time-like curves. This is a standard argument [86] and the only thing we check is that the long range gravitational forces do not prevent us from setting it up. The setup we would like to consider is the following. We have two shock waves that correspond to energetic particles with momenta $q_{1,u} = {\sqrt{s} \over 2}$ and $q_{2,v} = {\sqrt{s} \over 2}$, separated by a distance $r$ in the transverse plane. The first shock is localized around $u = 0$ and the second one around $v = 0$. We would like the separation to be such that $r \gg r_S$, namely we are in the regime where black hole formation does not occur.
Next we would like to consider a test particle that propagates through both shocks in such a way that it ends up at the same position where it started, thus forming a closed time-like curve.
Fig. 13: a) We imagine a background that consists of two shock waves located at $u = 0$ and $v = 0$, widely separated in the transverse directions, which are not shown in the figure. The arrows show the motion of the probe massless particle projected onto the $u, v$ plane. b) The same motion, but projected onto the transverse plane. The two background shocks are separated by $r$ and the probe passes at a short distance $b$ from each of them. The vertical region of the path in (a) corresponds to the horizontal motion in (b). We can build a closed time-like curve as depicted in the picture by crossing this pair of shocks if time advances are allowed. We need mirrors to reverse the motion in the transverse plane as we pass through the shocks.
The situation is depicted in fig. 13. When the test particle crosses each of the shocks it gets shifted by $\Delta v = \Delta u \sim {G\sqrt{s} \over b^{D-4}}$. Between the shocks the particle travels a distance of order $r$. Thus, we want the time shift to be ${G\sqrt{s} \over b^{D-4}} \gtrsim r$ (G.2). We also want $b \gg r_S$, where $r_S$ is now the Schwarzschild radius for the shock wave-test particle pair, $r_S^{D-3} \sim G\sqrt{p\sqrt{s}}$, and $p$ is the energy of the probe particle. Together with (G.2) this implies $\sqrt{s} \gg p$. We also need that $pb \gg 1$. The conclusion is that in $D > 4$ we can construct a closed time-like curve using negative time delays by choosing $b$, $r$ and $s$ appropriately. For example, if we have a causality problem that appears at a scale $b^2 \sim \alpha_2$, then we take this value for $b$. Since we are at weak coupling, we know that the Planck length obeys $l_p \ll b$. We can then pick $\sqrt{s}\, l_p \sim X^{1+a}$, $p\, l_p \sim X^{1-a'}$, with $X = (b/l_p)^{D-3}$, and we can choose $a > 0$, $a - a' < 0$, $1 - a' + 1/(D-3) > 0$ to ensure that $r_S \gg b$ for the pair of shocks, $r_S \ll b$ for the shock wave-test particle pair, and $pb \gg 1$. We can achieve this with $a' = 1/2$ and $a = 1/4$, for example.
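The parameter window can be checked by power counting in $X = (b/l_p)^{D-3}$. The sketch below is a consistency check under stated assumptions, not the paper's own computation: it works in units $l_p = 1$ with $G \sim l_p^{D-2}$, takes the Schwarzschild radius of the two shocks to obey $r_S^{D-3} \sim G\sqrt{s}$ and that of the shock-probe pair to obey $r_S^{D-3} \sim G\sqrt{p\sqrt{s}}$, and tracks only exponents of $X$. The function names are hypothetical.

```python
from fractions import Fraction

def exponents(D, a, a_prime):
    """Powers of X = (b/l_p)^(D-3) for the relevant ratios, in units l_p = 1."""
    a, a_prime = Fraction(a), Fraction(a_prime)
    return {
        "time_shift_over_b": a,                             # G sqrt(s) / b^(D-3)
        "shock_rS_over_b": a / (D - 3),                     # (G sqrt(s))^(1/(D-3)) / b
        "probe_rS_over_b": (a - a_prime) / (2 * (D - 3)),   # assumes r_S^(D-3) ~ G sqrt(p sqrt(s))
        "p_times_b": 1 - a_prime + Fraction(1, D - 3),
    }

def time_machine_possible(D, a, a_prime):
    e = exponents(D, a, a_prime)
    return (e["time_shift_over_b"] > e["shock_rS_over_b"] > 0   # room for shift >~ r >> r_S >> b
            and e["probe_rS_over_b"] < 0                        # b >> shock-probe Schwarzschild radius
            and e["p_times_b"] > 0)                             # p b >> 1

print(exponents(5, Fraction(1, 4), Fraction(1, 2)))
print(time_machine_possible(5, Fraction(1, 4), Fraction(1, 2)))
```

In this power counting the window between the time shift and the shock Schwarzschild radius closes at $D = 4$, where the two exponents coincide, in line with the restriction to $D > 4$. The logarithmic behavior of the four dimensional time delay is not captured by a sketch that only tracks powers.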

Appendix H. Representations That Couple to Two Gravitons
Here we would like to understand better which representations of the little group $SO(D-1)$ of massive particles can couple to two gravitons. We are interested only in bosonic fields, since a single fermion does not couple to two gravitons.
Let us consider the decay of a massive particle in its rest frame, so that $p_2 = (M, \vec 0)$. The gravitons produced have $\vec p_1 = -\vec p_3 = \vec p$. We characterize the original particle by some polarization tensor $e_{i_1 \ldots i_k}$ which has only spatial components and is traceless with respect to any pair of indices. We do not specify the symmetry properties of this tensor yet. We characterize the gravitons by polarization tensors $e_1$ and $e_3$ such that $e_1 \cdot \vec p = e_3 \cdot \vec p = 0$.
We have three types of contractions (see also [77])
(H.1)
The first amplitude $A_1$ exists only for particles in the symmetric traceless representations (a Young tableau that consists of a single horizontal row with $k$ boxes). Actually, all three amplitudes are allowed for the symmetric representation, and we discussed them in the bulk of the paper in detail.
For the second and third amplitude we can add more rows to the Young diagram.
Properties of these, so-called, mixed-symmetry tensors are nicely reviewed, for example, in appendix E of [87]. By thinking about the representation in terms of tensors which are manifestly anti-symmetric with respect to indices in a given column we can read off possible representations. In particular the fact that we have only three different vectors means that we can have at most three rows.
Let us write all the amplitudes in a covariant manner. The general prescription is the following
$e^i_1 e^j_3 \to E^{\mu\nu}_{13} \equiv \epsilon^\mu_1 p^\nu_3\, (\epsilon_3 \cdot p_1) + \epsilon^\mu \cdots$
We then have for the amplitudes
and contract both sides of the amplitude, where $\Pi$ is a projector onto the space orthogonal to the intermediate momentum.
As the next step we would like to focus on those diagrams that produce $s^a$ with $a \geq 2$ for the amplitude. Anything smaller is irrelevant for causality violation. In the language of Young tableaux this corresponds to having $a \geq 2$ boxes in the first row. It would be interesting to understand whether mixed symmetry fields with $a = 2$ can resolve the causality problem we observed in the bulk of the paper. Examples of such fields would be the $(2, 2)$ fields (a square with four boxes) or the $(2, 2, 2)$ fields (a vertical rectangle of $2 \times 3$ boxes). There is a short list of representations of this kind.
One could wonder whether these particles alone (without the infinite tower of higher spin particles) could solve the causality problem. We think that the answer is no. We leave a full exploration of this question for the future.
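The two combinatorial constraints stated above (exactly $a = 2$ boxes in the first row, and at most three rows) are easy to enumerate. The sketch below lists only the Young diagram shapes compatible with these two constraints. It does not check that a non-vanishing coupling to two gravitons exists for each shape, nor any further restriction coming from the dimension $D$, so it should be read as an upper bound on the candidates rather than as the list intended in the text. The function name is hypothetical.

```python
def candidate_shapes(first_row=2, max_rows=3):
    """Young diagram shapes with the given first-row length and at most max_rows rows."""
    shapes = []

    def extend(shape):
        shapes.append(tuple(shape))
        if len(shape) < max_rows:
            # Each new row can be at most as long as the previous one.
            for nxt in range(shape[-1], 0, -1):
                extend(shape + [nxt])

    extend([first_row])
    return shapes

shapes = candidate_shapes()
print(shapes)
```

The single-row shape $(2)$ is the symmetric massive spin two particle already discussed. The remaining shapes, which include the $(2, 2)$ and $(2, 2, 2)$ examples, are the mixed symmetry candidates.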