General Relativity from Causality

We study large families of theories of interacting spin 2 particles from the point of view of causality. Although it is often stated that there is a unique Lorentz invariant effective theory of massless spin 2, namely general relativity, other theories that utilize higher derivative interactions do in fact exist. These theories are distinct from general relativity, as they permit any number of species of spin 2 particles, are described by a much larger set of parameters, and are not constrained to satisfy the equivalence principle. We consider the leading spin 2 couplings to scalars, fermions, and vectors, and systematically study signal propagation in all these other families of theories. We find that most interactions directly lead to superluminal propagation of either a spin 2 particle or a matter particle, and that interactions that are subluminal generate other interactions that are superluminal. Hence, such theories of interacting multiple spin 2 species have superluminality and, by extension, acausality. This is radically different from the special case of general relativity with a single species of minimally coupled spin 2, which leads to subluminal propagation from sources satisfying the null energy condition. This pathology persists even if the spin 2 field is massive. We compare these findings to the analogous case of spin 1 theories, where higher derivative interactions can be causal. This makes the spin 2 case very special, and suggests that multiple species of spin 2 are forbidden, leading us to general relativity as essentially the unique internally consistent effective theory of spin 2.


Introduction
General relativity is beautifully consistent with all current observations over a fantastic range of scales. Although it is not UV complete, it does represent a consistent effective theory for energy scales well below the Planck scale. Here we would like to examine its theoretical underpinning. We are interested in whether there are universal principles upon which general relativity is built. Often one appeals to the equivalence principle, or the principle of minimal coupling, or diffeomorphism symmetry or general coordinate invariance, or the idea of space-time curvature, or beauty, etc., to motivate the theory. However, such principles do not have universal applicability. For instance, the equivalence principle evidently does not apply to other sectors of physics, such as electromagnetism or the strong force, which act on different particles differently. Minimal coupling is not universal in that particles such as neutrinos are not minimally coupled to photons, and the same is anticipated of hidden sector particles.
Nevertheless, there are principles that appear universal, at least for energies below the Planck scale, namely Lorentz symmetry and quantum mechanics. It is known that these basic principles underpin the structure of the Standard Model of particle physics. There, the idea is simply to build a theory of a particular collection of particles of different spins and masses and focus on just the leading operators that survive at low energies, allowing some particles to be minimally coupled. In the case of gravitation, one might enquire whether the principles of Lorentz symmetry and quantum mechanics alone are enough to uniquely specify the theory as being general relativity. Of course, in this case the theory is non-renormalizable in 3+1 dimensions, but we only demand that we have a consistent effective theory. In fact, it is often said that general relativity is the unique theory of massless spin 2 particles at low energies. However, as was discussed in a recent paper [1] by one of us, and introduced in older work [2], there are whole classes of other Lorentz invariant theories of massless spin 2 particles. Here one couples the linearized Riemann tensor directly to matter, as this automatically results in a gauge invariant interaction. In that recent paper [1] the special case of coupling to photons was examined, where it was shown that one of the polarizations of the photon propagates superluminally.
In this paper we would like to perform a much more systematic analysis of all these other classes of spin 2 theories. We first demonstrate that by utilizing higher derivative interactions, one can build theories of an arbitrary number of species of spin 2, which couple with a large number of parameters. This makes the space of theories vastly greater than the special case of general relativity, which involves only a single minimally coupled spin 2 particle. We systematically lay out the various types of leading order interactions to matter, including coupling to scalars, fermions, and vectors. In each case we carefully examine whether there is some form of superluminality in either the matter particles or a spin 2 particle. Our basic findings are: (i) most interactions directly lead to superluminal propagation of either a spin 2 particle or a matter particle, (ii) interactions that are subluminal generate other interactions that are superluminal, and (iii) interactions that are subluminal and do not generate other interactions that are superluminal do not represent true interacting spin 2 particles at all, but only self interactions in the matter sector. A summary of the theories considered and our findings is given in Table 1.
We contrast these results with the case of spin 1 particles, where the leading order higher derivative couplings are perfectly causal. So the properties of spin 2 are quite distinct from those of spin 1. We argue that the magnitude of the superluminality in all these spin 2 theories can be made sufficiently large to lead to macroscopic time advances and problems with causality. We also point out that this cannot be readily fixed by the introduction of other operators. We contrast this with the case of general relativity, which leads only to subluminal propagation for matter in the presence of ordinary sources that satisfy the null energy condition. We also remark on similar problems that emerge in massive spin 2 theories. Finally, we comment on self-interactions and operators of much higher dimension that can be causal, although they tend to be associated with the presence of ghosts.
Our paper is organized as follows: In Section 2 we give the basic structure of theories of spin 2. In Section 3 we study interactions involving a single graviton in couplings to matter. In Section 4 we show that these theories have superluminality. In Section 5 we study interactions involving multiple gravitons in couplings to matter. In Section 6 we present a more general proof that these theories have superluminality. In Section 7 we compare these findings of superluminality to general relativity, spin 1, and massive spin 2. In Section 8 we conclude.

Space of Spin 2 Theories
In this work, we primarily focus on massless spin 2 particles, though we will discuss both spin 1 and massive spin 2 in Section 7. Massless spin 2 particles must carry 2 helicities in 3+1 dimensions to be associated with a unitary representation of the Lorentz group. We embed the particles into a tensor field $h_{\mu\nu,I}$, where $\mu,\nu$ are Lorentz indices, and $I$ is a species index, running over $I = 1, 2, \ldots, N$ for $N$ species of spin 2 particles. This is a convenient way of describing local interactions, though it comes at the expense of introducing too many degrees of freedom. As is well known, this can only be fixed by introducing a gauge identification into the theory.

Multiple Species
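At the linear level, we can introduce the identification
$$h_{\mu\nu,I} \equiv h_{\mu\nu,I} + \partial_\mu\alpha_{\nu,I} + \partial_\nu\alpha_{\mu,I}, \tag{2.1}$$
where $\alpha_{\mu,I}$ is a set of 4 arbitrary gauge functions for each species $I$. Because of this identification, $h_{\mu\nu,I}$ no longer transforms as a good Lorentz tensor once a gauge is fixed, and so utilizing $h_{\mu\nu,I}$ directly to build an interacting theory in a self-consistent manner is generically extremely difficult. We shall return to this shortly, but for now it is important to mention a possible way around this issue. By taking two derivatives of $h_{\mu\nu,I}$ we can form the gauge invariant combination
$$R_{\mu\nu\rho\sigma,I} = \partial_\mu\partial_\rho h_{\nu\sigma,I} + \partial_\nu\partial_\sigma h_{\mu\rho,I} - \partial_\mu\partial_\sigma h_{\nu\rho,I} - \partial_\nu\partial_\rho h_{\mu\sigma,I},$$
which is proportional to the linearized Riemann tensor.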
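To make the gauge invariance concrete, here is a minimal sketch (ours, not from the paper) in momentum space. It assumes the standard linearized Riemann combination $\partial_\mu\partial_\rho h_{\nu\sigma} + \partial_\nu\partial_\sigma h_{\mu\rho} - \partial_\mu\partial_\sigma h_{\nu\rho} - \partial_\nu\partial_\rho h_{\mu\sigma}$, evaluated on a single Fourier mode $h_{\mu\nu} = H_{\mu\nu}e^{iq\cdot x}$ so that each derivative becomes $iq_\mu$. Exact rational arithmetic confirms that a pure-gauge mode of the form (2.1) has identically vanishing curvature, while a generic mode does not. The helper names `riemann_amplitude` and `pure_gauge` are illustrative.

```python
from fractions import Fraction
from itertools import product

IDX = range(4)


def riemann_amplitude(q, H):
    """Linearized Riemann tensor of the mode h_{mu nu} = H_{mu nu} exp(i q.x).

    Uses R_{mnrs} = d_m d_r h_{ns} + d_n d_s h_{mr} - d_m d_s h_{nr} - d_n d_r h_{ms};
    each derivative gives i*q, and the common factor i**2 = -1 is dropped
    because we only test whether the tensor vanishes.
    """
    return {
        (m, n, r, s): q[m] * q[r] * H[n][s] + q[n] * q[s] * H[m][r]
        - q[m] * q[s] * H[n][r] - q[n] * q[r] * H[m][s]
        for m, n, r, s in product(IDX, repeat=4)
    }


def pure_gauge(q, a):
    """Mode of d_mu alpha_nu + d_nu alpha_mu for alpha_mu = a_mu exp(i q.x) (factor i dropped)."""
    return [[q[m] * a[n] + q[n] * a[m] for n in IDX] for m in IDX]


# Arbitrary (off-shell) momentum and gauge parameter, as exact rationals.
q = [Fraction(3), Fraction(-1), Fraction(2), Fraction(5)]
a = [Fraction(1, 2), Fraction(7), Fraction(-3), Fraction(2)]

gauge_curvature = riemann_amplitude(q, pure_gauge(q, a))
print(all(v == 0 for v in gauge_curvature.values()))  # True: pure gauge carries no curvature

# A generic symmetric mode (here only H_00 = 1), by contrast, has nonvanishing curvature.
H_generic = [[Fraction(int(m == 0 and n == 0)) for n in IDX] for m in IDX]
print(any(v != 0 for v in riemann_amplitude(q, H_generic).values()))  # True
```

Working with a single Fourier mode is enough here because the identification (2.1) and the curvature are both linear, so invariance mode by mode implies invariance for any superposition.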
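The Lorentz invariant theory of free spin 2 particles is of course a unique theory. It simply describes particles with helicity ±2 satisfying the dispersion relation $E = |\vec p|$. At the level of a local Lagrangian it is therefore also unique (up to boundary terms and field redefinitions): it consists of $N$ copies of the linearized Einstein-Hilbert action, $\mathcal{L}_{\rm free} \propto \sum_I h^{\mu\nu}_I G_{\mu\nu,I}$, where $G_{\mu\nu,I}$ is the linearized Einstein tensor of $h_{\mu\nu,I}$ and the overall normalization is fixed by canonical normalization. Note that this is gauge invariant, up to boundary terms.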
By exploiting the linearized Riemann tensor, interactions can be readily written down. If we consider an interaction involving only one spin 2 particle coupled to matter, it takes the form
$$\mathcal{L}_{\rm int} = \sum_I R_{\mu\nu\rho\sigma,I}\,T^{\mu\nu\rho\sigma}_I,$$
where $T^{\mu\nu\rho\sigma}_I$ is any 4-index tensor built out of the matter degrees of freedom, ideally one that is constructed to be ghost free. Note that by integrating by parts, one can also express this as $h_{\mu\nu,I}$ coupled to an identically conserved tensor, but this formulation with Riemann makes the construction of gauge invariant operators more explicit. For example, this can be readily generalized to multiple gravitons in interactions by allowing for multiple insertions of the linearized Riemann tensor, which we shall return to in Section 5 (other work includes Refs. [3,4]).
This allows for a huge set of theories, since there is tremendous freedom in the choice of $T_I$; in fact this can be an arbitrary function, containing much more freedom than the usual coupling constant that specifies how a graviton couples to the matter Lagrangian. We shall systematically study couplings to scalars, fermions, and vectors in Section 3. A concrete example, when coupling to vectors, is the operator
$$\sum_{i,I}\frac{c_{iI}}{\Lambda^3}\,R_{\mu\nu\rho\sigma,I}\,F_i^{\mu\nu}F_i^{\rho\sigma} + \dots$$
(the dots indicate other terms needed for consistency, which will be discussed in Section 3.4). Here the $i$ index runs over vector species and the $I$ index runs over spin 2 species, so that there is a matrix of couplings $c_{iI}$. Hence this class of theories is associated with a huge range of parameters and an arbitrary number of species of spin 2. We have depicted this huge space of theories schematically as the blue region in Figure 1. Note that nothing in this framework demands universal couplings, so we allow the couplings to be arbitrary. This makes it a severe challenge to derive the equivalence principle, which we aim to tackle in this work.

Single Species
The above analysis applies to any number N , including the special case of a single species N = 1.
However, in this special case, a more relevant interaction is allowed by Lorentz symmetry. All of the above interactions involve higher derivatives compared to so-called minimal coupling, in which $h_{\mu\nu,I}$ itself is used directly. This is a well studied subject, which we recap only briefly here.
To utilize minimal coupling, we would attempt to couple the spin 2 field directly to matter as
$$\mathcal{L}_{\rm int} \sim \kappa\, h_{\mu\nu,I}\,T^{\mu\nu}_I .$$
This evidently violates the linear gauge invariance, unless $T^{\mu\nu}_I$ is conserved. However, there is no non-trivially conserved tensor. The closest object is the matter energy-momentum tensor, which is only conserved in the limit in which we ignore this new interaction.

Figure 1: Space of possible theories of spin 2 particles. If only a single species of spin 2 is included with minimal coupling (requiring the nonlinear diffeomorphism invariance in its description), then we are uniquely led to general relativity (GR) at low energies; indicated by the small red circle on the left. If multiple species of spin 2 are included (requiring the linear gauge invariance in its description), which must be non-minimally coupled, then we are led to a vast array of possibilities associated with many parameters; indicated by the large blue circle on the right. In this paper we show, however, that this much larger class of theories tends to suffer from problems of acausality, and thus is ultimately excluded from a physical viewpoint.
Nevertheless, we can proceed order by order in powers of the coupling κ. At second order in the coupling, one finds that there is no consistent way to build the interactions without updating the gauge transformation rule. In fact, to all orders the gauge transformation is necessarily lifted to the full nonlinear diffeomorphism invariance, $\delta g_{\mu\nu} = \nabla_\mu\xi_\nu + \nabla_\nu\xi_\mu$, where $g_{\mu\nu}$ is the full metric built from $\eta_{\mu\nu}$ and $\kappa h_{\mu\nu}$. Since there are only 4 coordinates, this only acts to remove 4 degrees of freedom, and hence this only works for a single species of spin 2. This conclusion is compatible with the traditional no-go theorems establishing the inconsistency of multiple interacting spin 2 fields [5,6], where minimal, or leading order, couplings are assumed. In contrast, in the previous subsection this conclusion is avoided because the linearized Riemann tensor is used, which is explicitly invariant under the gauge identification.
Furthermore, the entire action is determined uniquely to all orders, up to boundary terms and field redefinitions, in terms of the single coupling $G_N = \kappa^2/(16\pi)$, giving rise to the Einstein-Hilbert action
$$S_{\rm EH} = \frac{1}{16\pi G_N}\int d^4x\,\sqrt{-g}\,\mathcal{R},$$
where the script $\mathcal{R}$ is the full nonlinear Ricci scalar (we reserve the straight "R" for the canonically normalized linearized piece only, as in eq. (2.2)). This essentially unique theory is summarized by the small red circle in Figure 1. We do note that we can add to this action higher dimension operators of the form we summarized in the previous subsection, by exploiting the (nonlinear) Riemann tensor, and so other sub-leading interactions are allowed. However, since this theory necessarily involves only a single spin 2 species, it is much more restricted and associated with far fewer parameters than the other set of theories described above.
Famously, this leads to the $1/r^2$ force law, the universality of free-fall, and all the successes of general relativity. At the same time this theory carries various conceptual puzzles, such as the notorious cosmological constant problem and the black hole information paradox. In this paper we would like to provide a deeper explanation as to why nature has nevertheless chosen this special theory, rather than the much bigger space of theories depicted in Figure 1.

One Spin 2 in the Interaction
Our focus in this and the next few sections is to develop and examine the large class of theories of Section 2.1 which may involve any number of spin 2 particles.
We first investigate interactions with matter that only involve one spin 2 particle in the interaction vertex. We consider explicit interactions with scalars, fermions, and vectors in turn. The boson interactions will lead to novel, nontrivial theories that prima facie appear perfectly healthy.
However, in Section 4 we will show that all of these lead to superluminal propagation of matter, and ultimately acausality. There appears to be at least one fermion interaction that evades this problem, but in Section 5.3 we will explain why it too is associated with a form of superluminality.
The Feynman diagrams for the interactions considered here are depicted in Fig. 2. Since in this section we shall only make reference to a single spin 2 particle in the interaction, we shall suppress the species index I, though the extension to a sum over multiple species is straightforward.

One Scalar
The most general interaction Lagrangian linear in the spin 2 field and involving no more than one derivative on the scalar is, schematically,
$$\mathcal{L}_{\rm int} = a_1 R\,\phi^2 + a_2 R\,(\partial\phi)^2 + a_3 G^{\mu\nu}\partial_\mu\phi\,\partial_\nu\phi .$$
In general the coefficients $a_i$ can be functions of the scalar, but to lowest order we can take them to be constant. The first term will not affect the causality properties of the scalar, since it acts as a potential term and is unimportant in the large momentum limit. The other two terms involve derivatives of the scalar, and so can potentially alter the causal structure of the theory. However, these interactions are fictitious¹. This can be seen by looking at the amplitude for two scalars to emit or absorb a single spin 2 particle in the gauge $\partial_\mu h^{\mu\nu} = h^\mu{}_\mu = 0$, in which $G_{\mu\nu} = \Box h_{\mu\nu}$. In this gauge, the interaction vertex is proportional to $p^2\,\epsilon_{ij}(p)\,k_1^ik_2^j$, and vanishes when the external spin 2 field is placed on shell, where $p^2 = 0$. Similarly, any scattering amplitude built out of this vertex vanishes when one of the spin 2 fields is on shell. This is a direct consequence of the fact that this interaction is proportional to the free field equations of motion, which are used to define the Fock space in the interaction picture.
Consequently, this term can be completely removed by a field redefinition, and so is a 'redundant operator'. Notice that this removal is exact for our theories because the Einstein tensor is exactly linear in the spin 2 field. This is in contrast to general relativity, where the Einstein tensor contains higher order terms, which lead to couplings that are not removed by the redefinition provided.
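The on-shell vanishing can also be checked directly. The sketch below is an illustration under our own assumptions: mostly-plus signature, the linearized Riemann combination $\partial_\mu\partial_\rho h_{\nu\sigma} + \partial_\nu\partial_\sigma h_{\mu\rho} - \partial_\mu\partial_\sigma h_{\nu\rho} - \partial_\nu\partial_\rho h_{\mu\sigma}$ (overall constants dropped), and a transverse-traceless cross-polarized mode moving along $z$. For such a mode the Ricci tensor, which coincides with the Einstein tensor because the Ricci scalar vanishes, comes out proportional to $q^2 H_{\mu\nu}$, so any vertex built from it vanishes when the spin 2 leg is on shell. The function names are hypothetical.

```python
from itertools import product

ETA = [-1, 1, 1, 1]  # Minkowski metric diag(-1, 1, 1, 1), mostly-plus (assumed)


def riemann_amplitude(q, H):
    """Linearized Riemann of h = H exp(i q.x), lower indices, factor i**2 dropped."""
    return {
        (m, n, r, s): q[m] * q[r] * H[n][s] + q[n] * q[s] * H[m][r]
        - q[m] * q[s] * H[n][r] - q[n] * q[r] * H[m][s]
        for m, n, r, s in product(range(4), repeat=4)
    }


def ricci_amplitude(q, H):
    """R_{ns} = eta^{mr} R_{mnrs}; the metric is diagonal, so only m == r contributes."""
    R = riemann_amplitude(q, H)
    return [[sum(ETA[m] * R[(m, n, m, s)] for m in range(4)) for s in range(4)]
            for n in range(4)]


def cross_mode():
    """Transverse-traceless cross polarization for a wave along z: H_xy = H_yx = 1."""
    H = [[0] * 4 for _ in range(4)]
    H[1][2] = H[2][1] = 1
    return H


def ricci_xy(E, k):
    """Ricci_xy for q^mu = (E, 0, 0, k); lowered with ETA this is q_mu = (-E, 0, 0, k)."""
    return ricci_amplitude([-E, 0, 0, k], cross_mode())[1][2]


print(ricci_xy(3, 3))  # 0: on shell, q^2 = -E^2 + k^2 = 0
print(ricci_xy(5, 3))  # -16: off shell it equals q^2 = -25 + 9
```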
So although terms like $R\,f(\phi)$ are perfectly causal, they do not actually involve the spin 2 particle at all. Other terms like $R(\partial\phi)^2$ are ambiguous in that their (a)causality depends on other sectors of the theory, but they also do not actually involve the spin 2 particle. This will be an important observation for all of the theories we discuss in this work, and so we show explicitly in the Appendix how a field redefinition transforms these interactions away in full generality.

¹ Also, note that the theory contains Ostrogradski ghosts unless $a_2 = 0$ [8].
In order for the interaction to be nontrivial, then, it must involve the (linearized) Riemann tensor. It is impossible to contract Riemann with a quantity involving first derivatives of a single scalar field, but if we consider terms involving second derivatives we can construct a genuine coupling between the Riemann tensor and the scalar. Written on its own, such a term has fourth (time) derivative equations of motion and so will contain Ostrogradski ghosts, but if we add terms proportional to the Einstein tensor to the Lagrangian with the right coefficients, the equations can be made second order [8]. Alternatively, we may use the second Bianchi identity, $\partial^\gamma R_{\alpha\beta\gamma\delta} = \partial_\alpha R_{\beta\delta} - \partial_\beta R_{\alpha\delta}$, to write the divergence of the Riemann tensor appearing in the equations of motion in terms of the Ricci (equivalently, Einstein) tensor, which vanishes on backgrounds that solve the free spin 2 equations. The equation of motion for the scalar field then takes the form
$$\Box\phi + R^{\mu\alpha\nu\beta}\chi_{\alpha\beta}\,\partial_\mu\partial_\nu\phi \simeq 0,$$
with $\chi_{\alpha\beta}$ proportional to second derivatives of the background scalar (see Section 4). Here we have only shown the terms with the most derivatives acting on the scalar field, which amounts to treating the Riemann field as slowly varying compared to the scalar. In Section 4 we show that this leads to superluminality on nontrivial backgrounds.
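The contracted second Bianchi identity quoted above holds mode by mode in momentum space, which the following sketch verifies for a generic symmetric (neither transverse nor traceless) polarization and an off-shell momentum. It assumes mostly-plus signature and the linearized Riemann combination $\partial_\mu\partial_\rho h_{\nu\sigma} + \partial_\nu\partial_\sigma h_{\mu\rho} - \partial_\mu\partial_\sigma h_{\nu\rho} - \partial_\nu\partial_\rho h_{\mu\sigma}$, with the Ricci tensor taken as its trace on the first and third indices; the helper names are our own. Because the identity holds for every mode, the divergence of the Riemann tensor vanishes whenever the Ricci tensor does.

```python
from itertools import product

ETA = [-1, 1, 1, 1]  # mostly-plus Minkowski metric (assumed)
R4 = range(4)


def riemann_amplitude(q, H):
    """Linearized Riemann of h = H exp(i q.x), lower indices, factor i**2 dropped."""
    return {
        (m, n, r, s): q[m] * q[r] * H[n][s] + q[n] * q[s] * H[m][r]
        - q[m] * q[s] * H[n][r] - q[n] * q[r] * H[m][s]
        for m, n, r, s in product(R4, repeat=4)
    }


def ricci_amplitude(q, H):
    """R_{ns} = eta^{mr} R_{mnrs} (diagonal metric)."""
    R = riemann_amplitude(q, H)
    return [[sum(ETA[m] * R[(m, n, m, s)] for m in R4) for s in R4] for n in R4]


def bianchi_residual(q, H):
    """Max |q^g R_{abgd} - (q_a R_{bd} - q_b R_{ad})| over the free indices a, b, d.

    Each derivative contributes the same factor i on both sides of
    d^g R_{abgd} = d_a R_{bd} - d_b R_{ad}, so that factor is dropped.
    """
    R = riemann_amplitude(q, H)
    Ric = ricci_amplitude(q, H)
    worst = 0
    for a, b, d in product(R4, repeat=3):
        lhs = sum(ETA[g] * q[g] * R[(a, b, g, d)] for g in R4)  # index raised with ETA
        rhs = q[a] * Ric[b][d] - q[b] * Ric[a][d]
        worst = max(worst, abs(lhs - rhs))
    return worst


# Generic symmetric polarization and an off-shell momentum (integers, so arithmetic is exact).
H = [[2, 1, 0, -3],
     [1, 5, 4, 0],
     [0, 4, -1, 2],
     [-3, 0, 2, 7]]
q = [1, -2, 3, 4]
print(bianchi_residual(q, H))  # 0: the identity holds mode by mode
```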

Two Scalars
We can also construct an interaction with the Riemann tensor using only first derivatives if we are willing to introduce a second scalar field $\chi$. Schematically, the interaction is
$$\frac{1}{\Lambda^7}\,R^{\mu\nu\rho\sigma}\,\partial_\mu\phi\,\partial_\nu\chi\,\partial_\rho\phi\,\partial_\sigma\chi .$$
As with the interaction (3.3) before, this term does not introduce ghosts in the scalar sector on flat backgrounds, and the ghosts can be removed completely if terms proportional to the Einstein tensor are added [9].
The scalar equations of motion for this theory (again keeping only the terms with the most derivatives) couple the two fields through a kinetic matrix that depends on the backgrounds of both scalars. To show that this system exhibits superluminal motion, it suffices to take only one of these backgrounds to be nontrivial. In this instance the kinetic matrix diagonalizes: one of the fields obeys the ordinary wave equation, while the other obeys an equation of the same form as in the single scalar case.
This background is investigated in Section 4.

Fermions
We now ask whether it is possible to couple fermions to the Riemann tensor, or whether these types of couplings lead to superluminality as well. At cubic order in the fields, the most general form of the interactions is given by (3.6). Here we have neglected parity violating terms involving $\gamma^5$, but these do not alter our conclusions.
The first line involves no derivatives on the fermion, and so will not alter the causality structure of the theory. However, both of these terms are trivial. The first one is proportional to the Einstein tensor, so, as before, it can be removed with a field redefinition. The second, potentially benign, term involves the Riemann tensor and so is a genuine interaction; it resembles a mass term for the fermion on a given spin 2 background. However, this term can be shown to vanish. Because the Riemann tensor is symmetric under exchange of its two index pairs, the product of the two sigma matrices can be replaced by its symmetrization. A standard gamma matrix identity can then be used to decompose this symmetrized product. When contracted with the Riemann tensor, the first term of the resulting expression vanishes because it is a symmetric tensor contracted with an antisymmetric tensor, and the last vanishes by the Bianchi identity.
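The argument above leans on two algebraic properties of the Riemann tensor: symmetry under exchange of its index pairs (which allows the sigma matrices to be symmetrized) and the first Bianchi identity (which kills any contraction with $\epsilon^{\mu\nu\rho\sigma}$). As a sanity check, this sketch verifies both, together with $\epsilon^{\mu\nu\rho\sigma}R_{\mu\nu\rho\sigma} = 0$, for the linearized Riemann tensor of a single Fourier mode; the normalization, the sign convention $\epsilon_{0123} = +1$, and the helper names are our own assumptions (the vanishing results do not depend on them).

```python
from itertools import permutations, product

R4 = range(4)


def riemann_amplitude(q, H):
    """Linearized Riemann of h = H exp(i q.x), lower indices, factor i**2 dropped."""
    return {
        (m, n, r, s): q[m] * q[r] * H[n][s] + q[n] * q[s] * H[m][r]
        - q[m] * q[s] * H[n][r] - q[n] * q[r] * H[m][s]
        for m, n, r, s in product(R4, repeat=4)
    }


def levi_civita(idx):
    """Totally antisymmetric symbol with eps_{0123} = +1 (0 if an index repeats)."""
    if len(set(idx)) < len(idx):
        return 0
    sign = 1
    for i in range(len(idx)):
        for j in range(i + 1, len(idx)):
            if idx[i] > idx[j]:  # each inversion flips the sign of the permutation
                sign = -sign
    return sign


def symmetry_checks(q, H):
    """Return (pair exchange holds, first Bianchi holds, eps^{mnrs} R_{mnrs})."""
    R = riemann_amplitude(q, H)
    quads = list(product(R4, repeat=4))
    pair_exchange = all(R[(m, n, r, s)] == R[(r, s, m, n)] for m, n, r, s in quads)
    first_bianchi = all(
        R[(m, n, r, s)] + R[(m, r, s, n)] + R[(m, s, n, r)] == 0 for m, n, r, s in quads
    )
    eps_trace = sum(levi_civita(p) * R[p] for p in permutations(R4))
    return pair_exchange, first_bianchi, eps_trace


H = [[2, 1, 0, -3],
     [1, 5, 4, 0],
     [0, 4, -1, 2],
     [-3, 0, 2, 7]]
q = [1, -2, 3, 4]
print(symmetry_checks(q, H))  # (True, True, 0)
```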
The second line of (3.6) involves couplings containing derivatives acting on the spinor field.
The first two terms are proportional to the Einstein tensor, and were shown in [10] to arise from integrating out heavy gauge bosons in the Standard Model. There, it was shown that these can lead to superluminality on particular backgrounds, but these will not contribute in the absence of a source for the spin 2 field. The last term can also be reduced to this form by employing spinor identities. Thus, even at the single derivative level, all cubic fermion couplings can be removed with a field redefinition. If we are willing to consider higher powers of the fields, however, it is possible to write down nontrivial interacting terms. The first of these is a dimension 9 operator, (3.10), involving four fermions and a single spin 2 field. While this term is benign from the standpoint of causality, through loop effects it will generate dangerous terms that will be considered in Section 5.3.

Vector
We now turn to interactions between a spin 1 and a spin 2 field. Of the possible cubic interactions, the most general (parity-even) Lagrangian is
$$\mathcal{L}_{\rm int} = \frac{1}{\Lambda^3}\left(b_1 R\,F_{\mu\nu}F^{\mu\nu} + b_2 R_{\mu\nu}F^{\mu\alpha}F^{\nu}{}_{\alpha} + b_3 R_{\mu\nu\rho\sigma}F^{\mu\nu}F^{\rho\sigma}\right).$$
This was first discussed in the context of the low energy effective theory for quantum electrodynamics in curved spacetime in [11] (for a thorough discussion see [12]). As before, the first two terms do not represent true interactions, and they vanish on backgrounds that are Ricci flat, so we focus on the last. The equation of motion for the spin 1 field is then Maxwell's equation corrected by a term of the form $R^{\mu\nu\rho\sigma}\partial_\mu F_{\rho\sigma}$ (again keeping only the terms that dominate when the background field is slowly varying).
The characteristics for the spin 1 field set by this last equation will be analyzed in the next section, and lead to superluminality.

Signal Propagation
The equations of motion for the three interacting theories we have uncovered so far, (3.2), (3.4), and (3.12), all have the same basic form. We will now show that all of these admit superluminal propagation while remaining in the regime of validity of the effective field theory. Using the eikonal (geometric optics) approximation to the equations of motion, in which the wavelength of the matter particle is much smaller than the scale over which the background varies, we find that the characteristics obey
$$\eta^{\mu\nu}k_\mu k_\nu + R^{\mu\alpha\nu\beta}\chi_{\alpha\beta}\,k_\mu k_\nu = 0 .$$
The only difference between the three theories is the form of the tensor $\chi_{\alpha\beta}$. In the first case the tensor is the second derivative of a background field, and if this is a plane wave it becomes $\chi_{\mu\nu} = -a_5 p_\mu p_\nu\phi_p/\Lambda^6$. The $p_\mu$ vectors are lightlike if the scalar is massless, and timelike if the scalar is massive. The same holds for the second case, with $\phi_p \to \chi_p^2$. In the last case, $\chi_{\alpha\beta}$ is built from the polarization vectors, which are spacelike. Generically, then, we make the replacement $\chi_{\nu\beta} \to q\,p_\nu p_\beta$.

Superluminality
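We now show that theories of this type exhibit superluminality on certain backgrounds. Let us take the Riemann tensor to be a plane wave, so that it is a solution to the free field equations of motion. Substituting $\chi_{\alpha\beta} = q\,p_\alpha p_\beta$ into the characteristic equation gives
$$k^2 + q\left(h_{pp,kk} + h_{kk,pp} - 2h_{kp,kp}\right) = 0,$$
where subscripts denote projections, e.g. $h_{kp} = h_{\mu\nu}k^\mu p^\nu$, and subscripts after commas denote derivatives.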
To be explicit, let us now specialize to a particular setup: we take the spin 2 wave vector $\gamma$ to point in the $\hat z$ direction, and take $\vec k$ and $\vec p$ to be perpendicular to this direction, say $\vec p = |\vec p|\hat x$. In the case where $p^\mu$ is null and future directed, $p^0 = |\vec p|$, and the characteristic equation simplifies considerably. It defines a dispersion relation $\omega(\vec k)$, which sets the speed of propagation of the system through $\omega = vk$. For simplicity we specialize further by keeping only the cross polarization of the spin 2 field, $h_\times = h_{xy}$, which leaves a quadratic form in $(k_x, k_y)$ for the shifted frequency. Here we have introduced the quantity $\kappa = q|\vec p|^2\gamma_\times$ and shifted the frequency variable by $\tilde\omega = \omega - \kappa k_y/2$, which accounts for the fact that all signals drift uniformly in the $y$ direction. The speeds can then be read off from the eigenvalues of the matrix of this quadratic form. Its determinant is unchanged from the luminal case, so the product of the two eigenvalues (the squared speeds) remains equal to 1; hence one of them will necessarily be greater than 1 unless $\kappa = 0$.
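The eigenvalue argument can be illustrated with a short numerical sketch. Working through the cross-polarization case under our own conventions (an assumption: mostly-plus metric, characteristic equation $k^2 + q\,R_{\mu\alpha\nu\beta}k^\mu p^\alpha k^\nu p^\beta = 0$, and κ normalized so that the drift reads $\tilde\omega = \omega - \kappa k_y/2$), we find $\tilde\omega^2 = k_x^2 - \kappa k_xk_y + (1 + \kappa^2/4)k_y^2$. This quadratic form has unit determinant for every κ, and the code below shows that its eigenvalues, the extremal squared speeds, multiply to 1 and split around 1 as soon as κ ≠ 0. The normalization of κ may differ from the text by an O(1) factor, and the function names are hypothetical.

```python
import math


def speed_matrix(kappa):
    """Matrix M with omega_tilde**2 = (k_x, k_y) M (k_x, k_y)^T, under our assumed conventions.

    Analytically det M = (1 + kappa**2/4) - kappa**2/4 = 1 for every kappa,
    exactly as for luminal propagation (kappa = 0).
    """
    return [[1.0, -kappa / 2.0], [-kappa / 2.0, 1.0 + kappa ** 2 / 4.0]]


def squared_speeds(M):
    """Eigenvalues of a real symmetric 2x2 matrix, smallest first."""
    a, b, c = M[0][0], M[0][1], M[1][1]
    mean = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    return mean - radius, mean + radius


for kappa in (0.0, 0.1, -0.3):
    v2_min, v2_max = squared_speeds(speed_matrix(kappa))
    # The product of the eigenvalues is the determinant, fixed at 1, so a
    # nonzero kappa pushes one squared speed above 1 and the other below it.
    print(f"kappa={kappa:+.1f}  v2_min*v2_max={v2_min * v2_max:.12f}  v_max={math.sqrt(v2_max):.6f}")
```

The design choice here is deliberate: rather than diagonalizing numerically, the closed form for a symmetric 2x2 matrix makes the "unit product, so one root exceeds 1" logic visible term by term.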
If the vector $p^\mu$ is spacelike (relevant to the spin 1 case), the analysis is even easier: since we took it to be perpendicular to the direction of propagation of the spin 2 field, the derivatives in equation 4.3 must be contracted with $k^\mu$. The correction to the dispersion relation is then proportional to $q\,\omega^2\ddot h_+$, so that to leading order $v^2 - 1 \propto q\,\ddot h_+$, and the speed can be either greater or less than the speed of light, depending on the sign of $\ddot h_+$. Now, to make the setup more explicit, imagine a source for the spin 2 field generating many fairly low energy particles. The ultimate strength of this effect will depend on the intensity of the wave, which can be made arbitrarily high, and on the wavelength, which can be made arbitrarily long. We send a particle through this wave perpendicularly, with the entire setup immersed in a background χ field of very long wavelength. With this setup the magnitude of the effect can be made as large as desired. Then, in order to create a paradox, we take several of these systems in the setup of [13], all boosted relative to each other, which together are capable of sending signals backwards in time to a specified location.
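Both cases can be reproduced by solving the characteristic equation directly. In this sketch (our own setup and sign conventions, stated in the docstring; `characteristic_speeds`, `plus_mode`, and `cross_mode` are hypothetical names), the spin 2 background is a plane wave moving along $z$, the probe momentum lies in the $x$-$y$ plane, and $k^2 + q\,R_{kpkp} = 0$ becomes a quadratic in ω. For spacelike $p$ along $x$ the fast root is $v = (1 - q\ddot h_+)^{-1/2}$, above or below 1 depending on the sign of $\ddot h_+$; for null $p$ with only the cross polarization, scanning directions exposes a phase speed above 1.

```python
import math


def characteristic_speeds(q, hdd, p, kx, ky):
    """Phase speeds omega/|k| of the eikonal characteristics, slowest first.

    Assumed setup (ours, not necessarily the paper's conventions):
      * mostly-plus metric, probe wave vector k^mu = (omega, kx, ky, 0);
      * characteristic equation k.k + q R_{mu a nu b} k^mu p^a k^nu p^b = 0;
      * spin 2 plane wave h_ij(t - z), so d_mu d_nu h_ij = u_mu u_nu hdd_ij with
        u_mu = (-1, 0, 0, 1) and hdd_ij the (spatial, 3x3) second time derivative.
    Then R_kpkp = (u.k)^2 hdd_pp + (u.p)^2 hdd_kk - 2 (u.k)(u.p) hdd_kp, with u.k = -omega.
    """
    k_sp, p_sp = (kx, ky, 0.0), p[1:]

    def proj(a, b):  # a^i b^j hdd_ij
        return sum(a[i] * hdd[i][j] * b[j] for i in range(3) for j in range(3))

    u_p = -p[0] + p[3]
    # Expanding the characteristic equation gives a2*omega^2 + a1*omega + a0 = 0.
    a2 = q * proj(p_sp, p_sp) - 1.0
    a1 = 2.0 * q * u_p * proj(k_sp, p_sp)
    a0 = kx ** 2 + ky ** 2 + q * u_p ** 2 * proj(k_sp, k_sp)
    root = math.sqrt(a1 ** 2 - 4.0 * a2 * a0)  # real for small q*hdd (EFT regime)
    kmag = math.hypot(kx, ky)
    return tuple(sorted((-a1 + s * root) / (2.0 * a2) / kmag for s in (1.0, -1.0)))


def plus_mode(amp):
    return [[amp, 0.0, 0.0], [0.0, -amp, 0.0], [0.0, 0.0, 0.0]]


def cross_mode(amp):
    return [[0.0, amp, 0.0], [amp, 0.0, 0.0], [0.0, 0.0, 0.0]]


# Spacelike p along x (spin 1 case): v = 1/sqrt(1 - q*hdd_+), faster or slower than light by sign.
p_space = (0.0, 1.0, 0.0, 0.0)
print(characteristic_speeds(0.1, plus_mode(+0.2), p_space, 1.0, 0.0)[-1])
print(characteristic_speeds(0.1, plus_mode(-0.2), p_space, 1.0, 0.0)[-1])

# Null p along x with only the cross polarization: scan directions in the x-y plane.
p_null = (1.0, 1.0, 0.0, 0.0)
v_max = max(
    characteristic_speeds(0.1, cross_mode(0.2), p_null, math.cos(t), math.sin(t))[-1]
    for t in (2.0 * math.pi * n / 360 for n in range(360))
)
print(v_max)  # exceeds 1 in some direction
```

A phase speed above 1 in some direction means the corresponding characteristic covector is timelike, which signals characteristics outside the light cone.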

Regime of Validity
Consider a spin 2 background in the form of a monochromatic wave (a "spin 2 laser") of pure frequency ω. When we propagate a short wavelength test particle through this background for a distance $L$, a net time advance builds up, $\Delta L = \Delta v\,L$. The bare minimum required for this to be useful for constructing a time machine is that the shift in distance be greater than the wavelength of the photon used, $\Delta L > \tilde\omega^{-1}$, where $\tilde\omega$ is the photon frequency [14,15]. At first this seems achievable by taking $L$ arbitrarily large. However, this approach is stymied by the fact that the wave we have chosen oscillates around zero, giving a time advance half the time and a time delay the other half. If a photon is sent through a number of cycles the effect averages to zero, and so the effect must be confined within a single wavelength, $L < \omega^{-1}$.
Therefore, the ratio of the time advance to the wavelength of the photon is
$$\frac{\Delta L}{\tilde\omega^{-1}} \sim \tilde\omega\,\Delta v\,L \lesssim \frac{h}{h_{\rm UV}}\,\frac{\tilde\omega}{\omega},$$
where we have used $\Delta v \sim h\,\omega^2/\Lambda^3$ and $L < \omega^{-1}$. The last step defines the quantity $h_{\rm UV} = \Lambda^3/\omega^2$, which is the maximum value of $h$ for which we can trust the theory. Beyond this, higher order terms in the Lagrangian of the form $R^pF^2/\Lambda^{3p}$, which will be induced radiatively by loops, all become equally important, and the effective description breaks down. Note that since $h/h_{\rm UV}$ sets the change in speed, the speed itself must be very close to 1, but the total effect can be made large: the factor $h/h_{\rm UV}$ must be below 1 for our theory, but the hierarchy of photon frequency to spin 2 frequency, $\tilde\omega/\omega$, can be made as large as desired.
It is interesting to see how this effect is shielded in the case where the spin 2 field is the graviton.
In general relativity, the Lagrangian contains the additional tower of operators $h^kF^2/M_p^k$ set by the Planck mass. If the graviton amplitude becomes larger than $M_p$, the effective description breaks down, corresponding in the full theory to the creation of a black hole. Effectively, then, the upper limit on $h$ becomes $h_{\rm UV} = \min\{\Lambda^3/\omega^2, M_p\}$, and cannot be made arbitrarily large for small ω. Similarly, there is a tower of operators in the photon sector that scale parametrically as $\alpha^kF^{2k}/m_e^{4k-4}$ and $\Box^kF^2/m_e^{2k}$, where $m_e$ is the electron mass. Frequencies $\tilde\omega$ above the electron mass are outside the regime of the effective theory, and in the full theory correspond to the ability of a high energy photon to produce an electron-positron pair through the Schwinger mechanism. In this case the cutoff scale Λ we have been discussing is related to these two scales through $\Lambda^3 \sim m_e^2M_p$, so that for $\omega < m_e$ the amplitude is capped by $M_p$ and the maximal ratio is $\Delta L/\tilde\omega^{-1} \sim \tilde\omega\,\omega/m_e^2$. From the arguments above this can never be larger than one while remaining in the regime of validity of the effective theory, and so any time advance must be much shorter than the wavelength of the photon used. This is consistent with the conclusions of [14,15], who find that it is impossible to construct a time machine using the quantum effects of electrodynamics in curved space.
In this case, it is the fact that this nonminimal term is generated from extra degrees of freedom running through loops that shields the theory from superluminality. There is no limit in which the theories of the previous section can be recovered from this theory, since the limit M p → ∞, Λ fixed entails that m e → 0. Here, the curative degrees of freedom (electrons) are explicitly part of the low energy theory, and cannot be removed from the analysis. This is just the simplest example of how extra degrees of freedom may enter the theory to cure this pathology. Note, however, that it requires the addition of the minimal coupling of general relativity, rendering it disconnected from the theories we were discussing before. Those theories, as they have been written, cannot be cured without this standard coupling included to bound the amplitude of the spin 2 wave.
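The order-of-magnitude logic of this subsection can be encoded in a few lines. The sketch assumes (our reading of the estimates above) $\Delta v \sim h\,\omega^2/\Lambda^3 = h/h_{\rm UV}$, a path length of one cycle, $L \sim 1/\omega$, and an amplitude capped at $h_{\rm UV}$, or additionally at $M_p$ when the spin 2 field is the graviton. All numbers are illustrative, in natural units with $m_e = 1$, and `advance_ratio` is a hypothetical helper. Without the $M_p$ cap the ratio grows like $\tilde\omega/\omega$, while with $\Lambda^3 \sim m_e^2M_p$ it reduces to $\tilde\omega\,\omega/m_e^2 < 1$.

```python
def advance_ratio(omega_probe, omega_wave, Lambda, h_fraction=1.0, M_p=None):
    """Order-of-magnitude estimate of Delta L / (probe wavelength), natural units.

    Assumptions (ours): Delta v ~ h * omega_wave**2 / Lambda**3, i.e. h / h_UV with
    h_UV = Lambda**3 / omega_wave**2; path length L ~ 1 / omega_wave (one cycle);
    h = h_fraction * (largest trustworthy amplitude). If M_p is given, the
    amplitude is additionally capped at M_p, as in general relativity.
    """
    h_max = Lambda ** 3 / omega_wave ** 2
    if M_p is not None:
        h_max = min(h_max, M_p)
    h = h_fraction * h_max
    delta_v = h * omega_wave ** 2 / Lambda ** 3
    path = 1.0 / omega_wave
    return omega_probe * delta_v * path


# Bare spin 2 theory: the ratio grows like omega_probe / omega_wave.
print(advance_ratio(omega_probe=0.5, omega_wave=1e-3, Lambda=1.0, h_fraction=0.1))

# QED-like embedding in GR: Lambda**3 ~ m_e**2 * M_p, with probe and wave below m_e.
m_e, M_p = 1.0, 1e19
print(advance_ratio(0.5, 0.1, Lambda=(m_e ** 2 * M_p) ** (1.0 / 3.0), M_p=M_p))
```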

Two Spin 2 in the Interaction
We now study the interactions involving several spin 2 particles, focussing on the case of two spin 2 particles at each vertex, and show that these lead to acausality as well. The Feynman diagram for the types of interactions we are considering is depicted in Fig. 3 below.

Gauss-Bonnet Coupling
We can also ask whether the Gauss-Bonnet term leads to superluminal propagation if we couple it to matter, along the lines of [16-18] (we note that without coupling to matter, the Gauss-Bonnet term is only a boundary term). The matter sector remains luminal if we only use a potential function (not involving derivatives), which is also the only coupling possible without introducing ghosts (in four dimensions) [8]. The theory we choose therefore couples a function $f$ of the matter fields to the Gauss-Bonnet combination $R_{\mu\nu\rho\sigma}R^{\mu\nu\rho\sigma} - 4R_{\mu\nu}R^{\mu\nu} + R^2$, built from the linearized curvatures of the spin 2 fields. Though this interaction can involve multiple spin 2 fields, which highlights the disconnectedness of this theory from general relativity, we now specialize to the case of a single spin 2 field for simplicity. The resulting equations of motion can be simplified considerably if we use an appropriate generalization of the de Donder gauge, written in terms of $\bar h_{\alpha\beta} = h_{\alpha\beta} - K^{\mu\nu}h_{\mu\nu}K_{\alpha\beta}/2$. If we denote by $k_\parallel$ and $k_\perp$ the components of momentum parallel and perpendicular to $\nabla f$, the dispersion relations contain a cross term between ω and $k_\parallel$. This cross term can be eliminated by using a shifted frequency variable, $\tilde\omega = \omega + \dot f\,k_\parallel/(1 + \ddot f)$, and the dispersion relations then yield the speeds of propagation. For instance, if we take $f$ to be a function of $x + t$, then $\ddot f = f'' = \dot f'$, and the propagation speed in both directions becomes, to linear order, $v^2 = 1 - \ddot f$. Since there is no preferred sign for the second derivative of this function, the speed can be either faster or slower than light.

Gravitational Chern-Simons Term
One may also wonder about a coupling between a scalar and the gravitational Chern-Simons term, of the form considered in [19,20], in which a function $g(\phi, i\bar\psi\gamma^5\psi)$ multiplies the Pontryagin density $R\tilde R = \frac{1}{2}\epsilon^{\mu\nu\rho\sigma}R_{\mu\nu\alpha\beta}R_{\rho\sigma}{}^{\alpha\beta}$. This theory has a similar status to the Gauss-Bonnet term, since it only alters the propagation of the graviton, and is a total derivative in the limit that the function $g$ becomes constant. On a generic background, $g(\phi, i\bar\psi\gamma^5\psi)$ can depend on both time and space, and the equations of motion do not factorize into separate equations for each helicity. This makes the analysis complicated, but if we assume that $g$ is a small perturbation on an otherwise flat background, the dispersion relation can be obtained perturbatively, and it remains to evaluate the trace of the square of the matrix that encodes the correction. In our setting, this can be done by taking a second variation of the equations of motion. Here we have ignored higher powers of the momentum, as on backgrounds with spatial dependence these will only introduce spurious poles with a mass set by the inverse of the derivatives of $g$, outside the regime of validity of our theory. This matrix must be projected onto physical degrees of freedom before the trace is taken, for which we use the flat space projector $P_{ijkl} = \hat\delta_{ik}\hat\delta_{jl} + \hat\delta_{il}\hat\delta_{jk} - \hat\delta_{ij}\hat\delta_{kl}$, with $\hat\delta_{ij} = \delta_{ij} - \hat k_i\hat k_j$. At this point the computation can be performed, yielding a quartic equation for $\omega(k)$. This generalizes earlier work of Ref. [21] to spatially dependent backgrounds. The dispersion relations set by solutions to this equation still exhibit complex propagation speeds, but we remain agnostic as to whether this in itself is a problem. The pathology we are more interested in is whether there are backgrounds for which the real part of the propagation speed is greater than one, signalling superluminal motion. We solve this quartic equation for $\omega(k)$ to second order, expressing the result in terms of the angle-dependent quantities $\beta_1 = \ddot g$, $\beta_2 = \hat k\cdot\nabla\dot g$, and $\beta_3 = (\hat k\cdot\nabla)^2 g$.
To show explicitly that this dispersion relation leads to superluminal motion, we specialize to a setup with a long-wavelength mode of g propagating in the z direction. The resulting speed is greater than 1 for angles in the range 0 < cos θ < 1/3. Thus, on spatially dependent backgrounds the gravitational Chern-Simons theory can give rise to superluminal propagation.

Quantum Corrections for Fermions
We now show that the seemingly healthy term in equation 3.10 generates the Chern-Simons term through loop effects. The relevant loop is depicted in Fig. 4.

Figure 4: While the theory given by equation 3.10 is not itself unhealthy, pathological terms are generated through the above two loop diagram, leading it to induce theories of the type studied in section 5.2.

The original term involves four fermions and a single spin 2 field, and is a dimension 9 operator. It generates the Chern-Simons operator involving two fermions and two spin 2 fields. This operator is also of dimension 9, and so can potentially play the leading role in the dynamics of the theory. It is generated through two loop diagrams, but we refrain from going into detail about the calculation, as it is beside the main topic of this paper. Suffice it to say that the effect can be encapsulated by terms in the effective Lagrangian that involve the following quantities:
- $I_2$ and $I_4$, quadratically and quartically divergent integrals, respectively;
- $m$, the mass of the fermion running through the loops;
- $J_1$ and $J_2$, which encode the index structure of the σ matrices coming from the two possible Wick contractions of the two loop diagram.

Explicitly, $J_1^{\alpha\beta\gamma\delta\mu\nu\rho\sigma} = \sigma^{\alpha\beta}\sigma^{\mu\nu}\sigma^{\gamma\delta}\sigma^{\rho\sigma}$ and $J_2^{\alpha\beta\gamma\delta\mu\nu\rho\sigma} = \sigma^{\alpha\beta}\sigma^{\rho\sigma}\,\mathrm{tr}(\sigma^{\gamma\delta}\sigma^{\mu\nu})$. Where these multiply the quartic divergence, the sigma matrices are accompanied by another pair of gamma matrices contracted with each other, inserted at appropriate locations.
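The trace in $J_2$ collapses to products of the metric through the standard identity $\mathrm{tr}(\sigma^{\mu\nu}\sigma^{\rho\sigma}) = 4(\eta^{\mu\rho}\eta^{\nu\sigma} - \eta^{\mu\sigma}\eta^{\nu\rho})$. The sketch below checks this identity numerically. It assumes the Dirac representation, signature $(+,-,-,-)$, and $\sigma^{\mu\nu} = \tfrac{i}{2}[\gamma^\mu,\gamma^\nu]$. These are our choices; the paper's conventions may differ by signs. The helper names are hypothetical.

```python
import itertools
import numpy as np

# Dirac representation with eta = diag(+1, -1, -1, -1) (assumed conventions)
I2, Z2 = np.eye(2), np.zeros((2, 2))
pauli = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]
gamma = [np.block([[I2, Z2], [Z2, -I2]]).astype(complex)]
gamma += [np.block([[Z2, s], [-s, Z2]]) for s in pauli]
eta = np.diag([1.0, -1.0, -1.0, -1.0])

def sigma(mu, nu):
    """sigma^{mu nu} = (i/2)[gamma^mu, gamma^nu]."""
    return 0.5j * (gamma[mu] @ gamma[nu] - gamma[nu] @ gamma[mu])

def clifford_ok():
    """Check {gamma^mu, gamma^nu} = 2 eta^{mu nu} for all index pairs."""
    return all(np.allclose(gamma[m] @ gamma[n] + gamma[n] @ gamma[m],
                           2 * eta[m, n] * np.eye(4))
               for m, n in itertools.product(range(4), repeat=2))

def trace_identity_ok():
    """Check tr(sigma^{mn} sigma^{rs}) = 4 (eta^{mr} eta^{ns} - eta^{ms} eta^{nr})."""
    for m, n, r, s in itertools.product(range(4), repeat=4):
        lhs = np.trace(sigma(m, n) @ sigma(r, s))
        rhs = 4 * (eta[m, r] * eta[n, s] - eta[m, s] * eta[n, r])
        if not np.isclose(lhs, rhs):
            return False
    return True

print(clifford_ok(), trace_identity_ok())
```

Under these conventions, $J_2 = 4\,\sigma^{\alpha\beta}\sigma^{\rho\sigma}(\eta^{\gamma\mu}\eta^{\delta\nu} - \eta^{\gamma\nu}\eta^{\delta\mu})$. This form shows that the second Wick contraction carries only a single pair of open σ matrices.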
Only the Chern-Simons combination is generated, as opposed to the Gauss-Bonnet combination of the same dimension, because only this term retains the symmetry of the original interaction. The actual coefficient of the Chern-Simons term is ambiguous in the effective field theory framework, since it is proportional to a power-law divergent integral. In the absence of any motivation otherwise, we take it to be of the same order as the coefficient of the original term in the Lagrangian, $\Lambda^{-5}$. In this case the original term does not affect the causality properties of the theory, but the term it generates through loops induces superluminality.

General Analysis
The interactions we have presented so far are by no means an exhaustive list. There is no limit to the number of operators one may write down that represent a nonminimal coupling of a spin 2 field to other matter representations. For instance, one may contemplate all possible quartic interactions between two spin 2 fields and two spin 1 fields. These all have ghosts in four dimensions, as their equations of motion are third order. If one is willing to consider six dimensions, however, a ghost-free combination can be written using the Levi-Civita symbol. Similarly, in [22] cubic self interactions of spin 2 particles were analyzed, such as $\Delta L \propto R_{\mu\nu\sigma\delta}R^{\sigma\delta\rho\gamma}R^{\mu\nu}{}_{\rho\gamma} + \dots$, and these were shown to generically give rise to superluminality. Faced with this never-ending panoply of possible couplings, we are motivated to provide a more general analysis that lays out the conditions that lead to superluminality.

Conditions for Superluminality
We now show that a generic coupling of matter to the Riemann tensor induces superluminality on some background. Such a general statement is necessary because there is, in principle, an infinite number of couplings that could be chosen, each of which could potentially represent a healthy theory. The argument below eliminates that worry.
In all cases we consider, the equations of motion for the test particle in the slowly varying spin 2 background reduce to a dispersion relation that is quadratic in the four-momentum, characterized by a symmetric matrix $B_{\mu\nu}$; we refer to it as equation (6.1). This form is essential for the consistency of a theory. While generic couplings would yield equations of motion with higher powers of momenta, these would all have ghostlike roots. If the mass of the ghost is field dependent, it can be made arbitrarily small on some backgrounds, and the theory would have a vanishing regime of validity. Alternatively, if the ghost is heavy, we ignore those branches as artefacts of the effective field theory description and focus on the quadratic version of the equations of motion describing the healthy modes. Granted that the theory we consider produces this dispersion relation, our requirement that it be subluminal reduces to $\mathrm{Re}\{v\} \le 1$, where $\omega^2 = v^2|k|^2$ and the speed may depend on the direction of travel. In these theories we need not distinguish between group velocity, phase velocity, and signal velocity, since the dispersion relation is exactly quadratic. The speed can be found directly from this equation or, more conveniently, in terms of the shifted frequency $\tilde\omega = \omega + B_{0i}k^i/(1+B_{00})$, which removes the cross term between $\omega$ and $k^i$. If we take the matrix B to be a perturbation, we can ignore the last term of the resulting expression for the speed, as it is higher order.
The speed then attains its maximum along the eigenvector of $B_{ij}$ with the smallest eigenvalue, $b_{\min}$. Requiring that this maximum be no greater than one implies $B_{00} + b_{\min} \ge 0$. This is equivalent to a null energy condition for the matrix $B_{\mu\nu}$.
Including the time-space components only increases the speed. Hence, if this condition is violated in the regime where those components can be neglected, it is violated even more badly in the regime where they cannot.
In fact, this stipulation can be avoided by noting that the null energy condition is independent of Lorentz frame. If we can find a frame in which the time-space components vanish, the equivalence is exact. However, the conditions for such a frame to exist are precisely the null energy condition. This leads us to the general statement that subluminality requires $B^{\mu\nu}k_\mu k_\nu \ge 0$ for all null $k_\mu$. From here it is easy to see that theories of the type we consider always lead to superluminality, because they are linear in the spin 2 field. Since B is linear in the curvature, the replacement $R_{\mu\nu\alpha\beta} \to -R_{\mu\nu\alpha\beta}$ flips the sign of $B^{\mu\nu}k_\mu k_\nu$. So if we find one wave solution with subluminal propagation, the solution with $R_{\mu\nu\alpha\beta} \to -R_{\mu\nu\alpha\beta}$ has superluminal propagation.
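The argument can be checked numerically. The sketch below assumes that the quadratic dispersion relation takes the form $(\eta^{\mu\nu} - B^{\mu\nu})k_\mu k_\nu = 0$ with $\eta = \mathrm{diag}(-1,1,1,1)$. We chose this form because it reproduces both the shifted frequency and the criterion $B_{00} + b_{\min} \ge 0$ stated above; it is not necessarily the exact form of the paper's equation. The sketch then does three things:
- it solves for the forward phase speed in many directions;
- it compares the result with the null contraction $B_{\mu\nu}n^\mu n^\nu$;
- it shows that flipping the sign of B turns a subluminal background into a superluminal one.

The weak-field values and all function names are hypothetical illustrations.

```python
import numpy as np

def fibonacci_directions(n=2000):
    """Approximately uniform unit vectors on the sphere (Fibonacci lattice)."""
    i = np.arange(n) + 0.5
    z = 1.0 - 2.0 * i / n
    r = np.sqrt(1.0 - z**2)
    phi = np.pi * (3.0 - np.sqrt(5.0)) * i  # golden-angle increments
    return np.stack([r * np.cos(phi), r * np.sin(phi), z], axis=1)

def speeds(B, dirs):
    """Forward phase speed along each row of dirs for the assumed relation
    (1 + B_00) w^2 + 2 B_0i k^i w - (|k|^2 - B_ij k^i k^j) = 0."""
    A = 1.0 + B[0, 0]
    b = dirs @ B[0, 1:]
    c = np.einsum("ni,ij,nj->n", dirs, B[1:, 1:], dirs)
    return (-b + np.sqrt(b**2 + A * (1.0 - c))) / A

def null_contraction(B, dirs):
    """B_{mu nu} n^mu n^nu for the null vectors n = (1, nhat)."""
    return (B[0, 0] + 2.0 * dirs @ B[0, 1:]
            + np.einsum("ni,ij,nj->n", dirs, B[1:, 1:], dirs))

def nec_margin(B):
    """B_00 + b_min, the criterion quoted in the text (for B_0i = 0)."""
    return B[0, 0] + np.linalg.eigvalsh(B[1:, 1:]).min()

dirs = fibonacci_directions()
phi_N = -1e-3                          # hypothetical weak Newtonian potential
B_gr = np.diag([-2 * phi_N] * 4)       # h_00 = h_ii = -2 Phi, off-diagonals zero
for B in (B_gr, -B_gr):                # second entry: sign-flipped background
    # first line: margin > 0 and max speed < 1; second line: the reverse
    print(nec_margin(B), speeds(B, dirs).max())
```

To first order the exact speed obeys $v \simeq 1 - \tfrac12 B_{\mu\nu}n^\mu n^\nu$ with $n = (1,\hat n)$, cross terms included. Subluminality in every direction is therefore the null condition on B. Whichever of $B$ and $-B$ violates that condition propagates superluminally, unless the null contraction vanishes identically.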
Since nothing in the theory sets a preferred sign for the spin 2 field, faster-than-light propagation generically exists in the matter sector. This holds for arbitrary Lagrangians of the form $\Delta L = R_{\mu\nu\alpha\beta}T^{\mu\nu\alpha\beta}$, where T contains derivative terms. However, one may also be concerned with interactions involving two or more powers of the Riemann curvature. These lead to superluminality for exactly the same reason if we consider backgrounds with a superposition of multiple spin 2 modes. For instance, take one mode to have an extremely long wavelength, so that it is effectively a constant background on the scales of interest. On top of it we can set up a scenario that operates in exactly the same manner as above. We need not worry about nonlinearities in the spin 2 field equations: we are free to take the frequency of the long mode to be arbitrarily small, and to compensate for this suppression by increasing the amplitude of the shorter-wavelength mode.
Finally, we briefly comment on the generalization of (6.6) to multiple species, as needed for our analysis in section 5.2. In this case $B_{\mu\nu}$ becomes a matrix in species space, and the determinant of equation (6.1) yields roots corresponding to all the dispersion relations of the theory. If B is treated as a perturbation, the determinant can be expanded to first order. Subluminality then requires $\mathrm{Tr}\,B^{\mu\nu}k_\mu k_\nu \ge 0$ for null $k_\mu$. Should the trace of this matrix vanish, as in the Chern-Simons case discussed above, the condition becomes $\mathrm{Tr}(B_{00}B^{\mu\nu})k_\mu k_\nu \ge 0$.
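To see why a vanishing trace is dangerous, we assume each species obeys the single-species convention used in our sketch of the general argument. Under that assumption, degenerate first-order perturbation theory gives the speeds $v_A \simeq 1 - \tfrac12\lambda_A$, where the $\lambda_A$ are the eigenvalues of $M_{AB} = (B_{AB})_{\mu\nu}n^\mu n^\nu$ for null $n$. If $\mathrm{Tr}\,M = 0$ but $M \neq 0$, some eigenvalue is negative, so some mode is superluminal. The two-species background below, with purely off-diagonal mixing, is a hypothetical illustration, as is the helper name.

```python
import numpy as np

def first_order_speeds(B_species, nhat):
    """First-order speeds 1 - lambda/2, where lambda are the eigenvalues of
    M_AB = (B_AB)_{mu nu} n^mu n^nu for the null vector n = (1, nhat).
    B_species has shape (S, S, 4, 4), symmetric in both index pairs."""
    nhat = np.asarray(nhat, dtype=float)
    n = np.concatenate([[1.0], nhat / np.linalg.norm(nhat)])
    M = np.einsum("abmn,m,n->ab", B_species, n, n)
    return 1.0 - 0.5 * np.linalg.eigvalsh(M), np.trace(M)

# Hypothetical two-species background that is traceless in species space
eps = 1e-3
T = np.diag([1.0, 0.0, 0.0, 0.0])      # illustrative tensor structure
B_species = np.zeros((2, 2, 4, 4))
B_species[0, 1] = B_species[1, 0] = eps * T

v, trM = first_order_speeds(B_species, [0.0, 0.0, 1.0])
print(trM, v)   # zero trace; one speed above 1 and one below 1
```

The sum of the first-order shifts equals $-\tfrac12\,\mathrm{Tr}\,M$. A non-negative trace is therefore necessary for every mode to be subluminal, and a zero trace forces a superluminal mode whenever M is nonzero.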

General Relativity
It is worth asking how general relativity manages to satisfy this constraint, since gravity modifies the propagation of light rays in exactly this way yet retains a notion of causality. We consider two scenarios: a metric set by a weak-field source, and a gravitational wave.
If the Einstein tensor is sourced by matter, then, following [25], the induced nondynamical graviton profile is obtained as follows. We expand around flat space in Lorentz gauge and use the retarded solution to the wave equation, which has support only within the past light cone. Then $B_{\mu\nu} = h_{\mu\nu}$, and from (6.5) the speed can be expressed in terms of the stress tensor. Here $T_{\min}$ is the minimum eigenvalue of the spatial part of the stress tensor, and the bracket notation denotes convolution with the retarded Green's function $G_{\rm ret}$. Since the Green's function is positive semidefinite, the condition for subluminal propagation is $T_{00} + T_{\min} \ge 0$, which is the null energy condition. With the additional requirement that the energy be positive, our criterion reproduces the standard result [26,27] that the weak energy condition must hold to avoid superluminal motion.
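A concrete instance is the static weak field of a non-relativistic source, which solves the linearized equations in Lorentz gauge. The block below assumes the dispersion relation $(\eta^{\mu\nu} - B^{\mu\nu})k_\mu k_\nu = 0$ with $\eta = \mathrm{diag}(-1,1,1,1)$. We chose this form to be consistent with the shifted frequency and the criterion $B_{00} + b_{\min} \ge 0$ stated earlier; it is not necessarily the exact form of the paper's equation (6.1).

```latex
% Static weak field, Lorentz gauge, \nabla^2\Phi = 4\pi G\rho:
ds^2 = -(1+2\Phi)\,dt^2 + (1-2\Phi)\,d\vec{x}^{\,2}
\;\Rightarrow\; B_{00} = h_{00} = -2\Phi,\qquad B_{ij} = h_{ij} = -2\Phi\,\delta_{ij}.
% Speed from the assumed dispersion relation:
v^2 = \frac{1 - B_{ij}\hat k^i\hat k^j}{1 + B_{00}} = \frac{1+2\Phi}{1-2\Phi}
\simeq 1 + 4\Phi,\qquad v \simeq 1 + 2\Phi.
% Criterion:
B_{00} + b_{\min} = -4\Phi \ge 0 \iff \Phi \le 0.
```

For dust, $T_{00} = \rho$ and $T_{ij} = 0$, so $\rho \ge 0$ gives $\Phi \le 0$ and $v \le 1$: the familiar Shapiro delay. This is the null energy condition $T_{00} + T_{\min} \ge 0$ in its simplest form.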
We draw the reader's attention to the fact that this condition is a requirement for causality in the purely classical theory. It is not satisfied in generic quantum states when the theory is treated as a quantum field theory. However, this does not immediately imply a violation of causality, as any amount of negative energy must necessarily be localized to a small region of spacetime [28]. The requirement for causality is then replaced by the averaged weak energy condition [29], in which the contribution from every point on the particle's path is integrated. While this weaker criterion is technically the requirement for causality, it is of no interest in the nonminimally coupled cases. There, causality is violated even in the classical theory, let alone the quantum version.
The case of a gravitational wave is equally instructive. Here we are free to take the transverse traceless gauge, in which $h_{00} = h_{0i} = 0$ and the dispersion relation for a test particle involves only the spatial components $h_{ij}$. The speeds can be read off as $v^2 = \{1,\ 1 \pm \sqrt{h_+^2 + h_\times^2}\}$. From this it appears that one of the directions does in fact become superluminal. However, one should be careful here, because there are a number of subtleties:
- Because of the equivalence principle, the term superluminal is a misnomer, as all null curves are affected in exactly the same manner.
- In diffeomorphism invariant theories we need to make sure this does not just represent a rescaling of the coordinates, either globally or locally. However, the presence of a graviton cannot be gauged away with a coordinate transformation, and so it represents a physical effect.

What is nontrivial is that, even though passing through a graviton induces superluminality, it cannot lead to a time machine in general relativity, because gravity is necessarily self-interacting due to the equivalence principle. Suppose one tries to engineer a setup in which a photon is passed through a boosted graviton and sent back through another boosted graviton, so as to send future-directed signals back in time. One then finds that the first graviton automatically induces an equal shift in the location of the second, which completely cancels the net effect, as found in [14,31]. Thus causality is preserved in general relativity. On the other hand, acausality can be present in the much larger class of derivative theories examined earlier. There the spin 2 field does not interact with itself (at the classical level), so two waves can pass by each other by superposition.
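The three speeds follow from the eigenvalues of the transverse traceless perturbation. For a wave along z, $h_{ij}$ has eigenvalues $\pm\sqrt{h_+^2 + h_\times^2}$ and 0, and with $B_{00} = B_{0i} = 0$ the squared speed along each eigen-direction is $1 - \lambda$. This relation assumes the sign convention of the single-species dispersion relation used in our sketch of the general argument. The amplitudes below and the helper name are hypothetical.

```python
import numpy as np

def tt_wave_speeds_squared(h_plus, h_cross):
    """Squared speeds 1 - lambda along the eigen-directions of a TT wave along z,
    assuming B_00 = B_0i = 0 and B_ij = h_ij."""
    h = np.array([[h_plus, h_cross, 0.0],
                  [h_cross, -h_plus, 0.0],
                  [0.0, 0.0, 0.0]])
    return np.sort(1.0 - np.linalg.eigvalsh(h))

v2 = tt_wave_speeds_squared(3e-3, 4e-3)  # sqrt(h_+^2 + h_x^2) = 5e-3
print(v2)                                # approximately [0.995, 1.0, 1.005]
```

One polarization direction is slowed, one is sped up, and the propagation direction of the wave is unaffected. This matches $v^2 = \{1,\ 1 \pm \sqrt{h_+^2 + h_\times^2}\}$.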

Spin 1
We briefly consider nonminimal couplings of a spin 1 field to matter to highlight some differences from the spin 2 case. For couplings to scalars, the couplings with the lowest number of derivatives are of 'dilaton type' and 'axion type'. These do not alter the causal structure of the scalar, but they do affect the spin 1 field. The 'dilaton type' coupling only multiplies the eikonal equation by $(1 + f(\phi))$; as long as this factor is positive, the theory is healthy. The 'axion type' coupling is more subtle. On a time-dependent $\phi$ background, the dispersion relation for the spin 1 field becomes [32] $(1+f)\,\omega^2 - k^2 = \pm\dot g\,k$ (7.5). This term violates parity, but, in contrast to the gravitational Chern-Simons case discussed in section 5.2, its effects occur at low rather than high energies. Thus one of the modes becomes tachyonic for $k < \dot g$, and the other appears superluminal. This apparent superluminality is not real, however, since the high-momentum limit recovers the Lorentz-invariant dispersion relation, guaranteeing that signals propagate on the light cone.
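The distinction between apparent and real superluminality can be seen numerically from (7.5). The sketch sets $f = 0$ for simplicity and treats $\dot g$ as a constant illustrative background value; both are assumptions. On the "+" branch, both phase and group velocity exceed 1 at finite k, yet both approach 1 as $k \to \infty$. The signal-front velocity is governed by that high-momentum limit. The function names are hypothetical.

```python
import numpy as np

def omega_squared(k, gdot, branch):
    """omega^2 = k^2 + branch * gdot * k: eq. (7.5) with f = 0 (assumption), branch = +1 or -1."""
    return k**2 + branch * gdot * k

def phase_velocity(k, gdot):
    """omega / k on the + branch."""
    return np.sqrt(omega_squared(k, gdot, +1)) / k

def group_velocity(k, gdot):
    """d omega / dk = (2k + gdot) / (2 omega) on the + branch."""
    return (2 * k + gdot) / (2 * np.sqrt(omega_squared(k, gdot, +1)))

gdot = 1.0                                  # illustrative background value
k = np.array([0.5, 2.0, 10.0, 1e2, 1e4])
print(omega_squared(0.5, gdot, -1) < 0)     # - branch is tachyonic for k < gdot
print(phase_velocity(k, gdot))              # above 1, decreasing toward 1
print(group_velocity(k, gdot))              # also above 1, approaching 1
```

The group velocity satisfies $v_g^2 = 1 + \dot g^2/[4(k^2 + \dot g k)]$, so it exceeds 1 at every finite k. This is why the argument rests on the large-k limit rather than on group velocity.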
An analog for the Riemann coupling to fermions can be found in the case of electromagnetism.
We are free to write down interaction terms $L_{\rm int,1}$ and $L_{\rm int,2}$ of this type. The first, $L_{\rm int,1}$, are dipole moment interactions, and the second, $L_{\rm int,2}$, are anapole moment interactions, as discussed in [33][34][35][36]. As far as its derivative structure is concerned, the first is a kind of complicated mass term for the fermion in the presence of a background electromagnetic field, and hence is causal.
The second alters the kinetic term at leading order. We have written down a general combination of the field strength and its dual. The limit $e_3 = 0$ reproduces the term responsible for the Velo-Zwanziger pathology [37]. For a massive charged spin 3/2 field minimally coupled to electromagnetism, it is well known that in a background magnetic field the longitudinal mode exhibits superluminal propagation [38][39][40]. If one introduces the Stueckelberg mode and takes the decoupling limit of the Lagrangian, a term of exactly the form of (7.7) is present [41]. There it was shown to be removable by a field redefinition, at the expense of introducing fermion self couplings. So although it exhibits superluminality, this term does not represent a genuine interaction with the spin 1 particle.

Massive Spin 2
In previous sections our analysis focused on massless spin 2 particles, but it is not hard to generalize our results to massive particles. Such theories have generically been shown to exhibit faster-than-light propagation [42,43], and this behavior can be expected to persist with nontrivial couplings as well. We now show that this expectation is borne out.
A mass term can generically be expected to break the symmetry that eliminates the lower-spin components from the kinetic term. These components then acquire dynamics. This is most easily seen with the Stueckelberg trick: one restores the gauge symmetry by explicitly adding fields representing the longitudinal modes to the Lagrangian. Since the purely longitudinal mode enters with two derivatives, a generic mass term for the spin 2 field leads to equations of motion that are fourth order in derivatives. The only combination capable of evading this is the Fierz-Pauli mass, for which the potentially dangerous terms assemble into a total derivative. Let us now couple this massive field to matter. We can choose any of the interactions discussed earlier that make use of the linearized Riemann tensor. These all lead to superluminality in the same way as in the massless case, so we do not repeat that analysis here.
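The total-derivative statement can be checked directly. In the sketch below, the Stueckelberg normalization $h_{\mu\nu} \to h_{\mu\nu} + \partial_\mu\partial_\nu\pi$ is chosen for illustration, and $a$ is a hypothetical parameter interpolating between mass terms. Only the pure-$\pi$ piece of the mass term $h_{\mu\nu}h^{\mu\nu} - a\,h^2$ is shown.

```latex
h_{\mu\nu}h^{\mu\nu} - a\,h^2 \;\supset\; (\partial_\mu\partial_\nu\pi)^2 - a\,(\Box\pi)^2
= \big[(\partial_\mu\partial_\nu\pi)^2 - (\Box\pi)^2\big] + (1-a)(\Box\pi)^2,
\\
(\partial_\mu\partial_\nu\pi)^2 - (\Box\pi)^2
= \partial_\mu\!\left(\partial_\nu\pi\,\partial^\mu\partial^\nu\pi - \partial^\mu\pi\,\Box\pi\right).
```

The bracket is a total derivative, so only $a = 1$, the Fierz-Pauli tuning, removes the four-derivative term $(\Box\pi)^2$ that would otherwise propagate a ghost.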
Moreover, there are new types of interactions allowed for massive spin 2 that make use of the field $h_{\mu\nu}$ itself. We focus on coupling to scalar matter for simplicity. The simplest possible nonminimal coupling resembles part of quasi-dilaton massive gravity [44], where this coupling was shown to be ghost free. Isolating the longitudinal mode with the usual Stueckelberg replacement, we find that it propagates according to an equation of the form (6.1). It therefore exhibits superluminal motion on some backgrounds.
The most familiar possibility is to couple $h_{\mu\nu}$ minimally to matter, which takes the form $\Delta L = -\kappa\, h_{\mu\nu}T^{\mu\nu}_M/2$ at leading order in the coupling $\kappa$. The coupling must be to the matter energy-momentum tensor in order to avoid exciting the sixth degree of freedom, which is a ghost. However, as is well known, this theory is ghost free only at $O(\kappa)$. Building a truly ghost-free theory including self interactions requires an infinite tower of corrections, as laid out in Ref. [45]. Such theories involve nontrivial self interactions of the scalar mode of the graviton of the Galileon form $\propto \Box\pi(\partial\pi)^2$, which are again associated with superluminality.

Conclusions
In summary, we find that all the leading-order couplings of spin 2 particles to matter are acausal, except in the special case of general relativity. Causality therefore appears to be the universal principle that selects general relativity as the correct theory of spin 2 at low energies, and in particular selects the number of spin 2 species to be just one.
This extends previous uniqueness proofs, which relied on the leading interaction being that of a minimally coupled field, to interactions with more general momentum dependence. Causality is usually regarded as necessary for a theory to avoid pathologies, and it is closely related to unitarity in a quantum field theory. Hence no additional input is needed to derive the primacy of general relativity among the possible spin 2 theories.
Although our analysis proceeded by examining example theories one by one and showing explicitly that each contains this pathology, the subtleties we encountered allowed us to formalize the conditions for any theory to be superluminal. As long as the dispersion relations reduce to a quadratic function of the four-momentum, and there is some background on which the corrections to the dispersion relation can have either sign, the theory will be superluminal.
In principle there could be causal subleading interactions at sufficiently high mass dimension that evade the conditions of our no-go theorem. An example is to take some of the operators mentioned here and square them, such as $\sim(RF^2)^2$ or $\sim(R_I R_J)^2$, with positive coefficients in the Lagrangian (though this tends to introduce ghosts). Our focus here, however, has been on the leading-order interactions, which ordinarily dominate at low energies.
We note that for general relativity plus non-minimal corrections, the situation with regard to causality is fundamentally different. In this case, the leading effect at low energies is the usual general relativistic time delay for any ordinary source (the Shapiro time delay).
The non-minimal coupling then adds a secondary correction that can involve a time advance, which is irrelevant at low energies and becomes more relevant at higher energies. The scale at which these two effects balance defines a cutoff on the effective field theory, because above this scale there would be acausality; this demands new physics at or below this scale.
However, in the much larger class of theories we are studying, these non-minimal effects are the leading effects. This class consists of theories with multiple species of spin 2, which precludes minimal coupling, or with a single species of spin 2 and $G_N = 0$. We therefore have acausality even at arbitrarily low energies, and this cannot be fixed by introducing new physics at any finite energy scale. Hence such theories cannot be UV completed in any conventional Wilsonian sense, which strongly suggests that they are fundamentally pathological.
It is striking that, of the entire space of spin 2 theories studied, only a single one appears internally consistent, and that it represents the physics we observe in our world. This, together with prior results showing that causality can restrict derivative self interactions [46,47], establishes that causality can be an extremely powerful constraint.
Also striking is the stark contrast with spin 1 theories, for which generic nonminimal couplings to matter are healthy. This makes the spin 2 case very special, and singles out general relativity as unique.
where $S_{\mu\nu}$ is a general function of the matter fields, and we have integrated the spin 2 kinetic term by parts. For example, for the scalar Lagrangian (3.1) considered first, we would write $S_{\mu\nu} \equiv -\left(a_1 f(\phi) + a_2(\partial\phi)^2\right)\eta_{\mu\nu} + a_3\,\partial_\mu\phi\,\partial_\nu\phi$ (9.2). To track how the Einstein tensor changes under a field redefinition, we use an expression for it that makes manifest that the tensor is conserved, and that reproduces the kinetic term (2.3) up to boundary terms. The Lagrangian can then be rewritten accordingly, and a suitable redefinition of the spin 2 field decouples the two sectors. Hence, instead of an interaction between the spin 2 field and matter, which could potentially be acausal, there are in fact only self interactions in the matter sector. Some self interactions of this nature have been studied in previous works [46], and these results have recently been systematically extended in [47]. This holds completely generally, even with multiple species of matter or spin 2 fields. It allows us to focus on couplings to the Riemann tensor, which represent true interactions that cannot be removed in this way.
Let us contrast this procedure with the one used ubiquitously in gravitational theories. Here, because the kinetic term is quadratic and the interaction term is linear, the decoupling is exact. In gravitational theories this is not the case, because the theory is nonlinear, and the field redefinition given above only pushes the interactions to higher order in the fields.
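Why the decoupling is exact can be illustrated with a finite-dimensional toy model, which is an assumption for illustration only. A vector `h` stands in for the spin 2 field, a symmetric matrix `E` stands in for the self-adjoint (after integration by parts) linearized Einstein operator, and `S` stands in for the matter tensor. The coupling is taken to be $h\cdot E\cdot S$. Completing the square shows that the shift $h \to h - S$ removes the coupling identically and leaves only a pure-matter term $-\tfrac12 S\cdot E\cdot S$. The sign conventions and function names are hypothetical.

```python
import numpy as np

def coupled(h, S, E):
    """Toy Lagrangian: quadratic kinetic term plus linear coupling, 1/2 h.E.h + h.E.S."""
    return 0.5 * h @ E @ h + h @ E @ S

def decoupled(h_new, S, E):
    """Same Lagrangian in terms of h_new = h + S: free field plus a matter-only term."""
    return 0.5 * h_new @ E @ h_new - 0.5 * S @ E @ S

rng = np.random.default_rng(2)
a = rng.normal(size=(6, 6))
E = a + a.T                          # symmetric stand-in for the kinetic operator
h, S = rng.normal(size=6), rng.normal(size=6)

# The identity holds for every h and S, so the redefinition is exact.
print(np.isclose(coupled(h, S, E), decoupled(h + S, S, E)))
```

The identity relies only on E being symmetric and on the coupling being linear in h. Adding any term cubic in h would reintroduce couplings between the shifted field and S, which is the situation in gravitational theories.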